Rethinking Cross-Layer Information Routing in Diffusion Transformers Paper • 2605.20708 • Published 6 days ago • 75 • 3
Let ViT Speak: Generative Language-Image Pre-training Paper • 2605.00809 • Published 25 days ago • 32
Article Introducing NVIDIA Cosmos Policy for Advanced Robot Control nvidia • Jan 29 • 48
Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 163
google/siglip2-giant-opt-patch16-256 Zero-Shot Image Classification • 2B • Updated Feb 21, 2025 • 5.88k • 4
facebook/dinov3-vitl16-pretrain-lvd1689m Image Feature Extraction • 0.3B • Updated Aug 19, 2025 • 723k • 295
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published Jan 27 • 25
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 52