Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding • Paper • 2604.05015 • Published 4 days ago • 217
AURA: Always-On Understanding and Real-Time Assistance via Video Streams • Paper • 2604.04184 • Published 5 days ago • 43
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation • Paper • 2603.25732 • Published 14 days ago • 11
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions • Paper • 2603.15612 • Published 24 days ago • 152
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach • Paper • 2603.13364 • Published Mar 9 • 9
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation • Paper • 2603.12108 • Published 28 days ago • 8
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing • Paper • 2603.09877 • Published about 1 month ago • 47
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence • Paper • 2603.07660 • Published Mar 8 • 85
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform • Paper • 2512.08478 • Published Dec 9, 2025 • 77
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization • Paper • 2510.08540 • Published Oct 9, 2025 • 110