JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation • Paper • 2512.22905 • Published Dec 28, 2025 • 20
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks • Paper • 2510.25760 • Published Oct 29, 2025 • 17
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods • Paper • 2510.07143 • Published Oct 8, 2025 • 13
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era • Paper • 2509.12989 • Published Sep 16, 2025 • 28