ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework Paper • 2603.20644 • Published 21 days ago • 4
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites Paper • 2510.12126 • Published Oct 14, 2025 • 2
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 16 days ago • 126
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published Mar 10 • 48
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper • 2510.11027 • Published Oct 13, 2025 • 23
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints Paper • 2510.08565 • Published Oct 9, 2025 • 21
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 32