MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4, 2024 • 32
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Paper • 2404.16006 • Published Apr 24, 2024 • 1
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional at agent tasks, long context, and thinking • 7 items • Updated Oct 30, 2025 • 78
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • 161
Chameleon Collection Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. • 2 items • Updated Jul 9, 2024 • 34
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 50 items • Updated Dec 11, 2025 • 137
FLUX.1 Collection A collection of our FLUX.1 models and LoRAs. • 13 items • 261
Cosmos Collection ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 31 items • 299
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated Jun 6, 2024 • 245