Qwen/Qwen3-Coder-480B-A35B-Instruct Text Generation • 480B • Updated Aug 21, 2025 • 37.6k • • 1.33k
meta-llama/Llama-3.2-1B-Instruct Text Generation • 1B • Updated Oct 24, 2024 • 6.9M • • 1.4k
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Paper • 2511.07419 • Published Nov 10, 2025 • 27