Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs • Paper • 2605.09063 • Published 5 days ago • 72
Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs • Paper • 2507.05686 • Published Jul 8, 2025 • 1