VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20, 2025 • 16
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26, 2025 • 26
CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions Paper • 2512.01224 • Published Dec 1, 2025
Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models Paper • 2512.06266 • Published Dec 6, 2025 • 8
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts Paper • 2602.13367 • Published 27 days ago • 31
Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation Paper • 2602.14469 • Published 24 days ago • 2