AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Paper • 2604.25256 • Published 17 days ago • 29
DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published 29 days ago • 36
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published Apr 8 • 40
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published Apr 9 • 43
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models Paper • 2604.08545 • Published Apr 9 • 41
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published Apr 9 • 51
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation Paper • 2604.08455 • Published Apr 9 • 47