Community Evals: Because we're done trusting black-box leaderboards over the community Article • Published 7 days ago • 57
Introducing Daggr: Chain apps programmatically, inspect visually Article • Published 13 days ago • 96
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published 20 days ago • 187
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents Paper • 2601.16746 • Published 18 days ago • 89
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning Paper • 2601.09667 • Published 27 days ago • 87
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published about 1 month ago • 211
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 154
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 256
Transformers v5: Simple model definitions powering the AI ecosystem Article • Published Dec 1, 2025 • 293
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 231
HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published Nov 5, 2025 • 94
AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28, 2025 • 71
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 113