Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors Paper • 2601.15625 • Published 11 days ago • 3
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published 4 days ago • 5
Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience Paper • 2601.23188 • Published 3 days ago • 3
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 4 days ago • 39
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Paper • 2601.20833 • Published 5 days ago • 160
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published 4 days ago • 11
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts Paper • 2601.22156 • Published 4 days ago • 7
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts Paper • 2601.20730 • Published 5 days ago • 18
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published 4 days ago • 49
Scaling Embeddings Outperforms Scaling Experts in Language Models Paper • 2601.21204 • Published 4 days ago • 95
FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning Paper • 2601.18150 • Published 7 days ago • 5
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution Paper • 2601.20380 • Published 5 days ago • 8
Linear representations in language models can change dramatically over a conversation Paper • 2601.20834 • Published 5 days ago • 20
Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning Paper • 2601.20209 • Published 5 days ago • 22
Innovator-VL: A Multimodal Large Language Model for Scientific Discovery Paper • 2601.19325 • Published 6 days ago • 75