ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 13 days ago • 63
OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding Paper • 2601.10343 • Published 14 days ago
Better Process Supervision with Bi-directional Rewarding Signals Paper • 2503.04618 • Published Mar 6, 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10, 2025 • 57
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 84
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7, 2025 • 39
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published Jul 4, 2025 • 24
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published Mar 2, 2025 • 13
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
CritiQ: Mining Data Quality Criteria from Human Preferences Paper • 2502.19279 • Published Feb 26, 2025 • 10
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 24
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation Paper • 2402.13013 • Published Feb 20, 2024 • 1
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way Paper • 2312.00407 • Published Dec 1, 2023 • 3