On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published about 1 month ago • 101
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published Feb 4 • 37
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition Paper • 2307.13269 • Published Jul 25, 2023 • 34
Diffusion Language Models are Super Data Learners Paper • 2511.03276 • Published Nov 5, 2025 • 132
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
cwm Collection Collection for Code World Model, an agentic coding model from FAIR. • 3 items • Updated Sep 24, 2025 • 18
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 47
Reinforcing General Reasoning without Verifiers Paper • 2505.21493 • Published May 27, 2025 • 26
Fostering Video Reasoning via Next-Event Prediction Paper • 2505.22457 • Published May 28, 2025 • 29