AI & Human Co-Improvement for Safer Co-Superintelligence Paper • 2512.05356 • Published Dec 5, 2025 • 10
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 117
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28, 2025 • 18
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29, 2025 • 19
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8, 2025 • 15
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2, 2025 • 25
StepWiser: Stepwise Generative Judges for Wiser Reasoning Paper • 2508.19229 • Published Aug 26, 2025 • 20
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published May 15, 2025 • 24
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19, 2025 • 13
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18, 2025 • 15