- Scaling Agent Learning via Experience Synthesis
  Paper • 2511.03773 • Published • 83
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
  Paper • 2511.21689 • Published • 126
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 230
- Reinforcement Learning for Self-Improving Agent with Skill Library
  Paper • 2512.17102 • Published • 42
Yizhi
MercedeSnape
AI & ML interests
None yet
Recent Activity
updated a collection 8 days ago: agentic RL
updated a collection 8 days ago: Benchmark
updated a collection 8 days ago: Technical Report
Organizations
None yet