Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 3 days ago • 111
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms. Long reasoning chains in RL are costly, but you can now train gpt-oss with GRPO & reach 380K context on a 192GB GPU. Blog: https://unsloth.ai/docs/new/grpo-long-context
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 10 days ago • 120