InosLihka
Add SFT v3 + GRPO refine results to README + results.md
666b4ce