
# SP3F-7B

SP3F-7B is a multilingual model trained with Self-Play with Privileged Pairwise Feedback (SP3F), using Qwen2.5-7B as the base model.
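The card does not include a usage snippet, so here is a minimal inference sketch with Hugging Face `transformers`, assuming the model keeps the chat template inherited from its Qwen2.5-7B base. The example question and generation settings are illustrative, not taken from the paper.

```python
# Minimal inference sketch for SP3F-7B (chat-style usage, assuming the
# standard Qwen2 chat template inherited from the Qwen2.5-7B base).


def build_messages(question: str) -> list[dict]:
    """Wrap a user question in the message format used by apply_chat_template."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy imports kept inside main() so build_messages stays importable
    # without torch/transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "neulab/SP3F-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Example multilingual math question (MGSM-style, in French; illustrative).
    messages = build_messages(
        "Janet a 3 pommes et en achète 5 de plus. Combien de pommes a-t-elle ?"
    )
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens.
    print(
        tokenizer.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
    )


if __name__ == "__main__":
    main()
```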

| Model | Overall Acc | Overall Lang | MGSM MT Acc | MGSM MT Lang | Math100 Acc | Math100 Lang | Belebele Acc | Belebele Lang | Global MMLU Lite Acc | Global MMLU Lite Lang |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 14.79 | 78.78 | 22.15 | 90.67 | 21.16 | 58.22 | 7.52 | 80.39 | 8.34 | 85.85 |
| + SFT | 21.70 | 82.11 | 33.66 | 91.37 | 26.72 | 58.26 | 12.94 | 89.18 | 13.48 | 89.62 |
| + SFT + RLVR | 57.79 | 96.09 | 65.34 | 99.75 | 44.50 | 86.10 | 68.18 | 98.73 | 53.15 | 99.78 |
| SP3F-7B | 61.91 | 95.35 | 72.50 | 99.38 | 56.84 | 82.93 | 67.54 | 99.65 | 50.76 | 99.45 |
| Qwen2.5-7B-Instruct | 55.87 | 89.21 | 66.36 | 98.38 | 52.12 | 65.66 | 56.79 | 96.59 | 48.20 | 96.21 |
| + Translate Test | 57.01 | 85.98 | 66.15 | 95.81 | 60.08 | 59.34 | 48.09 | 92.27 | 53.73 | 96.49 |

## Citation

If you find this work helpful, please use the following to cite it:

```bibtex
@misc{sutawika2026gainedtranslationprivilegedpairwise,
      title={Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning},
      author={Lintang Sutawika and Gokul Swamy and Zhiwei Steven Wu and Graham Neubig},
      year={2026},
      eprint={2601.18722},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.18722},
}
```