Big regression

#3, opened by selimaktas

Hello again!
Unlike the previous Qwopus for Qwen3.5, this model regresses SWE-Bench Verified performance relative to its base model (toolless, single-shot).
I hope to see better results with the longer-trained version!
[Chart: SWE-Bench Verified scores]

That chart above actually shows Qwopus3.6-27B-v1-preview scoring substantially higher (0.222) than the previous Qwopus (3.5, v3) at 0.168.
But I think I see what you mean:

| | Original | Qwopus | Change |
|---|---|---|---|
| Qwen3.5 score | 0.1420 | 0.1680 | +0.0260 |
| Qwen3.6 score | 0.2920 | 0.2220 | -0.0700 |

Well, yeah. There is a HUGE gap between Qwen3.5 and Qwen3.6.
What I'm trying to show is that the previous Qwopus improved on Qwen3.5's performance, whereas this one degrades Qwen3.6's performance instead.
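The "Change" column is simply the Qwopus score minus the original score. A minimal Python sketch (not something either poster ran; the `change` helper is hypothetical) reproduces it from the table values and also computes the relative change, which makes the asymmetry between the two fine-tunes easier to see.

```python
# Scores from the table in this thread (SWE-Bench Verified, toolless, single-shot).
scores = {
    "Qwen3.5": {"original": 0.1420, "qwopus": 0.1680},
    "Qwen3.6": {"original": 0.2920, "qwopus": 0.2220},
}


def change(original: float, finetuned: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent) of finetuned vs. original."""
    delta = finetuned - original
    return delta, 100.0 * delta / original


for base, s in scores.items():
    delta, pct = change(s["original"], s["qwopus"])
    # Signed formatting so gains and losses are explicit.
    print(f"{base}: {delta:+.4f} ({pct:+.1f}%)")
```

This prints `Qwen3.5: +0.0260 (+18.3%)` and `Qwen3.6: -0.0700 (-24.0%)`. In relative terms, the earlier fine-tune gained about 18% over its base, while this one loses about 24%.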

I tested it independently. It's quite a bit dumber (~20% worse) than other 3.6 quants in my coding and tool-use tests :\

Sadly, I didn't have time to benchmark it with BFCL (Berkeley Function Calling Leaderboard).
