Big regression

by selimaktas - opened 5 days ago

Hello again!
Unlike the previous Qwopus for Qwen3.5, this model regresses on SWE-Bench Verified relative to its base model (toolless, single-shot).
Hope to see better results with the longer-trained version!

GeoMaciolek

4 days ago

That chart above actually indicates that Qwopus3.6-27B-v1-preview scores substantially higher at 0.222 than the previous Qwopus (3.5, v3) at 0.168.
But - I think I see what you mean:

| ... | Original | Qwopus | Change |
|---|---|---|---|
| Qwen3.5 Score | `0.1420` | `0.1680` | `+0.0260` |
| Qwen3.6 Score | `0.2920` | `0.2220` | `-0.0700` |
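
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the Change column from the four scores in the table and also expresses each change relative to the base model's score. The `scores` dict and the `deltas` helper are names made up for this illustration, and the relative figures are derived here rather than taken from the chart.

```python
# Scores copied from the table above (SWE-Bench Verified, toolless, single-shot).
scores = {
    "Qwen3.5": {"original": 0.1420, "qwopus": 0.1680},
    "Qwen3.6": {"original": 0.2920, "qwopus": 0.2220},
}


def deltas(original: float, finetuned: float) -> tuple[float, float]:
    """Return (absolute change, change relative to the original score)."""
    absolute = finetuned - original
    return absolute, absolute / original


for base, s in scores.items():
    absolute, relative = deltas(s["original"], s["qwopus"])
    print(f"{base}: {absolute:+.4f} absolute, {relative:+.1%} relative")

# Qwen3.5: +0.0260 absolute, +18.3% relative
# Qwen3.6: -0.0700 absolute, -24.0% relative
```

So the fine-tune scores higher than the older one in absolute terms, but it moves in the opposite direction relative to its own base.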

selimaktas

4 days ago

Well yeah? There is a HUGE gap between Qwen3.5 and Qwen3.6.
What I'm trying to show here is that the previous Qwopus improved on Qwen3.5's performance, while this one degrades Qwen3.6's instead.

Thinkscape

2 days ago

Tested it independently. It's quite a bit dumber (~20%) compared to other 3.6 quants in coding and tool use tests :\

selimaktas

1 day ago

Sadly, I did not have time to bench it with BFCL.
