Big regression
#3
by selimaktas - opened
That chart above actually indicates that Qwopus3.6-27B-v1-preview scores substantially higher at 0.222 than the previous Qwopus (3.5, v3) at 0.168.
But - I think I see what you mean:
| ... | Original | Qwopus | Change |
|---|---|---|---|
| Qwen3.5 Score | 0.1420 | 0.1680 | +0.0260 |
| Qwen3.6 Score | 0.2920 | 0.2220 | -0.0700 |
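
For anyone who wants to double-check the Change column, here is a minimal sketch that recomputes it from the four scores in the table. The dictionary layout and the `score_change` helper are just illustrative names, not part of any benchmark tooling, and the relative percentage is an extra figure that the table itself does not report.

```python
# Scores copied from the table above; the structure and names are illustrative.
scores = {
    "Qwen3.5": {"original": 0.1420, "qwopus": 0.1680},
    "Qwen3.6": {"original": 0.2920, "qwopus": 0.2220},
}


def score_change(original, qwopus):
    """Return (absolute change, relative change in percent) for one base model."""
    absolute = qwopus - original
    relative = absolute / original
    # Round to the table's precision to hide floating-point noise.
    return round(absolute, 4), round(relative * 100, 1)


changes = {base: score_change(**pair) for base, pair in scores.items()}

for base, (absolute, relative_pct) in changes.items():
    print(f"{base}: {absolute:+.4f} absolute, {relative_pct:+.1f}% relative")
```

Relative to each base model, that works out to roughly an 18% gain on Qwen3.5 and a 24% drop on Qwen3.6.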
Well yeah? There is a HUGE gap between Qwen3.5 and Qwen3.6
What I'm trying to show here is that the previous Qwopus improved on Qwen3.5, but this one scores lower than the Qwen3.6 it is based on.
Tested it independently. It's quite a bit dumber (~20%) compared to other 3.6 quants in coding and tool use tests :\
Sadly, I did not have time to benchmark it with BFCL.
