# Qwen3.5-9B Terminal Merge
A layer-wise optimized merge of 16 Qwen3.5-9B variants, tuned for strong terminal/CLI command generation performance.
## Performance
| Model | Terminal Task Suite | Tasks Passed |
|---|---|---|
| Qwen3.5-9B (base) | 21.7% | 13/60 |
| This model | 38.3% | 23/60 |
| Improvement | +16.6 pp (+77% relative) | +10 tasks |
Evaluated on a custom suite of 60 terminal tasks executed in sandboxed Docker containers. Tasks cover file operations, text processing, git workflows, networking, Python scripting, and system administration. Each task requires the model to produce working shell commands that are executed and verified against expected output.
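The evaluation suite itself is not published. A minimal sketch of how each task could be executed and verified, assuming a plain `docker run` wrapper (`run_in_sandbox` and `task_passed` are illustrative names, not part of the released code):

```python
import subprocess

def run_in_sandbox(command: str, image: str = "ubuntu:22.04", timeout: int = 30) -> str:
    """Execute a model-generated shell command inside a throwaway Docker
    container and capture its stdout (network disabled for isolation)."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--network", "none", image, "sh", "-c", command],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout

def task_passed(actual: str, expected: str) -> bool:
    """A task passes when the command's output matches the expected
    output, ignoring leading/trailing whitespace."""
    return actual.strip() == expected.strip()
```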
## Model Details
- Architecture: Qwen3.5 (hybrid linear + full attention)
- Parameters: 9B total
- Context Length: 262,144 tokens
- Precision: bfloat16
- Layers: 32 (8 full attention + 24 linear attention)
- Merge Method: Layer-wise linear merge with optimized per-layer weights
## Source Models
This model combines optimized layer-wise weights from 16 Qwen3.5-9B variants spanning reasoning, instruction-following, and general capability specializations:
| Category | Models |
|---|---|
| Core | Qwen/Qwen3.5-9B, unsloth/Qwen3.5-9B |
| Abliterated | darkc0de/Qwen3.5-9B-heretic, lukey03/Qwen3.5-9B-abliterated, llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, llmfan46/Qwen3.5-9B-ultra-heretic, jwest33/qwen3.5-9b-null-space-abliterated, trohrbaugh/Qwen3.5-9B-heretic-v2, osirisbrain/OsirisCortex-v6 |
| Reasoning | DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5 |
| Specialized | alecccdd/Qwen3.5-9B-paraphrasing-orpo, lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, Hastagaras/Qwen3.5-9B-GLM-Wannabe, zenlm/zen4 |
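The layer-wise linear merge described above can be sketched in a few lines. This is an illustrative helper, not the actual merge script; it assumes standard Hugging Face `model.layers.N.` parameter naming and works on any values supporting `+` and `*` (tensors or plain floats):

```python
def layerwise_linear_merge(state_dicts, layer_weights):
    """Blend N checkpoints parameter-by-parameter. `layer_weights` maps a
    layer index to a list of N coefficients (one per source model) summing
    to 1; parameters outside numbered layers fall back to uniform weights."""
    n = len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # Extract the layer index from names like "model.layers.12.mlp...".
        parts = name.split(".")
        if len(parts) > 2 and parts[1] == "layers" and parts[2].isdigit():
            idx = int(parts[2])
        else:
            idx = None  # embeddings, lm_head, norms, etc.
        coeffs = layer_weights.get(idx, [1.0 / n] * n)
        merged[name] = sum(w * sd[name] for w, sd in zip(coeffs, state_dicts))
    return merged
```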
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EganAI/qwen3.5-9b-terminal-merge"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Find all Python files larger than 1MB and sort by size descending"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## vLLM Serving

```bash
vllm serve EganAI/qwen3.5-9b-terminal-merge \
  --language-model-only \
  --dtype bfloat16 \
  --max-model-len 8192
```

Note: Use `--language-model-only` since this is a multimodal architecture served for text-only inference.
## Training Details
The per-layer merge weights were optimized by evaluating candidates on a suite of 60 terminal tasks using vLLM inference in sandboxed Docker environments. The optimization searched across layer-group weight distributions to find the best blend of all 16 source models.
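The exact search procedure is not specified. One common approach consistent with the description is random search over per-layer-group weight distributions; the sketch below assumes that, with `evaluate` standing in for the real pass-rate measurement on the 60-task suite (all names are illustrative):

```python
import random

def sample_simplex(n, rng):
    """Draw a random weight vector of length n that sums to 1
    (Dirichlet(1,...,1) via normalized exponentials)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(xs)
    return [x / s for x in xs]

def random_search(n_models, layer_groups, evaluate, n_trials=100, seed=0):
    """Random search over per-layer-group weight distributions.
    `evaluate` scores a candidate (e.g. task pass rate after merging
    and serving it); returns the best candidate found and its score."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = {g: sample_simplex(n_models, rng) for g in layer_groups}
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```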
## Limitations

- Optimized specifically for terminal/CLI tasks; general-purpose performance may vary
- Requires the `--language-model-only` flag when serving with vLLM due to the multimodal architecture
- Visual capabilities are inherited from the base model but were not part of the optimization target