Qwen3.5-9B Terminal Merge

A layer-wise optimized merge of 16 Qwen3.5-9B variants, tuned for strong terminal/CLI command generation performance.

Performance

Model               Terminal Task Suite   Tasks Passed
Qwen3.5-9B (base)   21.7%                 13/60
This model          38.3%                 23/60
Improvement         +77% (relative)       +10 tasks

Evaluated on a custom suite of 60 terminal tasks executed in sandboxed Docker containers. Tasks cover file operations, text processing, git workflows, networking, Python scripting, and system administration. Each task requires the model to produce working shell commands that are executed and verified against expected output.
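The execute-and-verify loop described above can be sketched as follows. This is a minimal illustration, not the actual harness: the `run_task` helper and the exact pass criterion (exit code 0 plus matching stdout) are assumptions for the example.

```python
import subprocess

def run_task(command: str, expected_output: str, timeout: int = 30) -> bool:
    """Execute a model-generated shell command and verify it against the
    expected output (hypothetical task format; the real suite runs inside
    sandboxed Docker containers rather than on the host)."""
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # A task passes only if the command exits cleanly and stdout matches.
    return result.returncode == 0 and result.stdout.strip() == expected_output.strip()

# Example check: a text-processing task with a known answer.
print(run_task("printf 'a\\nb\\nc\\n' | wc -l", "3"))
```

In the real suite each task would additionally set up its own container state (files, git repos, network fixtures) before the command runs.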

Model Details

  • Architecture: Qwen3.5 (hybrid linear + full attention)
  • Parameters: 9B total
  • Context Length: 262,144 tokens
  • Precision: bfloat16
  • Layers: 32 (8 full attention + 24 linear attention)
  • Merge Method: Layer-wise linear merge with optimized per-layer weights
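A layer-wise linear merge assigns each transformer layer its own blend of source-model weights instead of one global ratio. A simplified sketch, assuming checkpoints loaded as plain state dicts and ignoring details like tied embeddings (the function name and weight format here are illustrative, not the actual merge tooling):

```python
import torch

def merge_layerwise(state_dicts, layer_weights):
    """Blend source checkpoints with a separate weight vector per layer.

    state_dicts  : list of {param_name: tensor}, one per source model
    layer_weights: {layer_idx: [w_0, ..., w_{n-1}]}, each summing to 1
    Non-layer tensors (embeddings, norms, lm_head) fall back to a
    uniform average in this sketch.
    """
    n = len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # Parameter names look like "model.layers.12.mlp.down_proj.weight"
        parts = name.split(".")
        if "layers" in parts:
            idx = int(parts[parts.index("layers") + 1])
            weights = layer_weights[idx]
        else:
            weights = [1.0 / n] * n
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged
```

Optimizing the per-layer weight vectors (rather than fixing them) is what lets the merge pull, say, reasoning-heavy layers from one source and instruction-following layers from another.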

Source Models

This model combines optimized layer-wise weights from 16 Qwen3.5-9B variants spanning reasoning, instruction-following, and general capability specializations:

Category Models
Core Qwen/Qwen3.5-9B, unsloth/Qwen3.5-9B
Abliterated darkc0de/Qwen3.5-9B-heretic, lukey03/Qwen3.5-9B-abliterated, llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, llmfan46/Qwen3.5-9B-ultra-heretic, jwest33/qwen3.5-9b-null-space-abliterated, trohrbaugh/Qwen3.5-9B-heretic-v2, osirisbrain/OsirisCortex-v6
Reasoning DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5
Specialized alecccdd/Qwen3.5-9B-paraphrasing-orpo, lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, Hastagaras/Qwen3.5-9B-GLM-Wannabe, zenlm/zen4

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EganAI/qwen3.5-9b-terminal-merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # match the checkpoint precision
    device_map="auto",       # place layers across available devices
)

messages = [
    {"role": "user", "content": "Find all Python files larger than 1MB and sort by size descending"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

vLLM Serving

vllm serve EganAI/qwen3.5-9b-terminal-merge \
    --language-model-only \
    --dtype bfloat16 \
    --max-model-len 8192

Note: The underlying architecture is multimodal, so pass --language-model-only to serve only the language model for text-only inference.

Training Details

The per-layer merge weights were optimized by evaluating candidates on a suite of 60 terminal tasks using vLLM inference in sandboxed Docker environments. The optimization searched across layer-group weight distributions to find the best blend of all 16 source models.
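The search loop can be sketched as a simple black-box optimization over per-group weight distributions. The card does not specify the search algorithm, so the random search below is an assumed stand-in, and `score_fn` is a placeholder for "merge with these weights, serve with vLLM, run the 60-task suite, return the pass rate":

```python
import random

def search_merge_weights(num_models, num_groups, score_fn, trials=100, seed=0):
    """Random search over layer-group weight distributions (illustrative;
    the actual optimization method is not documented in this card).

    Each candidate is a list of `num_groups` rows, each row a probability
    distribution over the `num_models` source models.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = []
        for _ in range(num_groups):
            raw = [rng.random() for _ in range(num_models)]
            total = sum(raw)
            candidate.append([r / total for r in raw])  # normalize to sum to 1
        score = score_fn(candidate)  # e.g. fraction of the 60 tasks passed
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because each evaluation requires a full merge plus a 60-task run, the trial budget, not the search strategy, is usually the binding constraint.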

Limitations

  • Optimized specifically for terminal/CLI tasks; general-purpose performance may vary
  • Requires --language-model-only flag when serving with vLLM due to multimodal architecture
  • Visual capabilities are inherited from the base model but were not part of the optimization target