Qwen3.5-9B Terminal Merge

A layer-wise optimized merge of 16 Qwen3.5-9B variants, tuned for strong terminal/CLI command generation performance.

Performance

Model               Terminal Task Suite   Tasks Passed
Qwen3.5-9B (base)   21.7%                 13/60
This model          38.3%                 23/60
Improvement         +77% (relative)       +10 tasks

Evaluated on a custom suite of 60 terminal tasks executed in sandboxed Docker containers. Tasks cover file operations, text processing, git workflows, networking, Python scripting, and system administration. Each task requires the model to produce working shell commands that are executed and verified against expected output.
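The execute-and-verify loop described above can be sketched as follows. This is a minimal illustration, not the actual harness: the `run_task` helper and the exact pass criterion (exit code 0 plus matching stdout) are assumptions for the example.

```python
import subprocess

def run_task(command: str, expected_output: str, timeout: int = 30) -> bool:
    """Execute a model-generated shell command and verify it against the
    expected output (hypothetical task format; the real suite runs inside
    sandboxed Docker containers rather than on the host)."""
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # A task passes only if the command exits cleanly and stdout matches.
    return result.returncode == 0 and result.stdout.strip() == expected_output.strip()

# Example check: a text-processing task with a known answer.
print(run_task("printf 'a\\nb\\nc\\n' | wc -l", "3"))
```

In the real suite each task would additionally set up its own container state (files, git repos, network fixtures) before the command runs.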

Model Details

  • Architecture: Qwen3.5 (hybrid linear + full attention)
  • Parameters: 9B total
  • Context Length: 262,144 tokens
  • Precision: bfloat16
  • Layers: 32 (8 full attention + 24 linear attention)
  • Merge Method: Layer-wise linear merge with optimized per-layer weights
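A layer-wise linear merge assigns each transformer layer its own blend of source-model weights instead of one global ratio. A simplified sketch, assuming checkpoints loaded as plain state dicts and ignoring details like tied embeddings (the function name and weight format here are illustrative, not the actual merge tooling):

```python
import torch

def merge_layerwise(state_dicts, layer_weights):
    """Blend source checkpoints with a separate weight vector per layer.

    state_dicts  : list of {param_name: tensor}, one per source model
    layer_weights: {layer_idx: [w_0, ..., w_{n-1}]}, each summing to 1
    Non-layer tensors (embeddings, norms, lm_head) fall back to a
    uniform average in this sketch.
    """
    n = len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # Parameter names look like "model.layers.12.mlp.down_proj.weight"
        parts = name.split(".")
        if "layers" in parts:
            idx = int(parts[parts.index("layers") + 1])
            weights = layer_weights[idx]
        else:
            weights = [1.0 / n] * n
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged
```

Optimizing the per-layer weight vectors (rather than fixing them) is what lets the merge pull, say, reasoning-heavy layers from one source and instruction-following layers from another.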

Source Models

This model combines optimized layer-wise weights from 16 Qwen3.5-9B variants spanning reasoning, instruction-following, and general capability specializations:

Category Models
Core Qwen/Qwen3.5-9B, unsloth/Qwen3.5-9B
Abliterated darkc0de/Qwen3.5-9B-heretic, lukey03/Qwen3.5-9B-abliterated, llmfan46/Qwen3.5-9B-ultimate-irrefusable-heretic, llmfan46/Qwen3.5-9B-ultra-heretic, jwest33/qwen3.5-9b-null-space-abliterated, trohrbaugh/Qwen3.5-9B-heretic-v2, osirisbrain/OsirisCortex-v6
Reasoning DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING, DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT, crownelius/Crow-9B-Opus-4.6-Distill-Heretic_Qwen3.5
Specialized alecccdd/Qwen3.5-9B-paraphrasing-orpo, lugman-madhiai/Qwen3.5-9B-MHS-Interleaved, Hastagaras/Qwen3.5-9B-GLM-Wannabe, zenlm/zen4

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EganAI/qwen3.5-9b-terminal-merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # match the checkpoint precision
    device_map="auto",       # place layers across available devices
)

messages = [
    {"role": "user", "content": "Find all Python files larger than 1MB and sort by size descending"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

vLLM Serving

vllm serve EganAI/qwen3.5-9b-terminal-merge \
    --language-model-only \
    --dtype bfloat16 \
    --max-model-len 8192

Note: The underlying architecture is multimodal, so pass --language-model-only to serve only the language model for text-only inference.

Training Details

The per-layer merge weights were optimized by evaluating candidates on a suite of 60 terminal tasks using vLLM inference in sandboxed Docker environments. The optimization searched across layer-group weight distributions to find the best blend of all 16 source models.
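The search loop can be sketched as a simple black-box optimization over per-group weight distributions. The card does not specify the search algorithm, so the random search below is an assumed stand-in, and `score_fn` is a placeholder for "merge with these weights, serve with vLLM, run the 60-task suite, return the pass rate":

```python
import random

def search_merge_weights(num_models, num_groups, score_fn, trials=100, seed=0):
    """Random search over layer-group weight distributions (illustrative;
    the actual optimization method is not documented in this card).

    Each candidate is a list of `num_groups` rows, each row a probability
    distribution over the `num_models` source models.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = []
        for _ in range(num_groups):
            raw = [rng.random() for _ in range(num_models)]
            total = sum(raw)
            candidate.append([r / total for r in raw])  # normalize to sum to 1
        score = score_fn(candidate)  # e.g. fraction of the 60 tasks passed
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because each evaluation requires a full merge plus a 60-task run, the trial budget, not the search strategy, is usually the binding constraint.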

Limitations

  • Optimized specifically for terminal/CLI tasks; general-purpose performance may vary
  • Requires --language-model-only flag when serving with vLLM due to multimodal architecture
  • Visual capabilities are inherited from the base model but were not part of the optimization target