# Qwen 3.6 27B — MXFP4 + CRACK
Stock MLX mxfp4 4-bit | CRACK abliterated (extended) | Vision + Video | Dense Hybrid SSM/Attention | 14 GB
## What Is This?
This is Qwen 3.6 27B — a 27B-parameter dense vision-language model with hybrid linear + full-attention architecture, native image + video understanding, and bilingual EN/ZH capability.
It has been:
- MXFP4 quantized — stock MLX microscaling FP4 (group_size=32, 4-bit), vision tower preserved — 14 GB
- CRACK abliterated — permanent weight-level removal of safety refusal (extended recipe)
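As a sanity check on the 14 GB figure: MXFP4 stores 4-bit values plus one shared 8-bit scale per group of 32 weights, giving roughly 4.25 effective bits per parameter. A minimal sketch of the size estimate, assuming ~27B quantized parameters and ignoring the FP16 vision tower and embeddings:

```python
def mxfp4_size_gb(n_params: float, group_size: int = 32) -> float:
    """Estimate on-disk size of MXFP4 weights: 4-bit values plus
    one 8-bit shared scale per group of `group_size` weights."""
    bits_per_param = 4 + 8 / group_size  # 4.25 effective bits
    return n_params * bits_per_param / 8 / 1e9

print(round(mxfp4_size_gb(27e9), 1))  # 14.3
```

The ~14.3 GB estimate matches the stated 14 GB once the mixed-precision layers kept outside the 4-bit groups are accounted for.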
| Spec | Value |
|---|---|
| Base model | Qwen 3.6 27B dense hybrid VL |
| Quantization | MXFP4 — 14 GB |
| MMLU-200 | 82.0% (base: 83.5%, Δ −1.5pp) |
| HarmBench-320 | 98.44% strict comply (315/320) |
| Vision | ViT preserved (image + video) |
| Context | 262,144 native |
| Reasoning | Toggleable via enable_thinking |
| Fits on | 24 GB+ Macs |
## MMLU-200 Results (thinking OFF)

82.0% total (164/200) — 1.5pp below the base model's 83.5%, well within the 5pp acceptance bound.
| Subject | Score |
|---|---|
| Astronomy | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| High School Biology | 19/20 (95%) |
| World Religions | 19/20 (95%) |
| Anatomy | 18/20 (90%) |
| High School Chemistry | 15/20 (75%) |
| College Computer Science | 15/20 (75%) |
| College Physics | 15/20 (75%) |
| Abstract Algebra | 12/20 (60%) |
| High School Mathematics | 12/20 (60%) |
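The 82.0% headline can be recomputed directly from the per-subject tallies above (each subject is 20 questions):

```python
# Per-subject correct counts from the table above (20 questions each)
scores = {
    "Astronomy": 20, "Logical Fallacies": 19, "High School Biology": 19,
    "World Religions": 19, "Anatomy": 18, "High School Chemistry": 15,
    "College Computer Science": 15, "College Physics": 15,
    "Abstract Algebra": 12, "High School Mathematics": 12,
}
correct, total = sum(scores.values()), 20 * len(scores)
print(f"{correct}/{total} = {100 * correct / total:.1f}%")  # 164/200 = 82.0%
```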
## HarmBench-320 — 98.44% strict comply (315/320)
| Category | Score |
|---|---|
| Chemical / Biological | 42/42 (100.0%) |
| Copyright | 80/80 (100.0%) |
| Harassment / Bullying | 21/21 (100.0%) |
| Illegal | 53/53 (100.0%) |
| Misinformation / Disinformation | 54/54 (100.0%) |
| Harmful | 17/18 (94.4%) |
| Cybercrime / Intrusion | 48/52 (92.3%) |
| Total | 315/320 (98.44%) |
5 of 7 categories at 100%. Residual refusals cluster in cybercrime-intrusion (4) and harmful (1) — typical for abliterated models on specific exploit-code and graphic-content prompts.
## Vision + Video

- ✅ `mlx_vlm.load()` loads successfully
- ✅ ViT preserved — image understanding verified
- ✅ `video_preprocessor_config.json` preserved, video processor available
- ✅ Bilingual EN + ZH text responses
- ⚠ Video inference: full video capability is retained in the model weights, but the current `mlx_vlm.generate` video path for Qwen 3.6 has a pipeline bug on both the base and CRACK models — video tokens render but frames don't reach the vision tower cleanly. Use vMLX or transformers for video inference. This is not a CRACK regression.
## Reasoning ON / OFF

The chat template respects `enable_thinking`. ON is recommended for complex reasoning; OFF for short answers, benchmarks, and tool use.

```python
from mlx_vlm import load, generate

model, processor = load("dealignai/Qwen3.6-27B-MXFP4-CRACK")
tok = processor.tokenizer

messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]

# Thinking ON (default)
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Thinking OFF (direct answer, no <think> block)
prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
```
## Requirements

- Apple Silicon Mac (M1 or newer)
- 24 GB+ unified memory (model weights 14 GB, inference working set ~16 GB)
- Python 3.10+ with `mlx` and `mlx-vlm`
- For VL usage: Pillow (for image preprocessing)

Recommended runtimes:

- vMLX — KV cache quantization, prefix cache, VL support
- MLX Studio
- `mlx_vlm` Python API
## Qwen 3.6 CRACK Series
| Model | Format | Size | MMLU / HarmBench | Notes |
|---|---|---|---|---|
| Qwen 3.6 27B JANG_4M + CRACK | JANG v2 mixed-precision | 16 GB | 83.5% / 99.69% | 24 GB Mac — best quality |
| Qwen 3.6 27B MXFP4 + CRACK (this model) | Stock MLX mxfp4 | 14 GB | 82.0% / 98.44% | 24 GB Mac — broadest compatibility |
| Qwen 3.6 35B-A3B JANGTQ4 + CRACK | TurboQuant 4-bit experts | 18 GB | 73.5% | MoE + hybrid SSM |
| Qwen 3.6 35B-A3B JANGTQ2 + CRACK | TurboQuant 2-bit experts | 11 GB | 73.0% | MoE, fits 16 GB Mac |
Recommendation: prefer JANG_4M + CRACK for best quality (higher MMLU and HarmBench scores); use this MXFP4 variant for the broadest tooling compatibility (stock mlx_vlm, LM Studio, etc.).
## About CRACK
CRACK is a permanent weight-level abliteration — the changes are baked into the model weights, not an inference-time system prompt or LoRA. The vision tower is untouched. Bilingual refusal extraction (EN + ZH) means the model complies on both English and Chinese prompts.
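The exact CRACK recipe is not published here, but weight-level abliteration in general works by projecting a learned "refusal direction" out of the model's weight matrices, so no layer can write that direction into the residual stream. A minimal NumPy sketch of directional ablation, purely illustrative: the direction `r` would in practice be extracted from activation differences on harmful vs. harmless prompts, not drawn at random as here.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's outputs along direction r:
    W' = W - r_hat r_hat^T W, so r_hat . (W' x) = 0 for every x."""
    r_hat = r / np.linalg.norm(r)
    return W - np.outer(r_hat, r_hat) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # stand-in for one weight matrix
r = rng.standard_normal(8)        # stand-in "refusal direction"
W_abl = ablate_direction(W, r)

# After ablation, outputs have no component along r
print(np.allclose((r / np.linalg.norm(r)) @ W_abl, 0))  # True
```

Applied permanently to the relevant matrices across layers, this leaves the weights changed on disk, which is what distinguishes abliteration from a system prompt or LoRA that can be stripped at inference time.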
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai
## About dealignai
We research and publish abliterated models to advance AI safety understanding.
Follow us: 𝕏 @dealignai
See our research: Safety Generalization in Frontier MoE Models
## Disclaimer
This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.
The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.