# Qwen 3.6 27B — MXFP4 + CRACK
Stock MLX mxfp4 4-bit | CRACK abliterated (extended) | Vision + Video | Dense Hybrid SSM/Attention | 14 GB
## What Is This?
This is Qwen 3.6 27B — a 27B-parameter dense vision-language model with hybrid linear + full-attention architecture, native image + video understanding, and bilingual EN/ZH capability.
It has been:
- MXFP4 quantized — stock MLX microscaling FP4 (group_size=32, 4-bit), vision tower preserved — 14 GB
- CRACK abliterated — permanent weight-level removal of safety refusal (extended recipe)
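As a sanity check on the 14 GB figure: MXFP4 stores 4-bit values plus one shared 8-bit scale per group of 32 weights, giving roughly 4.25 effective bits per parameter. A minimal sketch of the size estimate, assuming ~27B quantized parameters and ignoring the FP16 vision tower and embeddings:

```python
def mxfp4_size_gb(n_params: float, group_size: int = 32) -> float:
    """Estimate on-disk size of MXFP4 weights: 4-bit values plus
    one 8-bit shared scale per group of `group_size` weights."""
    bits_per_param = 4 + 8 / group_size  # 4.25 effective bits
    return n_params * bits_per_param / 8 / 1e9

print(round(mxfp4_size_gb(27e9), 1))  # 14.3
```

The ~14.3 GB estimate matches the stated 14 GB once the mixed-precision layers kept outside the 4-bit groups are accounted for.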
| Spec | Value |
|---|---|
| Base model | Qwen 3.6 27B dense hybrid VL |
| Quantization | MXFP4 — 14 GB |
| MMLU-200 | 82.0% (base: 83.5%, Δ −1.5pp) |
| HarmBench-320 | 98.44% strict comply (315/320) |
| Vision | ViT preserved (image + video) |
| Context | 262,144 native |
| Reasoning | Toggleable via enable_thinking |
| Fits on | 24 GB+ Macs |
## MMLU-200 Results (thinking OFF)

82.0% total (164/200) — 1.5pp below the base model's 83.5%, well within the 5pp acceptance bound.
| Subject | Score |
|---|---|
| Astronomy | 20/20 (100%) |
| Logical Fallacies | 19/20 (95%) |
| High School Biology | 19/20 (95%) |
| World Religions | 19/20 (95%) |
| Anatomy | 18/20 (90%) |
| High School Chemistry | 15/20 (75%) |
| College Computer Science | 15/20 (75%) |
| College Physics | 15/20 (75%) |
| Abstract Algebra | 12/20 (60%) |
| High School Mathematics | 12/20 (60%) |
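The 82.0% headline can be recomputed directly from the per-subject tallies above (each subject is 20 questions):

```python
# Per-subject correct counts from the table above (20 questions each)
scores = {
    "Astronomy": 20, "Logical Fallacies": 19, "High School Biology": 19,
    "World Religions": 19, "Anatomy": 18, "High School Chemistry": 15,
    "College Computer Science": 15, "College Physics": 15,
    "Abstract Algebra": 12, "High School Mathematics": 12,
}
correct, total = sum(scores.values()), 20 * len(scores)
print(f"{correct}/{total} = {100 * correct / total:.1f}%")  # 164/200 = 82.0%
```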
## HarmBench-320 — 98.44% strict comply (315/320)
| Category | Score |
|---|---|
| Chemical / Biological | 42/42 (100.0%) |
| Copyright | 80/80 (100.0%) |
| Harassment / Bullying | 21/21 (100.0%) |
| Illegal | 53/53 (100.0%) |
| Misinformation / Disinformation | 54/54 (100.0%) |
| Harmful | 17/18 (94.4%) |
| Cybercrime / Intrusion | 48/52 (92.3%) |
| Total | 315/320 (98.44%) |
5 of 7 categories at 100%. Residual refusals cluster in cybercrime-intrusion (4) and harmful (1) — typical for abliterated models on specific exploit-code and graphic-content prompts.
## Vision + Video

- ✅ `mlx_vlm.load()` loads successfully
- ✅ ViT preserved — image understanding verified
- ✅ `video_preprocessor_config.json` preserved, video processor available
- ✅ Bilingual EN + ZH text responses
- ⚠ Video inference: full video capability is retained in the model weights, but the current `mlx_vlm.generate` video path for Qwen 3.6 has a pipeline bug on both the base and CRACK models — video tokens render but frames don't reach the vision tower cleanly. Use vMLX or transformers for video inference. This is not a CRACK regression.
## Reasoning ON / OFF

The chat template respects `enable_thinking`. ON is recommended for complex reasoning; OFF for short answers, benchmarks, and tool use.

```python
from mlx_vlm import load, generate

model, processor = load("dealignai/Qwen3.6-27B-MXFP4-CRACK")
tok = processor.tokenizer

messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]

# Thinking ON (default)
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Thinking OFF (direct answer, no <think> block)
prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
```
## Requirements

- Apple Silicon Mac (M1 or newer)
- 24 GB+ unified memory (model weights 14 GB, inference working set ~16 GB)
- Python 3.10+ with `mlx` and `mlx-vlm`
- For VL usage: Pillow (for image preprocessing)

Recommended runtimes:

- vMLX — KV cache quantization, prefix cache, VL support
- MLX Studio
- `mlx_vlm` Python API
## Qwen 3.6 CRACK Series
| Model | Format | Size | MMLU / HarmBench | Notes |
|---|---|---|---|---|
| Qwen 3.6 27B JANG_4M + CRACK | JANG v2 mixed-precision | 16 GB | 83.5% / 99.69% | 24 GB Mac — best quality |
| Qwen 3.6 27B MXFP4 + CRACK (this model) | Stock MLX mxfp4 | 14 GB | 82.0% / 98.44% | 24 GB Mac — broadest compatibility |
| Qwen 3.6 35B-A3B JANGTQ4 + CRACK | TurboQuant 4-bit experts | 18 GB | 73.5% | MoE + hybrid SSM |
| Qwen 3.6 35B-A3B JANGTQ2 + CRACK | TurboQuant 2-bit experts | 11 GB | 73.0% | MoE, fits 16 GB Mac |
Recommendation: prefer JANG_4M + CRACK for best quality (higher MMLU and HarmBench scores); use this MXFP4 variant for the broadest tooling compatibility (stock mlx_vlm, LM Studio, etc.).
## About CRACK
CRACK is a permanent weight-level abliteration — the changes are baked into the model weights, not an inference-time system prompt or LoRA. The vision tower is untouched. Bilingual refusal extraction (EN + ZH) means the model complies on both English and Chinese prompts.
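The exact CRACK recipe is not published here, but weight-level abliteration in general works by projecting a learned "refusal direction" out of the model's weight matrices, so no layer can write that direction into the residual stream. A minimal NumPy sketch of directional ablation, purely illustrative: the direction `r` would in practice be extracted from activation differences on harmful vs. harmless prompts, not drawn at random as here.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove the component of W's outputs along direction r:
    W' = W - r_hat r_hat^T W, so r_hat . (W' x) = 0 for every x."""
    r_hat = r / np.linalg.norm(r)
    return W - np.outer(r_hat, r_hat) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # stand-in for one weight matrix
r = rng.standard_normal(8)        # stand-in "refusal direction"
W_abl = ablate_direction(W, r)

# After ablation, outputs have no component along r
print(np.allclose((r / np.linalg.norm(r)) @ W_abl, 0))  # True
```

Applied permanently to the relevant matrices across layers, this leaves the weights changed on disk, which is what distinguishes abliteration from a system prompt or LoRA that can be stripped at inference time.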
## Support dealignai
All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Have questions or need help with a specific model? DM us — we help for free most of the time.
Ko-fi | X @dealignai | dealign.ai
## About dealignai
We research and publish abliterated models to advance AI safety understanding.
Follow us: 𝕏 @dealignai
See our research: Safety Generalization in Frontier MoE Models
## Disclaimer
This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.
The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.