Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-4bit

This is an MLX release of an abliterated version of Qwen's Qwen3.6-35B-A3B.

Heretic's ablation pipeline was applied to the text-side MoE stack to remove the base model's refusal behavior at the weight level. This release preserves the Qwen3.6-35B-A3B reasoning and instruction-following profile in Apple MLX format for local deployment on Apple Silicon hardware.

This MLX repo retains the Qwen3.6 image/video processor files and vision tower tensors so that runtimes with Qwen3.6 multimodal MLX support can use them.

Quick Benchmarks

Check                              Original Qwen3.6-35B-A3B    Abliterated Heretic MLX
Official 25-prompt refusal check   22/25 refusals              3/25 refusals
Archived Heretic KL divergence     -                           0.010655362159013748
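The KL divergence above measures how far the abliterated model's next-token distribution drifts from the original model's on harmless prompts (lower means less collateral damage from ablation). The exact Heretic evaluation is not reproduced here; a minimal sketch of the underlying computation, assuming per-prompt next-token probability vectors are available, looks like this:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two next-token probability distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over a 4-token vocabulary (hypothetical values).
p = [0.70, 0.20, 0.05, 0.05]  # original model
q = [0.68, 0.21, 0.06, 0.05]  # abliterated model
print(round(kl_divergence(p, q), 6))
```

In practice the distributions come from the two models' logits on the same prompt set, and the per-prompt values are averaged.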

Methodology & Model Notes

Qwen3.6-35B-A3B is a sparse MoE model in the qwen3_5_moe family. The abliterated BF16 source checkpoint was produced with a Heretic MPOA/SOMA-style sibling-transfer workflow and finalized with an input-side split-MoE intervention that brought the official 25-prompt refusal marker suite down to 1/25 refusals.
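Abliteration in this family of techniques generally works by estimating a "refusal direction" in activation space and projecting it out of selected weight matrices. Heretic's exact procedure is not reproduced here; the following is a generic sketch of directional weight orthogonalization (shapes and names are illustrative, not the actual pipeline):

```python
import numpy as np

def orthogonalize(W, r):
    """Remove the component of every output along unit direction r:
    W' = W - r r^T W, so that r^T (W' x) = 0 for any input x."""
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r) @ W

rng = np.random.default_rng(0)
d_out, d_in = 8, 6
W = rng.standard_normal((d_out, d_in))  # a weight matrix (e.g. an MLP down-projection)
r = rng.standard_normal(d_out)          # estimated refusal direction in output space
W_abl = orthogonalize(W, r)

# After ablation, outputs carry no component along the refusal direction.
x = rng.standard_normal(d_in)
print(abs(r / np.linalg.norm(r) @ (W_abl @ x)) < 1e-9)  # prints True
```

The refusal direction itself is typically estimated as the difference of mean hidden activations between refused and complied prompts before being projected out.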

This MLX release was built directly from the published BF16 Heretic checkpoint using a high-quality layer-aware quantization policy instead of a flat per-weight pass.

  • quant target: 4-bit
  • quant build: 4-bit high-quality mixed layer-aware quantization (4/6-bit)
  • source checkpoint: Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-BF16
  • published variant: Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-4bit

The layer-aware policy keeps more precision on sensitive projections in the early, late, and selected middle layers so the quant stays cleaner than a naive flat conversion.
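The actual per-layer recipe for this build is not published. As an illustration only, a layer-aware bit-width rule of the general shape that could back a mixed 4/6-bit conversion might look like this (the layer ranges and the periodic middle-layer selection are assumptions, not the real policy):

```python
def bits_for_layer(layer_idx, num_layers, sensitive_every=8):
    """Layer-aware mixed quantization: keep 6-bit precision on the first
    and last few layers plus a periodic subset of middle layers, and use
    4-bit everywhere else. All thresholds here are illustrative."""
    if layer_idx < 2 or layer_idx >= num_layers - 2:
        return 6                      # early/late layers: higher precision
    if layer_idx % sensitive_every == 0:
        return 6                      # selected sensitive middle layers
    return 4                          # bulk of the network

# Example: bit-width schedule for a hypothetical 48-layer stack.
schedule = [bits_for_layer(i, 48) for i in range(48)]
print(schedule[:6], schedule[-3:])  # prints [6, 6, 4, 4, 4, 4] [4, 6, 6]
```

In an MLX workflow, a rule like this would be consulted per weight during conversion rather than applying one flat bit width to every tensor.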

Validation

This published MLX variant passed:

  • the official 25-prompt refusal marker check in standard thinking-enabled chat format: 3/25 refusals
  • the local smoke suite for general chat, short reasoning, and short code output: all_looks_ok=true
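The official 25-prompt marker suite is not reproduced here, but a refusal marker check is conceptually simple string scanning over model responses. A minimal sketch with a hypothetical marker list (the official suite's prompts and markers may differ):

```python
REFUSAL_MARKERS = [
    "i can't", "i cannot", "i won't", "i'm sorry, but",
    "i'm unable to", "as an ai",
]  # hypothetical markers, for illustration only

def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it contains any marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

responses = [
    "Sure, here's a short Python function that reverses a string...",
    "I'm sorry, but I can't help with that request.",
]
refusals = sum(is_refusal(r) for r in responses)
print(f"{refusals}/{len(responses)} refusals")  # prints 1/2 refusals
```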

Running

from mlx_lm import load, generate

# Download (if needed) and load the 4-bit MLX weights and tokenizer.
model, tokenizer = load("Youssofal/Qwen3.6-35B-A3B-Abliterated-Heretic-MLX-4bit")

messages = [{"role": "user", "content": "Write a short Python function that reverses a string."}]
# Render the chat turn with the model's chat template, appending the assistant header.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)

Files

The repo root contains the complete 4-bit MLX export for this variant:

  • config.json
  • model.safetensors.index.json
  • split quantized text model-*.safetensors shards
  • model-vision-00001-of-00001.safetensors
  • tokenizer, generation, and processor files
  • README.md

Disclaimer

This model has had refusal behavior removed at the weight level. It will answer prompts that the base model would normally refuse. You are responsible for how you use it.
