# Huihui Qwen3.6-35B A3B Abliterated (GGUF)

This repository provides GGUF-format quantizations of the huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated model.

Because this model has been fully "abliterated" to bypass alignment and safety refusals, it acts as a highly capable engine for unrestricted creative writing, dynamic storytelling, and immersive roleplay scenarios.

## Available Quantizations

| File | Bits | Description |
| --- | --- | --- |
| huihui-35B-Q8_0.gguf | 8-bit | Highest quality quant, virtually indistinguishable from F16. |
| huihui-35B-Q6_K.gguf | 6-bit | Excellent quality with a noticeably reduced memory footprint. |
| huihui-35B-Q5_K_M.gguf | 5-bit | Good balance between reasoning performance and RAM usage. |
| huihui-35B-Q4_K_M.gguf | 4-bit | **Recommended.** The sweet spot between speed and quality. |
| huihui-35B-Q4_K_S.gguf | 4-bit | Slightly smaller than Q4_K_M, allowing faster inference on constrained setups. |
| huihui-35B-Q3_K_M.gguf | 3-bit | Lowest resource requirements, though perplexity loss becomes more noticeable. |
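
To fetch a single quant without cloning the whole repository, the Hugging Face CLI works well. A minimal sketch, assuming this repo's id (`Abiray/Huihui-Qwen3.6-35B-A3B-abliterated-GGUF`) and the recommended Q4_K_M file from the table above:

```bash
# Install the Hugging Face CLI, then download one file into ./models
pip install -U "huggingface_hub[cli]"
huggingface-cli download Abiray/Huihui-Qwen3.6-35B-A3B-abliterated-GGUF \
  huihui-35B-Q4_K_M.gguf --local-dir ./models
```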

## Quick Start (llama.cpp)

These models run directly under llama.cpp. The commands below target typical local Linux environments (such as Ubuntu or Linux Mint).

1. Clone and compile via CMake:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
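
2. Run an interactive chat session. A minimal sketch, assuming the Q4_K_M file was downloaded to `./models` and a recent llama.cpp build (older releases ship a `main` binary instead of `llama-cli`, and flag names can shift between versions):

```bash
# -m: model path, -c: context length, -ngl: layers to offload to GPU (if any), -cnv: chat mode
./build/bin/llama-cli -m ./models/huihui-35B-Q4_K_M.gguf -c 8192 -ngl 99 -cnv
```

3. Optionally, serve the model over HTTP with the bundled server (same assumptions as above):

```bash
# Starts an OpenAI-compatible endpoint on localhost:8080
./build/bin/llama-server -m ./models/huihui-35B-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080
```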