# Huihui Qwen3.6-35B A3B Abliterated (GGUF)

This repository provides GGUF-format quantizations of the huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated model.

Because this model has been fully "abliterated" to bypass alignment and safety refusals, it acts as a highly capable engine for unrestricted creative writing, dynamic storytelling, and immersive roleplay scenarios.

## Available Quantizations

| File | Bits | Description |
| --- | --- | --- |
| huihui-35B-Q8_0.gguf | 8-bit | Highest quality quant, virtually indistinguishable from F16. |
| huihui-35B-Q6_K.gguf | 6-bit | Excellent quality with a noticeably reduced memory footprint. |
| huihui-35B-Q5_K_M.gguf | 5-bit | Good balance between reasoning performance and RAM usage. |
| huihui-35B-Q4_K_M.gguf | 4-bit | **Recommended.** The sweet spot between speed and quality. |
| huihui-35B-Q4_K_S.gguf | 4-bit | Slightly smaller than Q4_K_M, allowing faster inference on constrained setups. |
| huihui-35B-Q3_K_M.gguf | 3-bit | Lowest resource requirements, though perplexity loss becomes more noticeable. |
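
To fetch a single quant without cloning the whole repository, the Hugging Face CLI works well. A minimal sketch, assuming this repo's id (`Abiray/Huihui-Qwen3.6-35B-A3B-abliterated-GGUF`) and the recommended Q4_K_M file from the table above:

```bash
# Install the Hugging Face CLI, then download one file into ./models
pip install -U "huggingface_hub[cli]"
huggingface-cli download Abiray/Huihui-Qwen3.6-35B-A3B-abliterated-GGUF \
  huihui-35B-Q4_K_M.gguf --local-dir ./models
```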

## Quick Start (llama.cpp)

These models run directly under llama.cpp. The commands below target typical local Linux environments (such as Ubuntu or Linux Mint).

1. Clone and compile via CMake:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```
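
2. Run an interactive chat session. A minimal sketch, assuming the Q4_K_M file was downloaded to `./models` and a recent llama.cpp build (older releases ship a `main` binary instead of `llama-cli`, and flag names can shift between versions):

```bash
# -m: model path, -c: context length, -ngl: layers to offload to GPU (if any), -cnv: chat mode
./build/bin/llama-cli -m ./models/huihui-35B-Q4_K_M.gguf -c 8192 -ngl 99 -cnv
```

3. Optionally, serve the model over HTTP with the bundled server (same assumptions as above):

```bash
# Starts an OpenAI-compatible endpoint on localhost:8080
./build/bin/llama-server -m ./models/huihui-35B-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080
```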