Qwen3.6-35B-A3B
This repository provides GGUF format quantizations for the huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated model.
Because this model has been "abliterated" to suppress alignment-based safety refusals, it is intended for unrestricted creative writing, dynamic storytelling, and immersive roleplay scenarios.
| File | Bit Size | Description |
|---|---|---|
| `huihui-35B-Q8_0.gguf` | 8-bit | Highest quality quant, virtually indistinguishable from F16. |
| `huihui-35B-Q6_K.gguf` | 6-bit | Excellent quality with a noticeably reduced memory footprint. |
| `huihui-35B-Q5_K_M.gguf` | 5-bit | Good balance between reasoning performance and RAM usage. |
| `huihui-35B-Q4_K_M.gguf` | 4-bit | Recommended; the best trade-off between speed and quality. |
| `huihui-35B-Q4_K_S.gguf` | 4-bit | Slightly smaller than Q4_K_M, allowing faster inference on constrained setups. |
| `huihui-35B-Q3_K_M.gguf` | 3-bit | Lowest resource requirements, though perplexity loss becomes more noticeable. |
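A single quant file can be fetched with the `huggingface-cli` tool from the `huggingface_hub` package. This is a minimal sketch; `<repo-id>` is a placeholder, so substitute this repository's actual path:

```bash
# Install the Hugging Face CLI, then download one quant into the current directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> huihui-35B-Q4_K_M.gguf --local-dir .
```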
These models are designed to be run directly via llama.cpp. The following commands are standard for local Linux environments (such as Linux Mint or Ubuntu).
1. Clone and compile via CMake:
   ```bash
   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
   cmake -B build
   cmake --build build --config Release
   ```
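2. Run an interactive chat. A minimal sketch, assuming the Q4_K_M quant has been downloaded into the llama.cpp directory; `-ngl 99` offloads all layers to the GPU (set it to `0` for CPU-only) and `-c` sets the context size:

   ```bash
   # Launch llama-cli in conversation mode (-cnv) with the 4-bit quant
   ./build/bin/llama-cli -m ./huihui-35B-Q4_K_M.gguf -ngl 99 -c 8192 -cnv
   ```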
Base model: Qwen/Qwen3.6-35B-A3B