Want the full-strength PRO version? Support our research & get Day-0 access:

PRISM-PRO GGUF | PRISM VIP Memberships | Ko-fi


Qwen3.5-122B-A10B-PRISM-LITE-GGUF

This is our special PRISM-Dynamic quant: GGUF-quantized versions of Qwen3.5-122B-A10B-PRISM-PRO, a community release lightly treated for over-refusal using our PRISM pipeline (Projected Refusal Isolation via Subspace Modification).

PRISM-LITE provides a taste of what PRISM-PRO can do. For the full experience, with maximum production-grade over-refusal and bias removal, check out PRISM-PRO GGUF.


If you find PRISM models useful, please consider supporting development:

Ko-fi


Available Quantizations

| Quantization | Size | BPW | Description |
|---|---|---|---|
| Dynamic | 57.7 GB | 4.06 | PRISM Dynamic -- forensic per-block quantization with 5-tier ffn_down_exps allocation |
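As a rough sanity check, the on-disk size follows from the total parameter count times the average bits per weight. A minimal sketch, assuming the round 122B figure (the exact parameter count may differ slightly) and binary gigabytes (GiB) for the listed size:

```python
# Estimate GGUF file size from parameter count and average bits per weight (BPW).
# 122e9 is the rounded total parameter count; sizes on the card read as GiB.
params = 122e9
bpw = 4.06

size_bytes = params * bpw / 8   # bits -> bytes
size_gib = size_bytes / 2**30   # bytes -> GiB

print(f"{size_gib:.1f} GiB")    # ~57.7, matching the listed size
```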

PRISM Dynamic Quantization

This is not a standard uniform quantization. PRISM Dynamic uses forensic per-block analysis derived from comprehensive KLD sensitivity scoring to assign optimal quantization types to each tensor block individually:

  • Critical blocks (convergence + exit layers): Q6_K (6.6 BPW)
  • High-impact blocks (entry zone): Q5_K_M (5.5 BPW)
  • Standard blocks (bulk processing): Q4_K_M (4.8 BPW)
  • Low-sensitivity blocks: IQ4_XS (4.25 BPW)
  • Cold blocks (lowest sensitivity): IQ3_XXS (3.06 BPW)

All attention tensors are preserved at Q8_0. All norms and routing weights are kept at F32. The imatrix used for information-sensitive quantization types is included.
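The five-tier allocation above amounts to a threshold map from per-block sensitivity scores to quant types. The thresholds and scores below are hypothetical illustrations only; the actual PRISM KLD scoring data is not published here:

```python
# Hypothetical sketch: map per-block KLD sensitivity scores to quant tiers.
# Threshold values are illustrative, not PRISM's real cutoffs.

TIERS = [  # (minimum sensitivity score, quant type)
    (0.90, "Q6_K"),     # critical: convergence + exit layers
    (0.70, "Q5_K_M"),   # high-impact: entry zone
    (0.40, "Q4_K_M"),   # standard: bulk processing
    (0.15, "IQ4_XS"),   # low sensitivity
    (0.00, "IQ3_XXS"),  # cold blocks
]

def quant_for(score: float) -> str:
    """Return the quant type for a block's sensitivity score."""
    for threshold, qtype in TIERS:
        if score >= threshold:
            return qtype
    return "IQ3_XXS"  # fallback for any score below all thresholds

print(quant_for(0.95))  # Q6_K
print(quant_for(0.05))  # IQ3_XXS
```

Attention tensors, norms, and routing weights bypass this map entirely, as noted above.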

Included Files

Dynamic/
  Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf   -- Dynamic quant (57.7 GB)
  mmproj-Qwen3.5-122B-A10B-PRISM-LITE.gguf    -- Vision encoder (871 MB)
  imatrix.dat                                 -- Importance matrix (342 MB)

PRO vs LITE

| Feature | PRISM-LITE | PRISM-PRO |
|---|---|---|
| Bypass Strength | Reduced | Full |
| Over-refusal Removal | Partial | Complete |
| Coherence | 100% | 100% |
| Community Access | Free | VIP Membership |

Model Highlights

  • PRISM Ablation -- Removes over-refusal behaviors while preserving model capabilities.
  • 122B Hybrid MoE Architecture -- 122 billion total parameters with 10 billion active per token across 256 routed experts + 1 shared expert per layer.
  • Hybrid Attention -- Novel GatedDeltaNet linear attention (36 layers) combined with full attention (12 layers) for efficient long-context processing.
  • Native Multimodal -- Vision encoder included as mmproj GGUF for seamless image and video understanding.
  • 262K Full Context Window -- Native 262,144 token context length.
  • Dual Modes -- Supports both Thinking (deep reasoning) and Instant (direct response) modes.
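Earlier Qwen3-series models toggle between these modes with per-turn "soft switch" tags appended to the user message; whether Qwen3.5 keeps the exact `/think` and `/no_think` tags is an assumption here, so verify against the upstream model card:

```python
# Sketch of the Qwen3-series soft-switch convention for per-turn mode control.
# The /think and /no_think tags are assumed from Qwen3; confirm for Qwen3.5.

def with_mode(prompt: str, thinking: bool) -> list[dict]:
    """Build a single-turn chat message list with the mode switch appended."""
    switch = "/think" if thinking else "/no_think"
    return [{"role": "user", "content": f"{prompt} {switch}"}]

deep = with_mode("Prove that sqrt(2) is irrational.", thinking=True)
fast = with_mode("What's the capital of France?", thinking=False)
print(deep[0]["content"])  # ends with "/think"
```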

Usage

llama.cpp (Recommended)

# Text-only inference
./llama-cli \
  -m Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf \
  -p "Hello! Tell me about quantum computing." \
  -n 2048 -ngl 999 --temp 0.7

# With vision (multimodal)
./llama-mtmd-cli \
  -m Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-LITE.gguf \
  --image photo.jpg \
  -p "Describe this image in detail." \
  -n 2048 -ngl 999

# Server mode
./llama-server \
  -m Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-LITE.gguf \
  -ngl 999 --port 8080
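Once llama-server is running, it exposes an OpenAI-compatible chat endpoint. A minimal request sketch, assuming the port from the command above (the commented-out call requires the server to actually be up):

```python
import json
import urllib.request

# OpenAI-compatible chat request for llama-server's /v1/chat/completions endpoint.
payload = {
    "messages": [
        {"role": "user", "content": "Describe this model's quantization in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # port from llama-server above
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```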

koboldcpp

koboldcpp \
  --model Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf \
  --mmproj mmproj-Qwen3.5-122B-A10B-PRISM-LITE.gguf \
  --gpulayers 999 \
  --contextsize 8192

Ollama

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3.5-122B-A10B-PRISM-LITE-Dynamic.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER top_k 20
EOF

ollama create prism-lite -f Modelfile
ollama run prism-lite

Hardware Requirements

| Setup | Memory Required | Notes |
|---|---|---|
| Dynamic (GPU only) | ~60 GB VRAM | Fits on 1x A100 80GB or 1x H100 80GB |
| Dynamic (GPU + CPU offload) | 48+ GB VRAM + RAM | Offload some layers to CPU |
| Dynamic (CPU only) | 64+ GB RAM | Slower but functional |
| BF16 (multi-GPU) | 8x 48GB+ VRAM | Full precision, best quality |

Benchmarks

| Benchmark | Qwen3.5-122B-A10B | GPT-5-mini | Qwen3-235B-A22B |
|---|---|---|---|
| MMLU-Pro | 86.7 | 83.7 | 84.4 |
| MMLU-Redux | 94.0 | 93.7 | 93.8 |
| GPQA Diamond | 86.6 | 82.8 | 81.1 |
| HMMT Feb 25 | 91.4 | 89.2 | 85.1 |
| SWE-bench Verified | 72.0 | 72.0 | -- |
| LiveCodeBench v6 | 78.9 | 80.5 | 75.1 |
| MMMU | 83.9 | 79.0 | 80.6 |
| VideoMME (w/ sub) | 87.3 | 83.5 | 83.8 |

Note: Benchmark results are from the base Qwen3.5-122B-A10B model.


License

Based on Qwen3.5-122B-A10B by the Qwen Team (Alibaba Group). Licensed under Apache 2.0.


Acknowledgments

Based on Qwen3.5-122B-A10B by the Qwen Team. GGUF conversion and quantization by Ex0bit. See the Qwen3.5 blog post for architecture details.


Citation

@misc{qwen35prismlite_gguf,
    title  = {Qwen3.5-122B-A10B-PRISM-LITE-GGUF},
    author = {Ex0bit},
    month  = {February},
    year   = {2026}
}