Qwen3.6-27B NVFP4 (GGUF, Vision-enabled)

Overview

This repository provides a GGUF conversion of the Abiray/Qwen3.6-27B-NVFP4 checkpoint for local inference in LM Studio on Windows, targeted at NVIDIA RTX 5090 / Blackwell systems.

This release is a mixed-source packaging; the Provenance table below lists where each component came from.

This is not an official upstream release. It is a community conversion intended for experimentation and local deployment.


Provenance

Component                              Source
Original base model                    Qwen/Qwen3.6-27B
NVFP4 checkpoint                       Abiray/Qwen3.6-27B-NVFP4
Vision projector (mmproj-BF16.gguf)    unsloth/Qwen3.6-27B-GGUF
GGUF conversion tooling                llama.cpp
NVFP4 conversion branch used           pull/21095/head:pr-21095

Files in this Repository

Typical structure:

  • Qwen3.6-27B-NVFP4.gguf: main model GGUF
  • mmproj-BF16.gguf: vision projector, required for image input

Important

Vision functionality requires both files. If mmproj-BF16.gguf is missing, image input will not work in LM Studio.


Compatibility

Tested environment:

  • GPU: NVIDIA RTX 5090 / Blackwell
  • OS: Windows
  • Runtime: LM Studio >= 0.4.12
  • Conversion tooling: llama.cpp PR branch pr-21095

LM Studio added explicit Qwen 3.6 support starting from version 0.4.12.


Performance

Observed performance on RTX 5090:

  • Context window: up to 96,000 tokens
  • Generation speed: 50+ tokens/sec

Performance may vary depending on prompt length, context usage, GPU offload settings, driver version, LM Studio runtime version, and workload type.
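
Throughput can be sanity-checked against a local OpenAI-compatible endpoint (llama-server as started below, or LM Studio's local server on its default port 1234). A minimal sketch using only the Python standard library; the host, port, and model name are assumptions to adjust for your setup:

import json
import time
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"   # adjust host/port to your server
payload = {
    "model": "qwen3.6-27b-nvfp4",                   # assumed name; llama-server serves whatever it loaded
    "messages": [{"role": "user", "content": "Write about 200 words on GPUs."}],
    "max_tokens": 256,
}
req = urllib.request.Request(URL, data=json.dumps(payload).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
start = time.time()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.time() - start                       # end-to-end, includes prompt processing
done = body["usage"]["completion_tokens"]
print(f"{done} tokens in {elapsed:.1f}s ({done / elapsed:.1f} tok/s)")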


Features

  • NVFP4-origin model converted into GGUF
  • Reasoning / thinking enabled by default
  • Vision-language support with mmproj-BF16.gguf
  • Intended for high-end local inference on Blackwell GPUs
  • Tested for LM Studio use on Windows

Reasoning / Thinking Behavior

Qwen3.6-27B enables reasoning by default and may emit reasoning content in <think>...</think> blocks.
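
If downstream tooling should not see the reasoning, the blocks can be stripped from the output. A minimal sketch, assuming the tag format described above:

import re

def strip_think(text: str) -> str:
    # Drop closed <think>...</think> blocks, then any unterminated
    # trailing block that was cut off by the token limit.
    text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
    return re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>working...</think>The answer is 42."))  # -> The answer is 42.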

Reasoning cannot be disabled with /think or /nothink commands.

Where supported, use chat template configuration such as:

chat_template_kwargs = {"enable_thinking": False}

Alternatively, start llama-server with the kwargs passed on the command line:

.\build\bin\Release\llama-server.exe --% -m .\Qwen3.6-27B-NVFP4.gguf --host 127.0.0.1 --port 8080 -c 96000 -ngl 99 --chat-template-kwargs "{\"enable_thinking\":false}"
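
Recent llama-server builds also accept chat_template_kwargs per request in the JSON body; this is an assumption to verify against your build, not guaranteed for every version. A minimal sketch against the server started above:

import json
import urllib.request

payload = {
    "model": "qwen3.6-27b-nvfp4",   # assumed name; llama-server serves whatever it loaded
    "messages": [{"role": "user", "content": "Summarize NVFP4 in one sentence."}],
    "chat_template_kwargs": {"enable_thinking": False},  # request-level override
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])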

LM Studio may not expose a visible reasoning toggle for custom GGUF imports, depending on how the model metadata is recognized.


Vision / Image Input

The vision projector file was sourced from:

unsloth/Qwen3.6-27B-GGUF

For LM Studio vision support, place the main model GGUF and mmproj-BF16.gguf together in the imported model folder. Image input should become available when LM Studio recognizes the model as a vision-language model.
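
Once recognized, images can also be sent through LM Studio's OpenAI-compatible local server (default port 1234) using standard OpenAI-style image content parts. A minimal sketch; the model identifier and image path are assumptions:

import base64
import json
import urllib.request

with open("example.png", "rb") as f:                  # any local test image
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "qwen3.6-27b-nvfp4",                     # assumed identifier; check LM Studio's model list
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
}
req = urllib.request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])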


Conversion Notes

The main GGUF was converted from:

Abiray/Qwen3.6-27B-NVFP4

The conversion used a development branch of llama.cpp that includes work for compressed-tensors NVFP4 conversion:

ggml-org/llama.cpp PR #21095

Conversion Workflow

Sanitized workflow:

# Fetch the NVFP4 conversion branch and check it out
cd .\llama.cpp
git fetch origin pull/21095/head:pr-21095
git checkout pr-21095
# Install the converter's Python dependencies
python -m pip install --no-cache-dir -r requirements\requirements-convert_hf_to_gguf.txt

The upstream tokenizer configuration required a compatibility patch before conversion:

# Swap the unsupported tokenizer class for one the converter recognizes
(Get-Content <model_dir>\tokenizer_config.json) `
-replace '"tokenizer_class": "TokenizersBackend"', '"tokenizer_class": "Qwen2Tokenizer"' `
| Set-Content <model_dir>\tokenizer_config.json

Then the model was converted using:

python convert_hf_to_gguf.py <model_dir> --outfile <output.gguf> --outtype bf16 --verbose

Important Details

  • The source checkpoint uses compressed-tensors NVFP4.
  • Conversion used the pr-21095 llama.cpp branch rather than ordinary master at the time of conversion.
  • The --outtype bf16 conversion path does not mean the entire release is a normal BF16 model.
  • NVFP4-compatible tensors are preserved or repacked where supported by the converter/runtime (the inspection sketch after this list shows how to verify per-tensor types).
  • Required auxiliary tensors may remain in floating-point format.
  • The tokenizer compatibility patch changes only the tokenizer class metadata needed by the conversion stack.
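
To check which tensor types actually ended up in the converted file, the gguf Python package that ships with llama.cpp (pip install gguf) can summarize per-tensor quantization types. A minimal sketch; field names match the package at the time of writing and may shift between versions:

from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("Qwen3.6-27B-NVFP4.gguf")
# Count tensors by quantization type (e.g. BF16 vs. 4-bit packed types)
counts = Counter(t.tensor_type.name for t in reader.tensors)
for qtype, n in counts.most_common():
    print(f"{qtype:12s} {n:4d} tensors")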

Known Limitations

  • This release depends on bleeding-edge llama.cpp NVFP4 conversion support.
  • Multimodal + NVFP4 support is still evolving.
  • LM Studio may not expose all advanced model controls for custom GGUF imports.
  • Reasoning may be enabled by default without a visible UI toggle.
  • Mixing the main model and projector from different GGUF/NVFP4 sources may introduce subtle incompatibilities.
  • This is a community conversion and should be treated as experimental.

Usage with LM Studio

  1. Download/import the model folder in LM Studio.
  2. Ensure both files are present:
    • Qwen3.6-27B-NVFP4.gguf
    • mmproj-BF16.gguf
  3. Load the main GGUF model.
  4. Image input should become available if LM Studio recognizes the projector correctly.
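
A quick way to confirm step 2 before loading. The models directory shown is an assumption (LM Studio's default location on recent versions); adjust it to your install:

from pathlib import Path

# Assumed default LM Studio models directory; adjust if you relocated it.
model_dir = Path.home() / ".lmstudio" / "models" / "Freenixi" / "Abiray-Qwen3.6-27B-NVFP4-GGUF"
required = ["Qwen3.6-27B-NVFP4.gguf", "mmproj-BF16.gguf"]
missing = [name for name in required if not (model_dir / name).is_file()]
print("All required files present." if not missing else f"Missing: {missing}")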

License & Disclaimer

This repository is a derivative conversion based on upstream model assets.

Please review and follow the licenses and usage terms of:

  • Qwen/Qwen3.6-27B (original base model)
  • Abiray/Qwen3.6-27B-NVFP4 (NVFP4 checkpoint)
  • unsloth/Qwen3.6-27B-GGUF (vision projector)
  • llama.cpp (conversion tooling)

Users are responsible for ensuring compliance with all upstream model terms.


Acknowledgements

  • Qwen for the original Qwen3.6-27B model
  • Abiray for the NVFP4 checkpoint
  • Unsloth for the Qwen3.6 GGUF release and vision projector
  • llama.cpp contributors for GGUF and NVFP4 tooling

Notes

This release is intended for:

  • Local experimentation
  • LM Studio testing
  • High-performance inference on RTX 5090 / Blackwell GPUs

It is not presented as an official or production-grade distribution.
