Qwen3.6-27B NVFP4 (GGUF, Vision-enabled)

Overview

This repository provides a GGUF conversion of the Abiray/Qwen3.6-27B-NVFP4 checkpoint for local inference in LM Studio on Windows, targeted at NVIDIA RTX 5090 / Blackwell systems.

This release is a mixed-source packaging; the Provenance table below lists where each component came from.

This is not an official upstream release. It is a community conversion intended for experimentation and local deployment.


Provenance

Component                              Source
Original base model                    Qwen/Qwen3.6-27B
NVFP4 checkpoint                       Abiray/Qwen3.6-27B-NVFP4
Vision projector (mmproj-BF16.gguf)    unsloth/Qwen3.6-27B-GGUF
GGUF conversion tooling                llama.cpp
NVFP4 conversion branch used           pull/21095/head:pr-21095

Files in this Repository

Typical structure:

  • Qwen3.6-27B-NVFP4.gguf: main model GGUF
  • mmproj-BF16.gguf: vision projector, required for image input

Important

Vision functionality requires both files. If mmproj-BF16.gguf is missing, image input will not work in LM Studio.


Compatibility

Tested environment:

  • GPU: NVIDIA RTX 5090 / Blackwell
  • OS: Windows
  • Runtime: LM Studio >= 0.4.12
  • Conversion tooling: llama.cpp PR branch pr-21095

LM Studio added explicit Qwen 3.6 support starting from version 0.4.12.


Performance

Observed performance on RTX 5090:

  • Context window: up to 96,000 tokens
  • Generation speed: 50+ tokens/sec

Performance may vary depending on prompt length, context usage, GPU offload settings, driver version, LM Studio runtime version, and workload type.
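
Throughput can be sanity-checked against a local OpenAI-compatible endpoint (llama-server as started below, or LM Studio's local server on its default port 1234). A minimal sketch using only the Python standard library; the host, port, and model name are assumptions to adjust for your setup:

import json
import time
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"   # adjust host/port to your server
payload = {
    "model": "qwen3.6-27b-nvfp4",                   # assumed name; llama-server serves whatever it loaded
    "messages": [{"role": "user", "content": "Write about 200 words on GPUs."}],
    "max_tokens": 256,
}
req = urllib.request.Request(URL, data=json.dumps(payload).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
start = time.time()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.time() - start                       # end-to-end, includes prompt processing
done = body["usage"]["completion_tokens"]
print(f"{done} tokens in {elapsed:.1f}s ({done / elapsed:.1f} tok/s)")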


Features

  • NVFP4-origin model converted into GGUF
  • Reasoning / thinking enabled by default
  • Vision-language support with mmproj-BF16.gguf
  • Intended for high-end local inference on Blackwell GPUs
  • Tested for LM Studio use on Windows

Reasoning / Thinking Behavior

Qwen3.6-27B enables reasoning by default and may emit reasoning content in <think>...</think> blocks.
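
If downstream tooling should not see the reasoning, the blocks can be stripped from the output. A minimal sketch, assuming the tag format described above:

import re

def strip_think(text: str) -> str:
    # Drop closed <think>...</think> blocks, then any unterminated
    # trailing block that was cut off by the token limit.
    text = re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)
    return re.sub(r"<think>.*\Z", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>working...</think>The answer is 42."))  # -> The answer is 42.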

Reasoning cannot be disabled with /think or /nothink commands.

Where supported, use chat template configuration such as:

chat_template_kwargs = {"enable_thinking": False}

Alternatively, start llama-server with the kwargs passed on the command line:

.\build\bin\Release\llama-server.exe --% -m .\Qwen3.6-27B-NVFP4.gguf --host 127.0.0.1 --port 8080 -c 96000 -ngl 99 --chat-template-kwargs "{\"enable_thinking\":false}"
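
Recent llama-server builds also accept chat_template_kwargs per request in the JSON body; this is an assumption to verify against your build, not guaranteed for every version. A minimal sketch against the server started above:

import json
import urllib.request

payload = {
    "model": "qwen3.6-27b-nvfp4",   # assumed name; llama-server serves whatever it loaded
    "messages": [{"role": "user", "content": "Summarize NVFP4 in one sentence."}],
    "chat_template_kwargs": {"enable_thinking": False},  # request-level override
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])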

LM Studio may not expose a visible reasoning toggle for custom GGUF imports, depending on how the model metadata is recognized.


Vision / Image Input

The vision projector file was sourced from:

unsloth/Qwen3.6-27B-GGUF

For LM Studio vision support, place the main model GGUF and mmproj-BF16.gguf together in the imported model folder. Image input should become available when LM Studio recognizes the model as a vision-language model.
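
Once recognized, images can also be sent through LM Studio's OpenAI-compatible local server (default port 1234) using standard OpenAI-style image content parts. A minimal sketch; the model identifier and image path are assumptions:

import base64
import json
import urllib.request

with open("example.png", "rb") as f:                  # any local test image
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "qwen3.6-27b-nvfp4",                     # assumed identifier; check LM Studio's model list
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
}
req = urllib.request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])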


Conversion Notes

The main GGUF was converted from:

Abiray/Qwen3.6-27B-NVFP4

The conversion used a development branch of llama.cpp that includes work for compressed-tensors NVFP4 conversion:

ggml-org/llama.cpp PR #21095

Conversion Workflow

Sanitized workflow:

# Fetch the NVFP4 conversion branch and check it out
cd .\llama.cpp
git fetch origin pull/21095/head:pr-21095
git checkout pr-21095
# Install the converter's Python dependencies
python -m pip install --no-cache-dir -r requirements\requirements-convert_hf_to_gguf.txt

The upstream tokenizer configuration required a compatibility patch before conversion:

# Swap the unsupported tokenizer class for one the converter recognizes
(Get-Content <model_dir>\tokenizer_config.json) `
-replace '"tokenizer_class": "TokenizersBackend"', '"tokenizer_class": "Qwen2Tokenizer"' `
| Set-Content <model_dir>\tokenizer_config.json

Then the model was converted using:

python convert_hf_to_gguf.py <model_dir> --outfile <output.gguf> --outtype bf16 --verbose

Important Details

  • The source checkpoint uses compressed-tensors NVFP4.
  • Conversion used the pr-21095 llama.cpp branch rather than ordinary master at the time of conversion.
  • The --outtype bf16 conversion path does not mean the entire release is a normal BF16 model.
  • NVFP4-compatible tensors are preserved or repacked where supported by the converter/runtime (the inspection sketch after this list shows how to verify per-tensor types).
  • Required auxiliary tensors may remain in floating-point format.
  • The tokenizer compatibility patch changes only the tokenizer class metadata needed by the conversion stack.
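
To check which tensor types actually ended up in the converted file, the gguf Python package that ships with llama.cpp (pip install gguf) can summarize per-tensor quantization types. A minimal sketch; field names match the package at the time of writing and may shift between versions:

from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("Qwen3.6-27B-NVFP4.gguf")
# Count tensors by quantization type (e.g. BF16 vs. 4-bit packed types)
counts = Counter(t.tensor_type.name for t in reader.tensors)
for qtype, n in counts.most_common():
    print(f"{qtype:12s} {n:4d} tensors")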

Known Limitations

  • This release depends on bleeding-edge llama.cpp NVFP4 conversion support.
  • Multimodal + NVFP4 support is still evolving.
  • LM Studio may not expose all advanced model controls for custom GGUF imports.
  • Reasoning may be enabled by default without a visible UI toggle.
  • Mixing the main model and projector from different GGUF/NVFP4 sources may introduce subtle incompatibilities.
  • This is a community conversion and should be treated as experimental.

Usage with LM Studio

  1. Download/import the model folder in LM Studio.
  2. Ensure both files are present:
    • Qwen3.6-27B-NVFP4.gguf
    • mmproj-BF16.gguf
  3. Load the main GGUF model.
  4. Image input should become available if LM Studio recognizes the projector correctly.
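
A quick way to confirm step 2 before loading. The models directory shown is an assumption (LM Studio's default location on recent versions); adjust it to your install:

from pathlib import Path

# Assumed default LM Studio models directory; adjust if you relocated it.
model_dir = Path.home() / ".lmstudio" / "models" / "Freenixi" / "Abiray-Qwen3.6-27B-NVFP4-GGUF"
required = ["Qwen3.6-27B-NVFP4.gguf", "mmproj-BF16.gguf"]
missing = [name for name in required if not (model_dir / name).is_file()]
print("All required files present." if not missing else f"Missing: {missing}")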

License & Disclaimer

This repository is a derivative conversion based on upstream model assets.

Please review and follow the licenses and usage terms of:

  • Qwen/Qwen3.6-27B (original base model)
  • Abiray/Qwen3.6-27B-NVFP4 (NVFP4 checkpoint)
  • unsloth/Qwen3.6-27B-GGUF (vision projector)
  • llama.cpp (conversion tooling)

Users are responsible for ensuring compliance with all upstream model terms.


Acknowledgements

  • Qwen for the original Qwen3.6-27B model
  • Abiray for the NVFP4 checkpoint
  • Unsloth for the Qwen3.6 GGUF release and vision projector
  • llama.cpp contributors for GGUF and NVFP4 tooling

Notes

This release is intended for:

  • Local experimentation
  • LM Studio testing
  • High-performance inference on RTX 5090 / Blackwell GPUs

It is not presented as an official or production-grade distribution.
