Qwen3.6-27B NVFP4 (GGUF, Vision-enabled)
Overview
This repository provides a GGUF-converted version of an NVFP4 checkpoint for local inference in LM Studio on Windows, optimized for NVIDIA RTX 5090 / Blackwell systems.
This release is a mixed-source packaging:
- Base model: Qwen/Qwen3.6-27B
- NVFP4 checkpoint source: Abiray/Qwen3.6-27B-NVFP4
- Vision projector source: unsloth/Qwen3.6-27B-GGUF
This is not an official upstream release. It is a community conversion intended for experimentation and local deployment.
Provenance
| Component | Source |
|---|---|
| Original base model | Qwen/Qwen3.6-27B |
| NVFP4 checkpoint | Abiray/Qwen3.6-27B-NVFP4 |
| Vision projector (mmproj-BF16.gguf) | unsloth/Qwen3.6-27B-GGUF |
| GGUF conversion tooling | llama.cpp |
| NVFP4 conversion branch used | pull/21095/head:pr-21095 |
Files in this Repository
Typical structure:
- Qwen3.6-27B-NVFP4.gguf: main model GGUF
- mmproj-BF16.gguf: vision projector, required for image input
Important
Vision functionality requires both files. If mmproj-BF16.gguf is missing, image input will not work in LM Studio.
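A quick way to sanity-check the folder before importing (a minimal sketch; the folder name is hypothetical, the file names are as listed above):

```python
# Verify both required GGUF files sit in the same folder before importing.
from pathlib import Path

model_dir = Path(r".\Qwen3.6-27B-NVFP4-GGUF")  # hypothetical folder name; adjust
for name in ("Qwen3.6-27B-NVFP4.gguf", "mmproj-BF16.gguf"):
    status = "found" if (model_dir / name).is_file() else "MISSING"
    print(f"{name}: {status}")
```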
Compatibility
Tested environment:
- GPU: NVIDIA RTX 5090 / Blackwell
- OS: Windows
- Runtime: LM Studio >= 0.4.12
- Conversion tooling: llama.cpp PR branch pr-21095
LM Studio added explicit Qwen 3.6 support starting from version 0.4.12.
Performance
Observed performance on RTX 5090:
- Context window: up to 96,000 tokens
- Generation speed: 50+ tokens/sec
Performance may vary depending on prompt length, context usage, GPU offload settings, driver version, LM Studio runtime version, and workload type.
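To put a number on generation speed in your own setup, a rough non-streaming measurement against a local OpenAI-compatible endpoint can look like the sketch below (the endpoint address and the presence of a usage field in the response are assumptions; adjust for your server):

```python
# Rough tokens/sec estimate via a local OpenAI-compatible endpoint
# (llama-server or LM Studio's local server). Non-streaming, so the result
# includes prompt processing time and will underestimate pure decode speed.
import json, time, urllib.request

payload = {
    "messages": [{"role": "user", "content": "Write a 200-word story."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",  # assumed address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
elapsed = time.perf_counter() - start
tokens = body["usage"]["completion_tokens"]  # assumes the server reports usage
print(f"~{tokens / elapsed:.1f} tokens/sec (non-streaming estimate)")
```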
Features
- NVFP4-origin model converted into GGUF
- Reasoning / thinking enabled by default
- Vision-language support with mmproj-BF16.gguf
- Intended for high-end local inference on Blackwell GPUs
- Tested for LM Studio use on Windows
Reasoning / Thinking Behavior
Qwen3.6-27B enables reasoning by default and may emit reasoning content in <think>...</think> blocks.
Reasoning cannot be disabled with /think or /nothink commands.
Where supported, use chat template configuration such as:

```python
chat_template_kwargs = {"enable_thinking": False}
```
Or start llama-server with the kwargs passed on the command line:

```powershell
.\build\bin\Release\llama-server.exe --% -m .\Qwen3.6-27B-NVFP4.gguf --host 127.0.0.1 --port 8080 -c 96000 -ngl 99 --chat-template-kwargs "{\"enable_thinking\":false}"
```
LM Studio may not expose a visible reasoning toggle for custom GGUF imports, depending on how the model metadata is recognized.
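When the model is served through llama-server as above, recent builds also accept the same setting per request (a sketch; verify that your build supports the chat_template_kwargs request field):

```python
# Per-request reasoning toggle through llama-server's OpenAI-compatible API.
# The "chat_template_kwargs" body field is supported by recent llama.cpp
# server builds; older builds ignore or reject it.
import json, urllib.request

payload = {
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "chat_template_kwargs": {"enable_thinking": False},  # suppress <think> blocks
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```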
Vision / Image Input
The vision projector file was sourced from unsloth/Qwen3.6-27B-GGUF.
For LM Studio vision support, place the main model GGUF and mmproj-BF16.gguf together in the imported model folder. Image input should become available when LM Studio recognizes the model as a vision-language model.
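Outside LM Studio, the projector can also be exercised directly with llama-server by adding --mmproj mmproj-BF16.gguf at launch. Below is a sketch of an image request in the OpenAI-style data-URI form, which recent multimodal llama.cpp builds accept (the image file name is a placeholder; verify against your build):

```python
# Send a base64-encoded image to a llama-server launched with:
#   llama-server -m Qwen3.6-27B-NVFP4.gguf --mmproj mmproj-BF16.gguf ...
import base64, json, urllib.request

image_b64 = base64.b64encode(open("photo.png", "rb").read()).decode()  # placeholder file
payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```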
Conversion Notes
The main GGUF was converted from Abiray/Qwen3.6-27B-NVFP4.
The conversion used a development branch of llama.cpp that includes work for compressed-tensors NVFP4 conversion (pull request 21095, branch pr-21095).
Conversion Workflow
Sanitized workflow:
```powershell
cd .\llama.cpp
git fetch origin pull/21095/head:pr-21095
git checkout pr-21095
python -m pip install --no-cache-dir -r requirements\requirements-convert_hf_to_gguf.txt
```
The upstream tokenizer configuration required a compatibility patch before conversion:
```powershell
(Get-Content <model_dir>\tokenizer_config.json) `
  -replace '"tokenizer_class": "TokenizersBackend"', '"tokenizer_class": "Qwen2Tokenizer"' `
  | Set-Content <model_dir>\tokenizer_config.json
```
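For non-Windows environments, an equivalent patch in Python (a minimal sketch; <model_dir> is the same placeholder as above):

```python
# Python equivalent of the PowerShell patch above: swap the tokenizer class
# so convert_hf_to_gguf.py recognizes the tokenizer.
import json
from pathlib import Path

cfg_path = Path("<model_dir>") / "tokenizer_config.json"  # fill in the real path
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
if cfg.get("tokenizer_class") == "TokenizersBackend":
    cfg["tokenizer_class"] = "Qwen2Tokenizer"
    cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
```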
Then the model was converted using:
```powershell
python convert_hf_to_gguf.py <model_dir> --outfile <output.gguf> --outtype bf16 --verbose
```
Important Details
- The source checkpoint uses compressed-tensors NVFP4.
- Conversion used the pr-21095 llama.cpp branch rather than ordinary master at the time of conversion.
- The --outtype bf16 conversion path does not mean the entire release is a plain BF16 model.
- NVFP4-compatible tensors are preserved or repacked where supported by the converter/runtime (see the inspection sketch below).
- Required auxiliary tensors may remain in floating-point format.
- The tokenizer compatibility patch changes only the tokenizer class metadata needed by the conversion stack.
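To see the mixed per-tensor layout concretely, the gguf Python package (bundled with llama.cpp as gguf-py, or installed via pip install gguf) can list each tensor's quantization type. A minimal sketch:

```python
# List per-tensor quantization types in the converted GGUF; NVFP4-derived
# weights and floating-point auxiliary tensors show up with different types.
from gguf import GGUFReader

reader = GGUFReader("Qwen3.6-27B-NVFP4.gguf")
for tensor in reader.tensors[:20]:  # first 20 tensors, for brevity
    print(tensor.name, tensor.tensor_type.name, tuple(tensor.shape))
```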
Known Limitations
- This release depends on bleeding-edge llama.cpp NVFP4 conversion support.
- Multimodal + NVFP4 support is still evolving.
- LM Studio may not expose all advanced model controls for custom GGUF imports.
- Reasoning may be enabled by default without a visible UI toggle.
- Mixing the main model and projector from different GGUF/NVFP4 sources may introduce subtle incompatibilities.
- This is a community conversion and should be treated as experimental.
Usage with LM Studio
- Download/import the model folder in LM Studio.
- Ensure both files are present: Qwen3.6-27B-NVFP4.gguf and mmproj-BF16.gguf.
- Load the main GGUF model.
- Image input should become available if LM Studio recognizes the projector correctly.
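Once LM Studio's local server is running, a quick smoke test confirms the model is visible over the OpenAI-compatible API (the default address http://localhost:1234 is an assumption; adjust if you changed it):

```python
# List models exposed by LM Studio's local OpenAI-compatible server;
# the imported GGUF should appear once it is loaded.
import json, urllib.request

with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    models = json.loads(resp.read())
for m in models.get("data", []):
    print(m["id"])
```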
License & Disclaimer
This repository is a derivative conversion based on upstream model assets.
Please review and follow the licenses and usage terms of Qwen/Qwen3.6-27B, Abiray/Qwen3.6-27B-NVFP4, unsloth/Qwen3.6-27B-GGUF, and llama.cpp.
Users are responsible for ensuring compliance with all upstream model terms.
Acknowledgements
- Qwen for the original Qwen3.6-27B model
- Abiray for the NVFP4 checkpoint
- Unsloth for the Qwen3.6 GGUF release and vision projector
- llama.cpp contributors for GGUF and NVFP4 tooling
Notes
This release is intended for:
- Local experimentation
- LM Studio testing
- High-performance inference on RTX 5090 / Blackwell GPUs
It is not presented as an official or production-grade distribution.