# Qwen3.6-25B-A3B

This is a compressed checkpoint derived from `Qwen/Qwen3.6-35B-A3B`.
## Overview
- Base model: `Qwen/Qwen3.6-35B-A3B`
- Total parameter count: 34.66B -> 24.97B
- Layers: 40
- Active experts per token: 8
- Format: standard Transformers safetensors shards with tokenizer, generation config, and chat template included
The repo is set up for direct `from_pretrained(...)` loading.
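To sanity-check the compressed architecture before pulling the full weight shards, you can load just the config. A minimal sketch, assuming the checkpoint exposes the usual Qwen MoE config fields (`num_hidden_layers`, `num_experts_per_tok`); adjust if this repo names them differently:

```python
# Inspect the architecture without downloading the weights.
# Field names below are assumed Qwen MoE config keys, not verified against this repo.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Jaso1024/Qwen3.6-25B-A3B", trust_remote_code=True)
print(config.num_hidden_layers)    # expected: 40
print(config.num_experts_per_tok)  # expected: 8
```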
## Quick start
Use a recent Transformers release that supports Qwen3.6 MoE. This checkpoint was produced and validated with `transformers==5.5.4`.

```bash
pip install -U torch transformers==5.5.4 accelerate torchvision pillow
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Jaso1024/Qwen3.6-25B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",           # shard across available devices
    attn_implementation="sdpa",
)

messages = [{"role": "user", "content": "Solve: If 3 notebooks cost $12, how much do 8 cost? End with ####."}]

# Newer Qwen chat templates accept enable_thinking; fall back for templates that don't.
try:
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, enable_thinking=False)
except TypeError:
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Notes
- The uploaded `config.json` already reflects the compressed architecture.
- This is still a large bf16 checkpoint: at 2 bytes per parameter, the ~25B weights alone take roughly 50 GB, so practical inference typically needs high-memory GPU hardware or multi-device offload (see the sketch below).
- License and usage terms should be treated as inherited from the upstream base model.
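If a single GPU cannot hold the weights, Accelerate's big-model loading can cap per-device memory and offload the remainder to CPU RAM and disk. A minimal sketch with illustrative limits; tune them for your hardware:

```python
import torch
from transformers import AutoModelForCausalLM

# max_memory caps usage per device; layers that don't fit are offloaded to
# CPU RAM, then to disk via offload_folder. The limits below are illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "Jaso1024/Qwen3.6-25B-A3B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "40GiB", "cpu": "128GiB"},
    offload_folder="offload",
)
```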