AutoRound quant fails to load with mlx-lm

#1
by smcleod - opened

FYI - your mixed-bit MLX AutoRound quants of Qwen3.5/3.6 models, including this repo, currently fail to load with the latest stock mlx-lm 0.31.3, raising a shape mismatch on the first tensor with a non-default bit width.

I believe the bug is upstream, not in your quants: `qwen3_5.py:Model.sanitize` remaps weight key prefixes (`model.language_model.X` → `language_model.model.X`) but not the matching keys in `config["quantization"]`, so the per-tensor bit-width overrides miss and the global default gets applied to every tensor.
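To illustrate the mismatch, here is a minimal standalone sketch, not the actual mlx-lm code: the key names and config layout are assumptions for illustration only. Once the weight keys are remapped but the quantization overrides keep their old prefixes, the per-tensor lookup fails and the global default wins:

```python
# Hypothetical illustration of the reported bug (names are illustrative,
# not the real mlx-lm implementation).
def remap_key(k):
    """Mimic the kind of prefix remap sanitize() applies to weight keys."""
    prefix = "model.language_model."
    if k.startswith(prefix):
        return "language_model.model." + k[len(prefix):]
    return k

weights = {"model.language_model.layers.0.q_proj.weight": "W"}
quant_cfg = {
    "model.language_model.layers.0.q_proj": {"bits": 8},  # per-tensor override
    "bits": 4,                                            # global default
}

# Weight keys get remapped...
weights = {remap_key(k): v for k, v in weights.items()}

# ...but the quantization config is not, so the override lookup misses
# and the 4-bit global default would be applied instead of 8 bits:
lookup = "language_model.model.layers.0.q_proj"
assert lookup not in quant_cfg

# A fix along the lines suggested: remap the config keys the same way.
quant_cfg = {remap_key(k): v for k, v in quant_cfg.items()}
assert quant_cfg[lookup] == {"bits": 8}
```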

Filed at ml-explore/mlx-lm#1214, just a heads-up in case users start reporting load failures.

Thanks for the heads-up. Please feel free to open an issue or PR if there is anything AutoRound can help with.
