AutoRound quant fails to load with mlx-lm

#1
by smcleod - opened

FYI - your mixed-bit MLX AutoRound quants of Qwen3.5/3.6 models, including this repo, currently fail to load with the latest stock mlx-lm 0.31.3, raising a shape mismatch on the first tensor with a non-default bit width.

I believe the bug is upstream, not in your quants: `qwen3_5.py:Model.sanitize` remaps weight key prefixes (`model.language_model.X` → `language_model.model.X`) but not the matching keys in `config["quantization"]`, so the per-tensor bit-width overrides miss and the global default gets applied to every tensor.
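To illustrate the mismatch, here is a minimal standalone sketch, not the actual mlx-lm code: the key names and config layout are assumptions for illustration only. Once the weight keys are remapped but the quantization overrides keep their old prefixes, the per-tensor lookup fails and the global default wins:

```python
# Hypothetical illustration of the reported bug (names are illustrative,
# not the real mlx-lm implementation).
def remap_key(k):
    """Mimic the kind of prefix remap sanitize() applies to weight keys."""
    prefix = "model.language_model."
    if k.startswith(prefix):
        return "language_model.model." + k[len(prefix):]
    return k

weights = {"model.language_model.layers.0.q_proj.weight": "W"}
quant_cfg = {
    "model.language_model.layers.0.q_proj": {"bits": 8},  # per-tensor override
    "bits": 4,                                            # global default
}

# Weight keys get remapped...
weights = {remap_key(k): v for k, v in weights.items()}

# ...but the quantization config is not, so the override lookup misses
# and the 4-bit global default would be applied instead of 8 bits:
lookup = "language_model.model.layers.0.q_proj"
assert lookup not in quant_cfg

# A fix along the lines suggested: remap the config keys the same way.
quant_cfg = {remap_key(k): v for k, v in quant_cfg.items()}
assert quant_cfg[lookup] == {"bits": 8}
```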

Filed at ml-explore/mlx-lm#1214, just a heads-up in case users start reporting load failures.

Thanks for the heads-up. Please feel free to open an issue or PR if there is anything AutoRound can help with.
