MixDQ Model Card
Model Description
MixDQ is a mixed-precision quantization method that reduces the memory footprint and computational cost of text-to-image diffusion models while preserving generation quality.
It targets few-step diffusion models (e.g., SDXL-turbo, LCM-LoRA), producing diffusion models that are both fast and small. Efficient CUDA kernel implementations are provided so that the compression translates into practical resource savings.
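To illustrate the general idea of mixed-precision quantization, the sketch below gives different layers different bit-widths and measures the resulting rounding error. It assumes simple symmetric, per-tensor uniform quantization. The layer names, weight values, and bit choices are hypothetical, and this is not MixDQ's actual bit-allocation algorithm.

```python
# Sketch of mixed-precision weight quantization.
# Assumption: symmetric, per-tensor uniform quantization (not MixDQ's exact scheme).

def quantize_dequantize(weights, n_bits):
    """Symmetrically quantize a list of floats to n_bits, then map back to floats."""
    q_max = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed integer range.
    q = [max(-q_max - 1, min(q_max, round(w / scale))) for w in weights]
    return [v * scale for v in q]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical layers with a per-layer bit-width: sensitive layers keep more bits.
layers = {
    "sensitive_layer": ([0.51, -0.33, 0.08, -0.97, 0.26], 8),
    "robust_layer": ([0.12, -0.45, 0.73, -0.05, 0.31], 4),
}

for name, (weights, bits) in layers.items():
    deq = quantize_dequantize(weights, bits)
    print(f"{name}: {bits}-bit, mean abs error = {mean_abs_error(weights, deq):.4f}")
```

Fewer bits save more memory but add more rounding error. A mixed-precision method therefore has to decide which layers can tolerate lower precision.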
Model Sources
For more information, please refer to:
Evaluation
We evaluate MixDQ with several metrics: FID (fidelity), CLIPScore (image-text alignment), and ImageReward (human preference). MixDQ achieves W8A8 quantization (8-bit weights, 8-bit activations) with negligible quality loss, and the differences between images generated by MixDQ and by the FP16 model are negligible, as the table below shows.
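For reference, the standard definition of FID is given below. It compares the mean and covariance of Inception-v3 features extracted from real (r) and generated (g) images, and lower is better. This model card does not state the reference image set or sample count used, so the formula is shown only to explain the metric.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```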
| Method | FID (↓) | CLIPScore (↑) | ImageReward (↑) |
|---|---|---|---|
| FP16 | 17.15 | 0.2722 | 0.8631 |
| MixDQ-W8A8 | 17.03 | 0.2703 | 0.8415 |
| MixDQ-W5A8 | 17.23 | 0.2697 | 0.8307 |
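As a rough guide to what the W8 and W5 settings mean for weight storage, the arithmetic below compares bit-widths against FP16. Two assumptions apply here. First, W5 is read as an average of 5 bits per weight. Second, the parameter count of about 2.6B is the commonly cited size of the SDXL UNet, not a figure from this card. Activation memory, quantization scales, and any layers kept in FP16 are ignored.

```python
# Back-of-the-envelope weight-storage estimate for different bit-widths.
# Ignores activations, quantization scales, and any layers left in FP16.

def weight_memory_gib(num_params, avg_bits):
    """Storage for num_params weights at avg_bits each, in GiB."""
    return num_params * avg_bits / 8 / 2**30


NUM_PARAMS = 2.6e9  # assumption: approximate SDXL UNet parameter count

for label, bits in [("FP16", 16), ("W8", 8), ("W5", 5)]:
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{label:>4}: {gib:.2f} GiB of weights, {16 / bits:.1f}x smaller than FP16")
```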
Usage
Install the prerequisites for MixDQ:
# The Python versions required to run mixdq: 3.8, 3.9, 3.10
pip install -i https://pypi.org/simple/ mixdq-extension
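The extension is only listed for Python 3.8, 3.9, and 3.10, so a quick interpreter check before installing can save a failed build. The helper below is a hypothetical convenience and is not part of mixdq-extension.

```python
import sys

# Supported interpreter versions, as listed in the install comment above.
SUPPORTED = {(3, 8), (3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) Python version is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED


if not is_supported():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is not in the supported list (3.8, 3.9, 3.10)."
    )
```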
Run the pipeline:
import torch
from diffusers import DiffusionPipeline

# Load SDXL-turbo in FP16 through the MixDQ custom pipeline.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",
    custom_pipeline="nics-efc/MixDQ",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Quantize the UNet: 8-bit weights (w_bit) and 8-bit activations (a_bit).
pipe.quantize_unet(
    w_bit=8,
    a_bit=8,
    bos=True,
)

# Enable CUDA Graphs for the pipeline (see the latency table below).
pipe.set_cuda_graph(
    run_pipeline=True,
)

# Run a test generation, save it to pipeline_test.png, and record a profile.
pipe.run_for_test(
    device="cuda",
    output_type="pil",
    run_pipeline=True,
    path="pipeline_test.png",
    profile=True,
)

# After execution finishes, a JSON profiling report is written to the log/sdxl folder.
# Open it with TensorBoard to examine the profiling results:
#   tensorboard --logdir=./log

pipe = pipe.to("cuda")
prompt = "A black Honda motorcycle parked in front of a garage."
# SDXL-turbo is designed for single-step sampling with guidance disabled.
image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
image.save("mixdq_pipeline.png")
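To find the profiling reports before opening TensorBoard, a small helper can list the JSON files under the log folder, newest first. The default path `log/sdxl` comes from the note in the code above. The exact report filenames are not documented here, so the helper simply matches any `*.json` file. It is a hypothetical convenience, not part of the MixDQ pipeline.

```python
from pathlib import Path


def find_profile_reports(log_dir="log/sdxl"):
    """Return JSON files under log_dir (recursively), newest first; [] if none."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    reports = [p for p in root.rglob("*.json") if p.is_file()]
    return sorted(reports, key=lambda p: p.stat().st_mtime, reverse=True)


for report in find_profile_reports():
    print(report)
```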
UNet latency measured on an NVIDIA GeForce RTX 4080:
| UNet latency (ms) | No CUDA Graph | With CUDA Graph |
|---|---|---|
| FP16 version | 44.6 | 36.1 |
| Quantized version | 59.1 | 24.9 |
| Speedup | 0.75× | 1.45× |
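The speedup row is the FP16 latency divided by the quantized latency. Without CUDA Graphs the quantized UNet is slower (0.75×), and with CUDA Graphs it is faster (1.45×). The sketch below reproduces that arithmetic. It also shows one generic way to time a callable with warm-up runs and a median. The card does not describe its own benchmarking procedure, so `median_latency_ms` is a hypothetical helper. For GPU work, pass a synchronizing callable such as `torch.cuda.synchronize`, because CUDA kernels launch asynchronously.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, iters=20, sync=None):
    """Call fn() repeatedly and return the median wall-clock latency in ms.

    sync: optional callable run after each call (e.g. torch.cuda.synchronize)
    so that asynchronous GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn()
        if sync:
            sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """Values above 1 mean the candidate is faster than the baseline."""
    return baseline_ms / candidate_ms


# UNet latencies (ms) from the table above: FP16 vs. quantized.
print(f"No CUDA Graph:   {speedup(44.6, 59.1):.2f}x")  # 0.75x: quantized is slower
print(f"With CUDA Graph: {speedup(36.1, 24.9):.2f}x")  # 1.45x
```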