---
pipeline_tag: text-to-video
license: other
license_name: tencent-hunyuan-community
license_link: LICENSE
---

<p align="center">
<img src="assets/logo.jpg" height=30>
</p>
|
|
| # FastHunyuan Model Card |
|
|
| ## Model Details |
|
|
FastHunyuan is an accelerated [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) model. It can sample high-quality videos in 6 diffusion steps, which is roughly an 8X speedup over the original HunyuanVideo's 50 steps.
|
|
| - **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/) |
| - **License**: tencent-hunyuan-community |
| - **Distilled from**: [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) |
| - **Github Repository**: https://github.com/hao-ai-lab/FastVideo |
|
|
| ## Usage |
|
|
- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README.
- Alternatively, you can run FastHunyuan inference with the official [Hunyuan Video repository](https://github.com/Tencent/HunyuanVideo) by **setting the shift to 17, steps to 6, resolution to 720x1280x125, and CFG scale above 6**.
We find that a larger CFG scale generally leads to videos with faster motion.
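For reference, the recommended sampling settings above can be gathered in one place. This is a minimal sketch: the values come from this card, but the key names, and how they map onto the Hunyuan Video repository's CLI flags or any pipeline API, are assumptions — consult the FastVideo README for the exact interface.

```python
# Recommended FastHunyuan sampling settings from this card. The key names
# are illustrative; the mapping to actual CLI flags or pipeline arguments
# is an assumption — see the FastVideo README for the supported invocation.
FASTHUNYUAN_SETTINGS = {
    "shift": 17,
    "num_inference_steps": 6,
    "height": 720,
    "width": 1280,
    "num_frames": 125,
    "guidance_scale": 6.0,  # the card recommends a CFG scale above 6
}

def speedup_vs_baseline(baseline_steps: int = 50) -> float:
    """Rough step-count speedup over the 50-step HunyuanVideo baseline."""
    return baseline_steps / FASTHUNYUAN_SETTINGS["num_inference_steps"]
```

With 6 steps against the 50-step baseline, the step-count speedup is about 8.3X, consistent with the "around 8X" figure above (the exact wall-clock gain depends on hardware and the rest of the pipeline).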
|
|
| ## Training details |
|
|
FastHunyuan is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
| - Batch size: 16 |
- Resolution: 720x1280
| - Num of frames: 125 |
| - Train steps: 320 |
| - GPUs: 32 |
| - LR: 1e-6 |
| - Loss: huber |
|
|
| ## Evaluation |
We provide a qualitative comparison between FastHunyuan with 6-step inference and the original HunyuanVideo with 6-step inference:
|
|
| | FastHunyuan 6 step | Hunyuan 6 step | |
| | --- | --- | |
| |  |  | |
| |  |  | |
| |  |  | |
| |  |  | |
|
|
| ## Memory requirements |
|
|
Please check our [GitHub repository](https://github.com/hao-ai-lab/FastVideo) for details.
|
|
FastHunyuan supports NF4 and LLM-INT8 quantized inference via BitsAndBytes. With NF4 quantization, inference can be performed on a single RTX 4090 GPU, requiring just 20 GB of VRAM.
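The two quantized-inference modes mentioned above can be summarized as BitsAndBytes-style option sets. This is only a sketch: the option names follow the common BitsAndBytes convention, and how FastVideo wires them into the FastHunyuan loader is an assumption — the repository's README documents the supported quantized-inference commands.

```python
# Sketch of the two quantized-inference modes named in this card, expressed
# as BitsAndBytes-style option dicts. The exact wiring into the FastHunyuan
# loader is an assumption — see the FastVideo repo for supported commands.
QUANT_MODES = {
    # NF4 4-bit quantization: ~20 GB VRAM, fits a single RTX 4090 (24 GB).
    "nf4": {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
    # LLM-INT8 8-bit quantization (VRAM figure not given in this card).
    "llm_int8": {"load_in_8bit": True},
}
```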
|
|
For LoRA finetuning, the minimum hardware requirements are:
- 40 GB GPU memory each for 2 GPUs with LoRA
- 30 GB GPU memory each for 2 GPUs with CPU offload and LoRA
|
|