Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Kiwi-Edit is a versatile video editing framework built on a multimodal large language model (MLLM) encoder and a video Diffusion Transformer (DiT). It supports:

  • Instruction Video Editing: Modify video content through text prompts.
  • Reference Image Guidance: Use a reference image to guide editing for higher visual fidelity and precise control.

The model combines learnable queries with latent visual features to supply semantic guidance from the reference image, yielding significant gains in both instruction following and reference fidelity.

Usage

To use Kiwi-Edit for inference, follow the installation instructions in the official repository. You can run a quick test on a demo video using the following command:

python diffusers_demo.py \
    --video_path ./demo_data/video/source/0005e4ad9f49814db1d3f2296b911abf.mp4 \
    --prompt "Remove the monkey." \
    --save_path output.mp4 \
    --model_path linyq/kiwi-edit-5b-instruct-only-diffusers
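
To apply several edits in one go, the command above can be wrapped in a small Python helper. The following is a minimal sketch, not part of the official repository: it assumes diffusers_demo.py is in the current working directory and accepts exactly the four flags shown above. The helper names (build_demo_command, run_batch), the outputs directory, and the output file naming scheme are hypothetical choices made for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Model ID copied from the command above.
DEFAULT_MODEL = "linyq/kiwi-edit-5b-instruct-only-diffusers"


def build_demo_command(video_path, prompt, save_path,
                       model_path=DEFAULT_MODEL, script="diffusers_demo.py"):
    """Return the argument list for one diffusers_demo.py run.

    Only the four flags shown in the command above are used; the helper
    itself is hypothetical and not part of the Kiwi-Edit repository.
    """
    return [
        sys.executable, script,
        "--video_path", str(video_path),
        "--prompt", prompt,
        "--save_path", str(save_path),
        "--model_path", model_path,
    ]


def run_batch(jobs, out_dir="outputs", dry_run=False):
    """Run the demo script once per (video_path, prompt) pair.

    Returns the list of commands. With dry_run=True the commands are
    built but not executed, which is useful for checking them first.
    """
    out_dir = Path(out_dir)
    commands = []
    for index, (video_path, prompt) in enumerate(jobs):
        # Hypothetical naming scheme: zero-padded index plus source file stem.
        save_path = out_dir / f"{index:03d}_{Path(video_path).stem}.mp4"
        cmd = build_demo_command(video_path, prompt, save_path)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the script fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_batch with a list of (video path, prompt) pairs and dry_run=True returns the commands without executing them, so they can be inspected before any model weights are downloaded or a GPU is used.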

Citation

If you use Kiwi-Edit in your research, please cite the following work:

@misc{kiwiedit,
      title={Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance}, 
      author={Yiqi Lin and Guoqiang Liang and Ziyun Zeng and Zechen Bai and Yanzhe Chen and Mike Zheng Shou},
      year={2026},
      eprint={2603.02175},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.02175}, 
}