Token-Based Audio Inpainting via Discrete Diffusion (AIDD)
Pretrained model weights for AIDD, introduced in:
Token-Based Audio Inpainting via Discrete Diffusion
ICLR 2026
https://arxiv.org/abs/2507.08333
AIDD performs audio inpainting by applying diffusion in a discrete token space, enabling semantically coherent reconstruction of missing audio segments, including long gaps of up to 750 ms.
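As a rough illustration of the token-space setup (not code from this repository): a temporal gap in the waveform corresponds to a contiguous span of token positions, which is masked out and then denoised by the model. The token rate and mask id below are assumptions chosen for the sketch; actual values depend on the WavTokenizer configuration.

```python
import torch

# Illustrative assumptions: WavTokenizer variants run at roughly 40 or 75
# tokens per second, and the mask id depends on the model's vocabulary.
TOKEN_RATE_HZ = 75
MASK_ID = 4096

def mask_gap(tokens: torch.Tensor, gap_start_s: float, gap_len_s: float) -> torch.Tensor:
    """Replace the token span covering a temporal gap with the mask id."""
    start = int(round(gap_start_s * TOKEN_RATE_HZ))
    end = start + int(round(gap_len_s * TOKEN_RATE_HZ))
    masked = tokens.clone()
    masked[start:end] = MASK_ID
    return masked

# A 750 ms gap starting at 2.0 s maps to ~56 masked tokens at 75 Hz.
tokens = torch.randint(0, 4096, (750,))
masked = mask_gap(tokens, gap_start_s=2.0, gap_len_s=0.75)
```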
Model
The model operates on discrete audio tokens produced by a pretrained WavTokenizer and performs inpainting with a Diffusion Transformer (DiT) trained under a discrete diffusion objective. Training combines span-based masking, which models structured missing regions, with a derivative-based regularization loss that encourages smooth temporal dynamics in the token embedding space. The model is designed to restore missing segments in musical audio, including long gaps.
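The derivative-based regularization admits a simple sketch. The version below penalizes the first differences between adjacent token embeddings; it is one plausible reading of the idea, not the paper's exact formulation.

```python
import torch

def derivative_regularization(embeddings: torch.Tensor) -> torch.Tensor:
    """Penalize large first differences between adjacent token embeddings.

    embeddings: (batch, time, dim) embeddings of the predicted tokens.
    A sketch of a smoothness penalty in embedding space; see the paper
    for the exact loss used in training.
    """
    diffs = embeddings[:, 1:, :] - embeddings[:, :-1, :]
    return diffs.pow(2).sum(dim=-1).mean()
```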
Usage
This repository provides model weights only.
For code, see the official GitHub repository:
👉 https://github.com/iftachShoham/AIDD
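Since only weights are hosted here, a minimal sketch of fetching a checkpoint with huggingface_hub (the repo id and filename are hypothetical; check the files on this page for the actual names):

```python
import torch
from huggingface_hub import hf_hub_download

# Hypothetical repo id and filename -- substitute the actual values
# listed on this model page.
ckpt_path = hf_hub_download(repo_id="iftachShoham/AIDD", filename="aidd.pt")

# Load the raw state dict; the model class that consumes it lives in
# the GitHub repository linked above.
state_dict = torch.load(ckpt_path, map_location="cpu")
```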
Data & Evaluation
Trained and evaluated on MusicNet and MAESTRO, using Fréchet Audio Distance (FAD), Log-Spectral Distance (LSD), Objective Difference Grade (ODG), and Mean Opinion Score (MOS) as metrics.
See the paper for full details.
Acknowledgments
Built upon
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution and
WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
We thank the authors for making their work publicly available.
Citation
@article{dror2025token,
title={Token-based Audio Inpainting via Discrete Diffusion},
author={Dror, Tali and Shoham, Iftach and Buchris, Moshe and Gal, Oren and Permuter, Haim and Katz, Gilad and Nachmani, Eliya},
journal={arXiv preprint arXiv:2507.08333},
year={2025}
}