PrismAudio Models (SafeTensors Mirror)

Mirrored and converted from FunAudioLLM/PrismAudio.

All weights have been converted from PyTorch .ckpt/.pth to SafeTensors format for:

✅ Faster loading
✅ Memory-mapped I/O
✅ No arbitrary code execution on load (unlike pickle-based .ckpt/.pth files)
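
A .safetensors file is an 8-byte little-endian header length, a JSON header describing each tensor (dtype, shape, byte offsets), and then raw tensor bytes. That layout is why loading involves no unpickling and why tensors can be memory-mapped. The sketch below is a minimal, stdlib-only illustration of both points. It assumes a little-endian F32 tensor, and `read_f32_tensor` is a hypothetical helper name, not part of any library.

```python
from __future__ import annotations

import json
import mmap
import struct
import sys
from array import array


def read_f32_tensor(path: str, name: str) -> tuple[list[int], array]:
    """Memory-map a .safetensors file and return (shape, values) for one F32 tensor.

    Only the JSON header is parsed; nothing is unpickled or executed.
    """
    with open(path, "rb") as f:
        # Bytes 0-7: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        info = header[name]
        if info["dtype"] != "F32":
            raise ValueError(f"{name} is {info['dtype']}; this sketch only handles F32")
        # data_offsets are relative to the byte buffer that follows the header.
        begin, end = info["data_offsets"]
        data_start = 8 + header_len
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies just this tensor's bytes; the OS only needs to page in that range.
            raw = mm[data_start + begin : data_start + end]
    values = array("f")  # "f" is a 4-byte C float on mainstream platforms
    values.frombytes(raw)
    if sys.byteorder == "big":
        values.byteswap()  # safetensors stores tensor data little-endian
    return info["shape"], values
```

For real use, `safetensors.torch.load_file` or `safetensors.safe_open` from the `safetensors` package do the same job with full dtype support.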

Files

File	Description
`prismaudio.safetensors`	Main PrismAudio model weights (518M params)
`synchformer_state_dict.safetensors`	Synchformer temporal alignment encoder
`vae.safetensors`	Oobleck VAE decoder
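
Because every tensor's shape is stored in the JSON header, element counts per file can be checked without reading any weight data. A minimal stdlib-only sketch, assuming the files have been downloaded locally; `count_params` is a hypothetical helper name.

```python
import json
import math
import struct


def count_params(path: str) -> dict:
    """Sum element counts per dtype from a .safetensors header, without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len))
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string map, not a tensor
            continue
        n = math.prod(info["shape"])  # an empty shape (scalar) counts as 1
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + n
    return counts
```

For example, `sum(count_params("prismaudio.safetensors").values())` should be in the neighborhood of the 518M figure above, though stored buffers (if any) are counted alongside trainable parameters.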

Usage

These weights are used by the PrismAudio panel of the MAESTRO AI Workstation for video-to-audio generation with decomposed Chain-of-Thought (CoT) reasoning.

Citation

@misc{liu2025thinksound,
  title={ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing},
  author={Huadai Liu and Jialei Wang and Kaicheng Luo and Wen Wang and Qian Chen and Zhou Zhao and Wei Xue},
  year={2025},
  eprint={2506.21448},
  archivePrefix={arXiv},
}



Paper for AEmotionStudio/prismaudio-models

ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing (arXiv:2506.21448, published Jun 26, 2025)