Soprano: Instant, Ultra‑Realistic Text‑to‑Speech
📰 News
2026.01.13 - Soprano-Factory released! You can now train/fine-tune your own Soprano models.
2025.12.22 - Soprano-80M released! Code | Demo
This repository contains Soprano-Encoder, which converts raw audio into audio tokens that the LLM backbone can recognize.
Overview
Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Soprano was designed with the following features:
- Up to 2000x real-time generation on GPU and 20x real-time on CPU
- Lossless streaming with <15 ms latency on GPU, <250 ms on CPU
- <1 GB memory usage with a compact 80M parameter architecture
- Infinite generation length with automatic text splitting
- Highly expressive, crystal-clear audio generation at 32 kHz
- Support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
- Supports a WebUI, a CLI, and an OpenAI-compatible endpoint for easy, production-ready inference
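To put the speed figures above in concrete terms, here is a small arithmetic sketch. It is not part of the Soprano API: the helper names `synthesis_seconds` and `num_samples` are hypothetical, and the only inputs taken from this page are the quoted real-time factors (up to 2000x on GPU, 20x on CPU) and the 32 kHz sample rate. Because these are "up to" figures, the results are best-case estimates.

```python
# Back-of-the-envelope math for the figures quoted in the feature list.
# These helpers are illustrative only and do not call Soprano itself.

SAMPLE_RATE_HZ = 32_000  # output sample rate stated in the feature list


def synthesis_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of speech
    when running at `realtime_factor` times real time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def num_samples(audio_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of PCM samples in `audio_seconds` of audio."""
    return round(audio_seconds * sample_rate_hz)


if __name__ == "__main__":
    one_hour = 3600.0
    print(f"GPU (2000x): {synthesis_seconds(one_hour, 2000):.1f} s for 1 h of audio")
    print(f"CPU (20x):   {synthesis_seconds(one_hour, 20):.1f} s for 1 h of audio")
    print(f"Samples in 10 s at 32 kHz: {num_samples(10.0)}")
```

At the quoted peak rates, one hour of speech would take about 1.8 seconds on GPU and about 3 minutes on CPU; actual throughput depends on hardware and settings.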