Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

📰 News

2026.01.13 - Soprano-Factory released! You can now train/fine-tune your own Soprano models.
2025.12.22 - Soprano-80M released! Code | Demo

This repository contains Soprano-Encoder, which converts raw audio into the audio tokens that Soprano's LLM backbone processes.

Overview

Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at unprecedented speed. Its key features are:

Up to 2000x real-time generation on GPU and 20x real-time on CPU
Lossless streaming with <15 ms latency on GPU, <250 ms on CPU
<1 GB memory usage with a compact 80M-parameter architecture
Unlimited generation length via automatic text splitting
Highly expressive, crystal-clear audio generation at 32 kHz
Broad support for CUDA, CPU, and MPS devices on Windows, Linux, and macOS
WebUI, CLI, and OpenAI-compatible endpoint for easy, production-ready inference
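To put the speed, memory, and sample-rate figures above in concrete terms, here is a small back-of-the-envelope calculation. It assumes that "Nx real-time" means N seconds of audio are produced per second of compute, and that the weights are stored in fp32 or fp16; neither assumption is stated above. The memory estimate covers model weights only, not activations, caches, or runtime overhead.

```python
# Back-of-the-envelope arithmetic for the headline figures.
# Assumption: "Nx real-time" = N seconds of audio per second of wall-clock compute.

def synthesis_time_s(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of audio."""
    return audio_seconds / realtime_multiple

def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

SAMPLE_RATE_HZ = 32_000  # output sample rate stated in the feature list
ONE_HOUR_S = 3600.0

gpu_s = synthesis_time_s(ONE_HOUR_S, 2000)  # 1.8 s at the quoted GPU peak
cpu_s = synthesis_time_s(ONE_HOUR_S, 20)    # 180.0 s at the quoted CPU peak

# Assumed storage precisions; activations and runtime overhead come on top.
fp32_mb = weight_memory_mb(80_000_000, 4)  # 320.0 MB
fp16_mb = weight_memory_mb(80_000_000, 2)  # 160.0 MB

samples_per_hour = int(ONE_HOUR_S * SAMPLE_RATE_HZ)  # 115,200,000 samples

print(f"1 h of audio: ~{gpu_s:.1f} s on GPU, ~{cpu_s:.0f} s on CPU")
print(f"80M weights: {fp32_mb:.0f} MB (fp32), {fp16_mb:.0f} MB (fp16)")
print(f"Samples in 1 h at 32 kHz: {samples_per_hour:,}")
```

Under these assumptions, even fp32 weights leave most of the stated 1 GB budget for activations and runtime state.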
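The automatic text splitting behind the unlimited generation length can be illustrated with a minimal sketch: long input is cut at sentence boundaries into bounded chunks that can be synthesized (and streamed) one after another. This is a hypothetical illustration, not Soprano's implementation; the names `split_text` and `_split_words`, the sentence regex, and the `max_chars=200` default are all assumptions.

```python
import re

# Hypothetical sketch of sentence-based text splitting; Soprano's real rules may differ.
# A sentence ends after '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Pack words into pieces of at most max_chars (a single over-long word is kept whole)."""
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces

def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks no longer than max_chars."""
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Sentences that are too long on their own are broken on whitespace.
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks

print(split_text("Hello there. This is Soprano! Does it split? Yes.", max_chars=20))
```

Splitting on sentence boundaries first keeps prosody natural within each chunk; the word-level fallback only applies to unusually long sentences.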
