Soprano: Instant, Ultra‑Realistic Text‑to‑Speech

📰 News

2026.01.13 - Soprano-Factory released! You can now train/fine-tune your own Soprano models.
2025.12.22 - Soprano-80M released! Code | Demo


This repository contains Soprano-Encoder, which converts raw audio into audio tokens that the LLM backbone can recognize.

Overview

Soprano is an ultra‑lightweight, on-device text‑to‑speech (TTS) model designed for expressive, high‑fidelity speech synthesis at very high speed. It offers the following features:

  • Up to 2000x real-time generation on GPU and 20x real-time on CPU
  • Lossless streaming with <15 ms latency on GPU, <250 ms on CPU
  • <1 GB memory usage with a compact 80M parameter architecture
  • Infinite generation length with automatic text splitting
  • Highly expressive, crystal clear audio generation at 32 kHz
  • Broad support for CUDA, CPU, and MPS devices on Windows, Linux, and Mac
  • A WebUI, a CLI, and an OpenAI-compatible endpoint for easy, production-ready inference
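
To make the figures in the list above concrete, here is a small back-of-the-envelope sketch in Python. It does not use any Soprano code; the function names are hypothetical helpers, and the bytes-per-parameter values are assumptions, since the precision of the released weights is not stated here. The speed figures are the "up to" values from the list, so the resulting times are best-case estimates.

```python
def synthesis_seconds(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock time to generate `audio_seconds` of speech at N x real-time."""
    return audio_seconds / realtime_multiple


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Size of the weights alone (excludes activations and runtime overhead)."""
    return n_params * bytes_per_param / 1e6


def sample_count(audio_seconds: float, sample_rate_hz: int = 32_000) -> int:
    """Number of PCM samples in a clip at the stated 32 kHz output rate."""
    return round(audio_seconds * sample_rate_hz)


if __name__ == "__main__":
    minute = 60.0
    # Best-case figures taken from the feature list.
    print("GPU (2000x):", synthesis_seconds(minute, 2000), "s per minute of audio")
    print("CPU (20x):  ", synthesis_seconds(minute, 20), "s per minute of audio")
    # Assumed precisions: 4 bytes (float32) and 2 bytes (float16/bfloat16).
    print("80M params, 4 bytes each:", weight_megabytes(80_000_000, 4), "MB")
    print("80M params, 2 bytes each:", weight_megabytes(80_000_000, 2), "MB")
    print("Samples in 10 s of audio:", sample_count(10.0))
```

Under these assumptions, the weights of an 80M-parameter model take a few hundred megabytes at most, which is consistent with the <1 GB memory figure once runtime overhead is added.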
