	---
	datasets:
	- LEMAS-Project/LEMAS-Dataset-train
	- LEMAS-Project/LEMAS-Dataset-eval
	language:
	- it
	- pt
	- es
	- fr
	- de
	- vi
	- id
	- ru
	- en
	- zh
	license: cc-by-nc-4.0
	pipeline_tag: text-to-speech
	tags:
	- zero-shot
	- multilingual
	---

	# LEMAS-TTS

	LEMAS-TTS is a multilingual zero-shot text-to-speech system, presented in the paper [LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models](https://huggingface.co/papers/2601.04233).

	- Project Page: [https://lemas-project.github.io/LEMAS-Project](https://lemas-project.github.io/LEMAS-Project)
	- Paper: [https://arxiv.org/abs/2601.04233](https://arxiv.org/abs/2601.04233)
	- GitHub Repository: [https://github.com/LEMAS-Project/LEMAS-TTS](https://github.com/LEMAS-Project/LEMAS-TTS)
	- Hugging Face Demo: [https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS](https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS)

	## Model Description

	LEMAS-TTS is built upon a non-autoregressive flow-matching framework. It leverages the massive scale and linguistic diversity of the LEMAS-Dataset to achieve robust zero-shot multilingual synthesis. The model incorporates accent-adversarial training and CTC loss to mitigate cross-lingual accent issues, enhancing synthesis stability and quality across diverse languages.
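As an illustration of the flow-matching idea mentioned above, here is a minimal sketch of a training objective for a single feature frame. It assumes a straight-line path between Gaussian noise and data, which this card does not specify, and it leaves out the accent-adversarial and CTC terms entirely. The names `interpolate`, `target_velocity`, and `flow_matching_loss` are hypothetical and are not taken from the LEMAS-TTS code.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, rng):
    """Squared error between predicted and target velocity, averaged over dims.

    predict_velocity is any callable (xt, t) -> list of floats; in a real
    system it would be the neural network (assumption for this sketch).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample, same size as data
    t = rng.random()                        # time drawn uniformly from [0, 1)
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


# Toy usage: a "model" that always predicts zero velocity.
rng = random.Random(0)
frame = [0.5, -1.0, 2.0]  # stand-in for one acoustic feature frame
loss = flow_matching_loss(lambda xt, t: [0.0] * len(xt), frame, rng)
```

Because the model predicts a velocity field rather than one token at a time, sampling integrates that field from noise to data in a fixed number of steps, which is what makes the approach non-autoregressive.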

	## Supported Languages

	The model supports 10 major languages for zero-shot synthesis:
	- Chinese (zh)
	- English (en)
	- Spanish (es)
	- Russian (ru)
	- French (fr)
	- German (de)
	- Italian (it)
	- Portuguese (pt)
	- Indonesian (id)
	- Vietnamese (vi)
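
When wiring the model into an application, it can help to validate the requested language before synthesis. The sketch below encodes the ten codes listed above; the `SUPPORTED_LANGUAGES` mapping and the `normalize_language` helper are hypothetical and are not part of the LEMAS-TTS API.

```python
# Language codes and names copied from the list in this model card.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese",
    "en": "English",
    "es": "Spanish",
    "ru": "Russian",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "id": "Indonesian",
    "vi": "Vietnamese",
}


def normalize_language(code):
    """Return the lowercase code if supported, otherwise raise ValueError."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return key


# Usage: tolerate stray whitespace and upper case in user input.
lang = normalize_language(" EN ")
```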

	## Training Data

	LEMAS-TTS was trained on the [LEMAS-Dataset](https://huggingface.co/datasets/LEMAS-Project/LEMAS-Dataset-train), which is, to our knowledge, the largest open-source multilingual speech corpus with word-level timestamps. It covers over 150,000 hours of speech across 10 major languages.

	## Citation

	```bibtex
	@article{zhao2026lemas,
	title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
	author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
	journal={arXiv preprint arXiv:2601.04233},
	year={2026}
	}
	```