SpidR MMS-ulab is a SpidR model pretrained on the Segmented MMS ulab v2 dataset
for the DiscoPhon benchmark.
It was pretrained using the spidr library.
You can load it with:
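from spidr.models import SpidR
from torch.hub import load_state_dict_from_url

# Download the model weights (cached locally by torch.hub after the first call).
state_dict = load_state_dict_from_url("https://huggingface.co/coml/spidr-mmsulab/resolve/main/final.pt")
# Build the model, switch it to evaluation mode, then load the pretrained weights.
model = SpidR().eval()
model.load_state_dict(state_dict)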
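The checkpoint URL above follows the Hugging Face Hub pattern `resolve/<revision>/<filename>`, so the other files in this repository (`config.json`, `full_checkpoint.pt`) should be reachable the same way. The sketch below is only an illustration using the standard library: `hub_url` and `fetch_config` are hypothetical helper names, not part of the spidr library, and the contents of `config.json` are not assumed here.

```python
import json
from urllib.request import urlopen

REPO_ID = "coml/spidr-mmsulab"  # taken from the checkpoint URL above
REVISION = "main"


def hub_url(filename: str, repo_id: str = REPO_ID, revision: str = REVISION) -> str:
    """Build a direct-download URL for a file in a Hugging Face model repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_config() -> dict:
    """Download and parse config.json (hypothetical helper, needs network access)."""
    with urlopen(hub_url("config.json")) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL of each file in the repository.
    for name in ("config.json", "final.pt", "full_checkpoint.pt"):
        print(hub_url(name))
```

Passing `hub_url("full_checkpoint.pt")` to `load_state_dict_from_url` would download the full checkpoint instead; how its contents are organized (model, optimizer, etc.) is not documented here, so inspect its keys before using it.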
config.json: Model configuration.
final.pt: Model checkpoint.
full_checkpoint.pt: Full checkpoint, with model, optimizer, etc.

Please cite the DiscoPhon paper
@misc{poli2026discophon,
title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
year={2026},
eprint={2603.18612},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.18612},
}
along with SpidR.