SpidR MMS-ulab

SpidR MMS-ulab is a SpidR model pretrained on the Segmented MMS ulab v2 dataset for the DiscoPhon benchmark. It was pretrained using the spidr library.

You can load it with:

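from spidr.models import SpidR
from torch.hub import load_state_dict_from_url

# Download (and cache) the final model weights from the Hugging Face Hub.
# map_location="cpu" lets the checkpoint load on machines without a GPU.
state_dict = load_state_dict_from_url(
    "https://huggingface.co/coml/spidr-mmsulab/resolve/main/final.pt",
    map_location="cpu",
)
# Build the model with its default configuration and switch to inference mode.
model = SpidR().eval()
model.load_state_dict(state_dict)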
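
If load_state_dict raises an error about mismatched keys, comparing the two key sets usually shows what went wrong. The helper below is a minimal sketch for that purpose; diff_state_dict_keys and KeyDiff are hypothetical names introduced here for illustration and are not part of the spidr library or of PyTorch.

```python
from typing import Iterable, List, NamedTuple


class KeyDiff(NamedTuple):
    """Result of comparing parameter names (hypothetical helper type)."""

    missing: List[str]     # expected by the model, absent from the checkpoint
    unexpected: List[str]  # present in the checkpoint, unknown to the model


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> KeyDiff:
    """Compare two collections of parameter names and report the differences."""
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    return KeyDiff(
        missing=sorted(model_set - checkpoint_set),
        unexpected=sorted(checkpoint_set - model_set),
    )
```

With the objects from the loading snippet, it could be called as diff_state_dict_keys(model.state_dict().keys(), state_dict.keys()). Two empty lists mean the checkpoint matches the model's parameter names.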

Files:

config.json: Model configuration.
final.pt: Model checkpoint.
full_checkpoint.pt: Full checkpoint, with model, optimizer, etc.
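
To see which settings config.json holds without assuming anything about its keys, a locally downloaded copy can be read with the standard library. This is a minimal sketch: load_config and describe_config are hypothetical helper names, and the code only assumes that the file contains a single JSON object.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_config(path: str) -> Dict[str, Any]:
    """Read a JSON configuration file into a plain dictionary."""
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(
            f"expected a JSON object in {path}, got {type(config).__name__}"
        )
    return config


def describe_config(config: Dict[str, Any]) -> List[str]:
    """List each top-level key with the type of its value, sorted by key."""
    return [f"{key}: {type(value).__name__}" for key, value in sorted(config.items())]
```

For example, print("\n".join(describe_config(load_config("config.json")))) prints one line per top-level setting, given a config.json in the working directory.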

Citing

Please cite the DiscoPhon paper:

@misc{poli2026discophon,
  title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
  author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
  year={2026},
  eprint={2603.18612},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.18612},
}

along with SpidR.

