Audio & Speech Models - a adarshzolekar Collection

Audio & Speech Models

Updated Jan 23

Purpose: Speech recognition, text-to-speech, music generation, and audio analysis.

openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12, 2024 • 5.86M downloads • 5.45k likes

facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M params • Updated Nov 14, 2022 • 3.5M downloads • 391 likes

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 7.51M downloads • 3.42k likes

microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 111k downloads • 823 likes

facebook/musicgen-small

Text-to-Audio • Updated Nov 17, 2023 • 153k downloads • 480 likes
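
As a rough illustration of how the models listed above might be used, the sketch below groups the collection entries by task and shows one possible way to transcribe audio with a listed speech recognition checkpoint. It assumes the `transformers` library (plus a backend such as PyTorch, and ffmpeg for decoding audio files) is installed. The helper names `COLLECTION`, `task_name`, `models_for`, and `transcribe` are hypothetical and do not come from the collection page. Whether every listed checkpoint loads through a `transformers` pipeline is not confirmed here; the sketch only relies on that for the speech recognition entries.

```python
# Entries copied from the collection listing: model id -> task label shown on the page.
COLLECTION = {
    "openai/whisper-large-v3": "Automatic Speech Recognition",
    "facebook/wav2vec2-base-960h": "Automatic Speech Recognition",
    "coqui/XTTS-v2": "Text-to-Speech",
    "microsoft/speecht5_tts": "Text-to-Speech",
    "facebook/musicgen-small": "Text-to-Audio",
}


def task_name(label):
    """Turn a page label such as 'Text-to-Audio' into a lowercase, hyphenated task string."""
    return label.lower().replace(" ", "-")


def models_for(label):
    """Return the listed model ids that carry the given task label, in listing order."""
    return [model_id for model_id, task in COLLECTION.items() if task == label]


def transcribe(audio_path, model_id="openai/whisper-large-v3"):
    """Transcribe an audio file with a speech recognition model (hypothetical helper).

    The import is kept inside the function so the rest of this sketch runs
    without transformers installed. Calling it downloads the model weights.
    """
    from transformers import pipeline

    asr = pipeline(task_name(COLLECTION[model_id]), model=model_id)
    return asr(audio_path)["text"]
```

A call such as `transcribe("sample.wav")` would be the typical entry point, where `sample.wav` is a placeholder path. Passing `model_id="facebook/wav2vec2-base-960h"` swaps in the smaller checkpoint from the same list.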