#1 Mistral Launches Voxtral TTS: Bridging the ‘Expressivity Gap’ in Multilingual Voice Cloning
Mistral AI has released Voxtral TTS, its first text-to-speech model, aiming to close the ‘Expressivity Gap’ in voice cloning. The model has approximately 4 billion parameters and uses a hybrid architecture that combines autoregressive generation with flow matching. It generates natural, speaker-faithful speech in 9 languages from as little as 3 seconds of reference audio, outperforms ElevenLabs in multilingual voice cloning evaluations, and offers low latency.