Voxtral Transcribe 2 by Mistral

What is Voxtral Transcribe 2?

Voxtral Transcribe 2 is a speech-to-text model family from Mistral that offers fast, accurate transcription with real-time capabilities and speaker diarization. It includes two models: Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Together, they support 13 languages, word-level timestamps, context biasing, and privacy-first deployment, with an emphasis on speed and low cost.

Who it's for

Voice app developers who need sub-200ms latency for real-time voice agents and interactive experiences.
Meeting and call processors who require accurate speaker diarization and word-level timestamps for multi-party transcription.
Privacy-conscious teams who want open-weights models deployable on edge devices for sensitive or offline use cases.

Key features

Voxtral Realtime

Purpose-built for live transcription, Voxtral Realtime uses a novel streaming architecture that transcribes audio as it arrives. Its latency is configurable and can go below 200ms, enabling voice agents with near-offline accuracy. At a 480ms delay, it stays within 1–2% word error rate, approaching batch quality for real-time applications.

Voxtral Mini Transcribe V2

This batch model achieves state-of-the-art transcription quality at approximately 4% word error rate on the FLEURS benchmark and $0.003 per minute. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, while processing audio about 3x faster than ElevenLabs’ Scribe v2 at one-fifth the cost.
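To put the per-minute price in concrete terms, here is a minimal Python sketch that estimates batch transcription cost from audio duration. The $0.003 per minute figure comes from the description above. The function name, and the assumption that billing is strictly linear in duration with no rounding, minimums, or tiers, are illustrative only and are not confirmed pricing rules.

```python
from decimal import Decimal

# Price quoted in the description for Voxtral Mini Transcribe V2.
PRICE_PER_MINUTE_USD = Decimal("0.003")


def estimate_cost_usd(audio_seconds: float) -> Decimal:
    """Estimate cost in USD, ASSUMING linear per-minute billing.

    Actual billing granularity is not stated in the source text.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    minutes = Decimal(str(audio_seconds)) / Decimal(60)
    # Round to a hundredth of a cent for display purposes.
    return (minutes * PRICE_PER_MINUTE_USD).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    print(estimate_cost_usd(3600))         # one hour of audio
    print(estimate_cost_usd(1000 * 3600))  # 1,000 hours of audio
```

Under that linear-billing assumption, one hour of audio comes to 60 × $0.003 = $0.18.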

Speaker diarization and context biasing

Generate transcriptions with speaker labels and precise start/end times, ideal for meetings, interviews, and multi-party calls. Context biasing lets you provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary.
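As a rough illustration of how an application might work with these two features, the Python sketch below checks a context-biasing list against the 100-item limit and merges diarized segments into speaker turns. The `Segment` shape, its field names, and all function names are hypothetical; the real response format may differ, and nothing here calls the Mistral API.

```python
from dataclasses import dataclass

MAX_BIAS_TERMS = 100  # limit stated in the description


@dataclass
class Segment:
    # Hypothetical shape of one diarized segment; real field names may differ.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def check_bias_terms(terms: list[str]) -> list[str]:
    """Strip whitespace, drop blanks and exact duplicates, enforce the limit."""
    cleaned = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    if len(cleaned) > MAX_BIAS_TERMS:
        raise ValueError(
            f"{len(cleaned)} terms given; at most {MAX_BIAS_TERMS} are allowed"
        )
    return cleaned


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(seg)
    return turns


def format_turn(turn: Segment) -> str:
    """Render a turn as '[start-end] speaker: text'."""
    return f"[{turn.start:.2f}-{turn.end:.2f}] {turn.speaker}: {turn.text}"
```

Merging adjacent same-speaker segments keeps meeting transcripts readable, while the original start and end times remain available on each segment for word-level alignment.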

Open-weights and multilingual support

Voxtral Realtime ships as open weights under the Apache 2.0 license and can be deployed on edge devices for privacy-first applications. Both models natively support 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.

What stands out

Voxtral Transcribe 2 combines a lower word error rate than the competing models named above with a price of $0.003 per minute, and its real-time latency can be configured below 200ms.

This combination of accuracy, speed, and cost efficiency is unmatched in the current market. Voxtral Mini Transcribe V2 achieves state-of-the-art transcription at $0.003 per minute, while Voxtral Realtime enables a new class of voice-first applications with a streaming architecture that doesn't compromise on quality. The open-weights release of Voxtral Realtime under Apache 2.0 further sets it apart, allowing privacy-sensitive deployments on edge devices.

Worth checking out if…

You need a speech-to-text solution that balances ultra-low latency, high accuracy, and cost-effectiveness, especially for real-time voice agents, live transcription, or privacy-first applications. The open-weights model and multilingual support make it a strong choice for developers building across platforms and languages.
