Audience
Developers and enterprise product teams that need expressive, multilingual, brand-safe text-to-speech for assistants, customer support, accessibility, education, and long-form audio experiences.
About MAI-Voice-2
MAI-Voice-2 is Microsoft AI’s most expressive and natural-sounding text-to-speech model to date, built for production voice experiences where fidelity, language coverage, speaker consistency, and emotional range directly shape the user experience. It is designed for assistants, customer support, audiobooks, accessibility experiences, games, podcasts, courses, simulations, and creator workflows where the voice must sound natural, fluid, and trustworthy. It expands from English-only support to 15 languages while maintaining naturalness and expressiveness: English, Italian, French, German, Hindi, Spanish, Portuguese, Korean, Chinese, Turkish, Russian, Thai, Dutch, Romanian, and Hungarian. MAI-Voice-2 also offers granular emotion control through tags such as sad, whispered, and excited, along with role-based expressive speech for personas such as motivational trainers, sports commentators, or character voices.