Octave TTS
Hume AI has introduced Octave (Omni-capable Text and Voice Engine), a groundbreaking text-to-speech system that leverages large language model technology to understand and interpret the context of words, enabling it to generate speech with appropriate emotions, rhythm, and cadence, unlike traditional TTS models that merely read text, Octave acts akin to a human actor, delivering lines with nuanced expression based on the content. Users can create diverse AI voices by providing descriptive prompts, such as "a sarcastic medieval peasant," allowing for tailored voice generation that aligns with specific character traits or scenarios. Additionally, Octave offers the flexibility to modify the emotional delivery and speaking style through natural language instructions, enabling commands like "sound more enthusiastic" or "whisper fearfully" to fine-tune the output.
Learn more
Gemini 2.5 Flash TTS
Gemini 2.5 Flash TTS is the latest text-to-speech (TTS) model variant in Google’s Gemini 2.5 lineup, designed for faster, low-latency speech synthesis with expressive, controllable audio output. It offers significant enhancements in tone versatility and expressivity so that developers can generate speech that better matches style prompts, from storytelling narrations to character voices, with more natural emotional range. It features precision pacing, which allows it to adjust speech tempo based on context, delivering faster sections or slowing for emphasis more accurately according to instructions. It also supports multi-speaker dialogues with consistent character voices for scenarios like podcasts, interviews, or conversational agents, and improved multilingual handling so each speaker’s unique tone and style persist across languages. Gemini 2.5 Flash TTS is optimized for lower latency, making it ideal for interactive applications and real-time voice interfaces.
Learn more
Azure AI Speech
Build voice-enabled apps confidently and quickly with the Speech SDK. Transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and use speaker recognition during conversations. Create custom models tailored to your app with Speech studio. Get state-of-the-art speech to text, lifelike text to speech, and award-winning speaker recognition. Your data stays yours, your speech input is not logged during processing. Create custom voices, add specific words to your base vocabulary, or build your own models. Run Speech anywhere, in the cloud or at the edge in containers. Quickly and accurately transcribe audio in more than 92 languages and variants. Gain customer insights with call center transcription, improve experiences with voice-enabled assistants, capture key discussions in meetings and more. Use text to speech to create apps and services that speak conversationally, choosing from more than 215 voices, and 60 languages.
Learn more
Amazon Polly
Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Polly's Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.
In addition to Standard TTS voices, Amazon Polly offers Neural Text-to-Speech (NTTS) voices that deliver advanced improvements in speech quality through a new machine learning approach. Polly’s Neural TTS technology also supports two speaking styles that allow you to better match the delivery style of the speaker to the application: a Newscaster reading style that is tailored to news narration use cases, and a Conversational speaking style that is ideal for two-way communication like telephony applications.
Learn more