Miso TTS

Miso TTS is an 8-billion-parameter text-to-speech model developed by Miso Labs for generating expressive, natural-sounding conversational speech. It is built on an RVQ (residual vector quantization) Transformer architecture inspired by Sesame CSM, pairing a Llama-based backbone with an autoregressive audio decoder. The model supports both standard speech synthesis and voice-conditioned generation, in which an optional audio prompt is used for voice cloning. Miso TTS generates Mimi audio codes and can condition on conversation history to produce more contextually aware dialogue. It is designed for local deployment and watermarks generated audio by default to promote responsible use. With its focus on emotive speech, the project describes its performance as state-of-the-art for AI voice applications, virtual assistants, and conversational AI experiences.
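For readers unfamiliar with the "RVQ" in the architecture name, the toy sketch below illustrates residual vector quantization in general: each codebook quantizes whatever error the previous one left behind, so one frame becomes a short list of integer codes. It is not Miso TTS or Mimi code. The codebooks, the two-dimensional frame, and the function names are all made up for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], vec: Vector) -> int:
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i], vec))


def rvq_encode(codebooks: Sequence[Sequence[Vector]], vec: Vector) -> List[int]:
    """Quantize vec level by level, passing the residual to the next codebook."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Sequence[Vector]], codes: Sequence[int]) -> List[float]:
    """Reconstruct the vector by summing the selected codeword from each level."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse level followed by a fine level.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

frame = [1.2, 0.9]                      # stand-in for one latent audio frame
codes = rvq_encode(CODEBOOKS, frame)    # one integer code per codebook level
approx = rvq_decode(CODEBOOKS, codes)   # coarse codeword plus fine correction
```

Each extra codebook level refines the reconstruction. A real neural codec learns far larger codebooks over high-dimensional latents, but the code-per-level structure is the same idea.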

Features

High-Quality Speech Synthesis – Generates natural, expressive, and emotionally rich speech from text input.
Voice Cloning Support – Uses optional audio prompts and transcripts to create speech that matches a specific voice.
Advanced RVQ Transformer Architecture – Combines an 8B-parameter backbone with a dedicated audio decoder for realistic audio generation.
Context-Aware Dialogue Generation – Supports conditioning on previous conversation history for more coherent and conversational outputs.
Built-In Audio Watermarking – Applies watermarking to generated audio by default to encourage responsible deployment and content attribution.
Local & GPU-Accelerated Deployment – Runs locally with Hugging Face-hosted model weights and optimized CUDA-based inference for high-performance generation.
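The voice cloning and conversation-history features above both amount to giving the model prior (speaker, transcript, audio) segments before the new text. The sketch below shows one way such a context could be assembled. It is an assumption for illustration only: `Segment`, `build_context`, and `max_turns` are hypothetical names, not the Miso TTS API, and audio is represented by a placeholder tuple rather than a real waveform.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one utterance: who spoke, what, and the audio."""
    speaker: int
    text: str
    audio: Tuple[float, ...] = ()  # placeholder for waveform samples


def build_context(
    history: Sequence[Segment],
    voice_prompt: Optional[Segment] = None,
    max_turns: int = 4,
) -> List[Segment]:
    """Return the optional voice prompt followed by the most recent turns."""
    recent = list(history[-max_turns:]) if max_turns > 0 else []
    prefix = [voice_prompt] if voice_prompt is not None else []
    return prefix + recent


history = [
    Segment(0, "Hi, how can I help?"),
    Segment(1, "I'd like to change my booking."),
    Segment(0, "Sure, which date works for you?"),
]
prompt = Segment(0, "Reference transcript for the target voice.")
context = build_context(history, voice_prompt=prompt, max_turns=2)
```

Capping the number of turns keeps the conditioning input bounded. Whether Miso TTS trims history this way, or at all, is not stated in the project description.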

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models, Python Text-to-Speech (TTS) Models

Registered

2026-06-04
