Miso TTS is an 8-billion-parameter text-to-speech model developed by Miso Labs for generating expressive, natural-sounding conversational speech. Built on an RVQ Transformer architecture inspired by Sesame CSM, it pairs a Llama-based backbone with an autoregressive audio decoder to produce audio from text. The model supports both standard speech synthesis and voice-conditioned generation, using optional audio prompts for voice cloning. Miso TTS generates Mimi audio codes and can condition on conversation history to produce more contextually aware and realistic dialogue. Designed for local deployment, it applies watermarking by default to help promote responsible use of generated audio. With its focus on emotive speech generation, Miso TTS targets AI voice applications, virtual assistants, and conversational AI experiences.
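
As a rough illustration of the residual vector quantization (RVQ) idea behind the architecture's name, the sketch below encodes a vector as one code index per codebook level, where each level quantizes what the previous level left over. It is a toy example under stated assumptions, not Miso TTS or Mimi code: the function names and the tiny hand-written codebooks are hypothetical, and real codecs learn their codebooks from audio.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec level by level; each level encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-level codebooks over 2-D vectors (illustrative values only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # coarse level
    [[0.0, 0.0], [0.5, 0.0]],  # finer level, applied to the residual
]
codes = rvq_encode([1.4, 1.0], codebooks)
approx = rvq_decode(codes, codebooks)
```

In this framing, a model that "generates audio codes" predicts index sequences like `codes`, and a separate codec decoder turns them back into a waveform.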

Features

  • High-Quality Speech Synthesis – Generates natural, expressive, and emotionally rich speech from text input.
  • Voice Cloning Support – Uses optional audio prompts and transcripts to create speech that matches a specific voice.
  • Advanced RVQ Transformer Architecture – Combines an 8B-parameter backbone with a dedicated audio decoder for realistic audio generation.
  • Context-Aware Dialogue Generation – Supports conditioning on previous conversation history for more coherent and conversational outputs.
  • Built-In Audio Watermarking – Applies watermarking to generated audio by default to encourage responsible deployment and content attribution.
  • Local & GPU-Accelerated Deployment – Runs locally with Hugging Face-hosted model weights and optimized CUDA-based inference for high-performance generation.
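
The context-aware feature above implies that earlier turns are supplied alongside the new text. One way such history might be prepared is sketched below: turns are kept as speaker-tagged records, and only the most recent ones that fit a fixed audio budget are retained. This is a minimal sketch, not the Miso TTS API; `Turn`, `recent_context`, and the `max_seconds` budget are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One earlier utterance in the conversation (hypothetical structure)."""
    speaker: int          # speaker identifier
    text: str             # transcript of the utterance
    audio_seconds: float  # duration of the matching audio clip


def recent_context(history: List[Turn], max_seconds: float) -> List[Turn]:
    """Keep the newest turns whose total audio duration fits within max_seconds."""
    kept: List[Turn] = []
    total = 0.0
    # Walk backwards from the newest turn and stop at the first one that does not fit.
    for turn in reversed(history):
        if total + turn.audio_seconds > max_seconds:
            break
        kept.append(turn)
        total += turn.audio_seconds
    kept.reverse()  # restore chronological order
    return kept


history = [
    Turn(speaker=0, text="Hi, how was the trip?", audio_seconds=4.0),
    Turn(speaker=1, text="Long, but worth it.", audio_seconds=3.0),
    Turn(speaker=0, text="Glad to hear that.", audio_seconds=2.0),
]
context = recent_context(history, max_seconds=5.0)
```

Dropping the oldest turns first keeps the most relevant prosodic cues while bounding the prompt length; the actual limits and input format depend on the model's own interface.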

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models

Registered

1 day ago