CSM (Conversational Speech Model) download

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Features

Generates high-quality speech from text and audio inputs.
Uses a Llama backbone with an optimized audio decoder.
Fine-tuned for interactive voice applications.
Hosted models available for easy access and testing.
Compatible with CUDA-enabled GPUs for fast performance.
Easy to integrate and test using example scripts.
Requires Python 3.10 and certain audio processing tools like ffmpeg.
Customizable for various conversational contexts.
Available under an Apache-2.0 license for open-source usage.

Project Activity

See All Activity >

License

Apache License V2.0

Follow CSM (Conversational Speech Model)

CSM (Conversational Speech Model) Web Site

Other Useful Business Software

$300 in Free Credit Towards Top Cloud Services

Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started

Rate This Project

User Reviews

Be the first to post a review of CSM (Conversational Speech Model)!

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models

Registered

2025-03-19

Similar Business Software

Google AI Studio

Google AI Studio is a unified development platform that helps teams explore, build, and deploy applications using Google’s most advanced AI models, including Gemini 3. It brings text, image, audio, and video models together in one interactive playground. With vibe coding, developers can use...

See Software
Gemini Enterprise Agent Platform

Gemini Enterprise Agent Platform is a comprehensive solution from Google Cloud designed to help organizations build, scale, govern, and optimize AI agents. It represents the evolution of Vertex AI, combining advanced model development with new capabilities for agent orchestration and...

See Software
LM-Kit.NET

LM-Kit.NET is a cutting-edge, high-level inference SDK designed specifically to bring the advanced capabilities of Large Language Models (LLM) into the C# ecosystem. Tailored for developers working within .NET, LM-Kit.NET provides a comprehensive suite of powerful Generative AI tools, making...

See Software
Murf AI

Murf API is an advanced text-to-speech (TTS) solution that transforms written text into natural, lifelike voiceovers with remarkable accuracy and ease. It empowers developers and businesses with a suite of sophisticated features, including pitch and speed modulation, audio duration adjustments,...

See Software
Chatterbox

Chatterbox is a free, open source voice cloning AI model developed by Resemble AI, licensed under MIT. It enables zero-shot voice cloning using just 5 seconds of reference audio, eliminating the need for training. The model offers expressive speech synthesis with unique emotion control, allowing...

See Software
ElevenLabs

The most realistic and versatile AI speech software, ever. Eleven brings the most compelling, rich and lifelike voices to creators and publishers seeking the ultimate tools for storytelling. Generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI...

See Software