CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that generates residual vector quantization (RVQ) audio codes from text and audio inputs. It pairs a Llama backbone with a smaller audio decoder that produces the audio codes used for realistic speech synthesis. The model has been fine-tuned for interactive voice demos, and hosted versions are available on platforms like Hugging Face for testing. CSM offers a flexible setup and runs on CUDA-enabled GPUs for efficient execution.
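
To make the term "RVQ audio codes" concrete, here is a minimal toy sketch of residual vector quantization in plain Python. It is not CSM's codec or API: the codebooks are random, and every name (`rvq_encode`, `rvq_decode`, `DIM`, `STAGES`, `ENTRIES`) is hypothetical. Each stage quantizes whatever the previous stages left unexplained, so one frame becomes a short list of integer codes.

```python
import random

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Return one code per stage plus the residual error norm after each stage."""
    residual = list(vec)
    codes, errors = [], []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next stage quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
        errors.append(sum(r * r for r in residual) ** 0.5)
    return codes, errors

def rvq_decode(codes, codebooks):
    """Reconstruct a vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Toy setup (assumed values, not taken from CSM): 3 stages of 8 entries in 4 dims.
DIM, STAGES, ENTRIES = 4, 3, 8
rng = random.Random(0)
codebooks = []
for stage in range(STAGES):
    scale = 1.0 / (2 ** stage)  # later stages model finer detail
    entries = [[rng.gauss(0, scale) for _ in range(DIM)] for _ in range(ENTRIES - 1)]
    entries.append([0.0] * DIM)  # zero entry: a stage can never increase the error
    codebooks.append(entries)

frame = [rng.gauss(0, 1) for _ in range(DIM)]
codes, errors = rvq_encode(frame, codebooks)
recon = rvq_decode(codes, codebooks)
print("codes:", codes)
print("residual norm per stage:", [round(e, 3) for e in errors])
```

In a real neural codec the codebooks are learned rather than random; the sketch only shows why a frame of audio can be represented by a few small integers, one per quantizer stage.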

Features

  • Generates high-quality speech from text and audio inputs.
  • Uses a Llama backbone with an optimized audio decoder.
  • Fine-tuned for interactive voice applications.
  • Hosted models available for easy access and testing.
  • Compatible with CUDA-enabled GPUs for fast performance.
  • Easy to integrate and test using example scripts.
  • Requires Python 3.10 and certain audio processing tools like ffmpeg.
  • Customizable for various conversational contexts.
  • Available under an Apache-2.0 license for open-source usage.
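
The requirements listed above (Python 3.10, ffmpeg, a CUDA-enabled GPU) can be checked before installing anything else. The following is a minimal sketch using only the standard library plus an optional PyTorch probe; `check_environment` is a hypothetical helper, not part of the CSM project, and it assumes that PyTorch is the framework used to reach the GPU.

```python
import importlib.util
import shutil
import sys

def check_environment(min_python=(3, 10)):
    """Report whether the listed prerequisites appear to be present.

    Hypothetical helper for illustration; it only inspects the local machine.
    """
    report = {
        "python_ok": sys.version_info[:2] >= min_python,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cuda_available": False,
    }
    # Probe CUDA only if PyTorch is installed (assumption: PyTorch is the runtime).
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install counts as "not available" rather than crashing.
            report["cuda_available"] = False
    return report

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing entry is not necessarily fatal: the check only tells you which of the listed prerequisites to look at first.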

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models

Registered

2025-03-19