CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that generates RVQ (residual vector quantization) audio codes from text and audio inputs. It pairs a Llama backbone with a smaller audio decoder that produces the audio codes used for speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms such as Hugging Face for testing. CSM runs on CUDA-enabled GPUs.
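
To make the term "RVQ audio codes" concrete, here is a minimal pure-Python sketch of residual vector quantization in general. It is not CSM's codec or decoder: the tiny 2-D codebooks and the function names (`nearest`, `rvq_encode`, `rvq_decode`) are hypothetical and chosen only for illustration. Each stage quantizes whatever the previous stage left over, so one vector becomes a short list of integer codes.

```python
from typing import List, Sequence

# Hypothetical toy codebooks (assumption: 2-D vectors, two stages).
CODEBOOKS: List[List[List[float]]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # stage 1: coarse entries
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # stage 2: fine corrections
]


def nearest(codebook: Sequence[Sequence[float]], vec: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to vec (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec: Sequence[float], codebooks=CODEBOOKS) -> List[int]:
    """Quantize vec stage by stage, passing the residual on to the next stage."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the remainder is what the next stage sees.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks=CODEBOOKS) -> List[float]:
    """Reconstruct an approximation by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    codes = rvq_encode([1.2, 1.0])
    print(codes, rvq_decode(codes))
```

A real audio codec learns much larger codebooks over frames of audio; the sketch only shows why the output is a stack of small integer codes per frame.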

Features

  • Generates high-quality speech from text and audio inputs.
  • Uses a Llama backbone with an optimized audio decoder.
  • Fine-tuned for interactive voice applications.
  • Hosted models available for easy access and testing.
  • Compatible with CUDA-enabled GPUs for fast performance.
  • Easy to integrate and test using example scripts.
  • Requires Python 3.10 and audio processing tools such as ffmpeg.
  • Customizable for various conversational contexts.
  • Available under an Apache-2.0 license for open-source usage.
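
The requirements listed above (Python 3.10, ffmpeg, a CUDA-enabled GPU) can be checked before installing anything. The following is a rough sketch rather than a script shipped with the project: `check_environment` is a hypothetical helper, it treats 3.10 as a minimum version (an assumption; the listing only says "Python 3.10"), and it assumes PyTorch is the framework used to detect CUDA, reporting `None` when PyTorch is not installed.

```python
import shutil
import sys


def check_environment(min_python=(3, 10)):
    """Report whether the listed prerequisites appear to be present."""
    report = {
        # Assumption: 3.10 is interpreted as a minimum, not an exact version.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # ffmpeg is looked up on PATH only; no version check is attempted.
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # assumption: PyTorch is used for GPU detection

        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        # PyTorch is not installed, so CUDA support cannot be determined here.
        report["cuda_available"] = None
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A `False` or `None` entry points at the prerequisite to install first; the check does not guarantee that the model itself will load.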

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models

Registered

2025-03-19