The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that generates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce the audio codes used for realistic speech synthesis. A fine-tuned variant of the model powers Sesame's interactive voice demo, and the model is hosted on Hugging Face for testing. CSM runs on CUDA-enabled GPUs for efficient execution.
Features
- Generates high-quality speech from text and audio inputs.
- Uses a Llama backbone with a smaller audio decoder.
- Fine-tuned for interactive voice applications.
- Hosted models available for easy access and testing.
- Compatible with CUDA-enabled GPUs for fast performance.
- Easy to integrate and test using example scripts.
- Requires Python 3.10; ffmpeg may also be needed for some audio operations.
- Customizable for various conversational contexts.
- Available under an Apache-2.0 license for open-source usage.
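Because CSM conditions on audio as well as text, a caller typically supplies earlier turns of the conversation as context. The exact context format CSM expects is not described here, so the following is a hypothetical sketch: the `Turn` class, the `build_context` helper, the integer speaker ids, and the duration-budget policy are all illustrative assumptions, not part of the CSM codebase. It shows one way to keep only the most recent turns that fit within a fixed amount of audio.

```python
# Hypothetical illustration of assembling conversational context.
# Turn and build_context are NOT part of the CSM codebase.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    speaker: int                                      # assumed integer speaker id
    text: str                                         # transcript of this turn
    audio: List[float] = field(default_factory=list)  # assumed mono samples


def build_context(history: List[Turn], sample_rate: int, max_seconds: float) -> List[Turn]:
    """Keep the most recent turns whose combined audio fits in max_seconds."""
    budget = int(sample_rate * max_seconds)
    kept: List[Turn] = []
    used = 0
    # Walk backwards from the newest turn so recent context is preferred.
    for turn in reversed(history):
        if used + len(turn.audio) > budget:
            break
        kept.append(turn)
        used += len(turn.audio)
    kept.reverse()  # restore chronological order
    return kept


# Toy example: a 10 Hz "sample rate" keeps the numbers small.
history = [
    Turn(0, "Hi", [0.0] * 20),
    Turn(1, "Hello", [0.0] * 30),
    Turn(0, "How are you?", [0.0] * 40),
]
ctx = build_context(history, sample_rate=10, max_seconds=7.0)
print([(t.speaker, t.text) for t in ctx])
```

Trimming from the oldest turn keeps the context bounded while preserving the most recent exchange, which is usually the most relevant for prosody and speaker consistency; other policies (such as a fixed number of turns) would work equally well for this illustration.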
License
Apache License V2.0