VoxCPM2 is an advanced open-source text-to-speech system that redefines speech synthesis by eliminating traditional tokenization and instead generating continuous speech representations through a diffusion-based autoregressive architecture. Built on top of the MiniCPM model family, it enables highly natural, expressive, and context-aware speech generation that adapts tone, emotion, and pacing directly from input text. The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.

Features

  • Tokenizer-free speech generation using diffusion autoregressive modeling
  • Multilingual support across dozens of languages without explicit tagging
  • High-quality voice cloning from short reference audio samples
  • Voice design from natural language descriptions without audio input
  • Real-time streaming synthesis with low latency performance
  • Studio-quality audio output with built-in super-resolution

Project Samples

Project Activity

See All Activity >

Categories

Text to Speech

License

Apache License V2.0

Follow VoxCPM2

VoxCPM2 Web Site

Other Useful Business Software
Earn up to 16% annual interest with Nexo. Icon
Earn up to 16% annual interest with Nexo.

Let your crypto work for you

Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
Get started with Nexo.
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of VoxCPM2!

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

8 hours ago