WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. The repository includes notebooks and scripts for inference, long-form synthesis, and finetuning, as well as pre-trained models and converted datasets hosted on Hugging Face. Performance optimizations like torch.compile, KV-caching, and architectural tweaks allow the main model to reach up to 12× real-time speed on a consumer RTX 4090.
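To make the "12× real-time" figure concrete, here is a minimal arithmetic sketch. It assumes "N× real-time" means N seconds of audio produced per second of wall-clock compute; the function names `real_time_factor` and `compute_time_at` are hypothetical helpers for illustration and are not part of WhisperSpeech.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def compute_time_at(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize audio at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Assumed reading of the claim: one minute of speech at 12x real-time
# would take about five seconds of compute.
one_minute = 60.0
seconds_needed = compute_time_at(one_minute, 12.0)
```

Under that reading, measuring your own factor is a matter of timing one generation call and dividing the duration of the resulting audio by the elapsed time.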

Features

  • Text-to-speech system built by inverting Whisper into a semantic token generator
  • Three-stage pipeline using Whisper (semantic), EnCodec (acoustic tokens), and Vocos (vocoder)
  • Open-source code under Apache-2.0/MIT with models trained on properly licensed datasets
  • Fast inference through torch.compile, KV-caching, and architectural tweaks, reaching up to 12× real-time speed on a consumer RTX 4090
  • Support for voice cloning, multilingual experiments, and code-switching within a single utterance
  • Notebooks and scripts for long-form generation, finetuning, and community-driven benchmarking
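Long-form generation usually means breaking the input text into pieces that are synthesized one at a time and then concatenated. The sketch below shows one simple way to do the splitting, on sentence boundaries with a character budget. It is an assumption for illustration only: `split_into_chunks` and `max_chars` are hypothetical names, and the project's own long-form notebook may use a different strategy. The synthesis step itself is not shown.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be passed to the text-to-speech model separately, and the resulting audio segments joined in order.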

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software, Python Text-to-Speech (TTS) Models

Registered

2025-11-28