Tacotron-2 is a TensorFlow implementation of Google's Tacotron 2 end-to-end text-to-speech architecture, which predicts mel spectrograms from raw text and then feeds them to a neural vocoder such as WaveNet. It provides paper_hparams.py, which aims to reproduce the original paper's hyperparameters, alongside a tuned hparams.py with extra modifications intended to improve audio quality in practice. The repository is structured as a full training pipeline: dataset preparation, preprocessing into spectrograms, Tacotron training, WaveNet vocoder training (or Griffin-Lim inversion, which requires no training), and final waveform synthesis. It includes directory layouts and logging directories for multiple datasets such as LJSpeech and M-AILABS en_US/en_UK, making it easier to adapt to new English corpora. Separate log trees track mel spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve over time.
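To make the "mel spectrogram" target more concrete, here is a minimal sketch of how mel filter center frequencies can be laid out. It is not taken from the repository: the function names are hypothetical, the HTK-style mel formula is an assumption (the project's preprocessing may use a different variant), and the defaults of 80 channels spanning 125 Hz to 7.6 kHz follow the values reported in the Tacotron 2 paper rather than anything read from hparams.py.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_mels: int = 80,
                           fmin: float = 125.0,
                           fmax: float = 7600.0) -> list:
    """Return the center frequency (Hz) of each triangular mel filter.

    num_mels filters need num_mels + 2 points equally spaced on the mel
    scale; the centers are the interior points.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (num_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_mels)]


centers = mel_center_frequencies()
print(f"{len(centers)} filters, first {centers[0]:.1f} Hz, last {centers[-1]:.1f} Hz")
```

The centers are dense at low frequencies and sparse at high ones, which is why a mel spectrogram is a much more compact prediction target than a linear-frequency spectrogram.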

Features

  • Full TensorFlow implementation of Tacotron-2 with paper-accurate and enhanced hyperparameter sets
  • End-to-end pipeline from raw audio datasets (e.g., LJSpeech, M-AILABS) through preprocessing, Tacotron training, and vocoder training
  • Support for both WaveNet vocoder and Griffin-Lim inversion for mel-to-waveform synthesis
  • Detailed repository structure with logs for mel-spectrograms, attention plots, evaluation audio, and vocoder outputs
  • Modular training scripts (preprocess.py, train.py, synthesize.py, wavenet_preprocess.py) for flexible experimentation
  • Example configurations that replicate the original paper results and variants that push for improved stability and quality
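The stage ordering described above can be sketched as a small planner. This is an illustration only: the mapping of stages to scripts is inferred from the script names listed in the features, command-line flags are deliberately omitted because they are not given here, and the `Stage` class and `plan` function are hypothetical helpers, not part of the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    script: str                 # assumed stage-to-script mapping
    wavenet_only: bool = False  # skipped when Griffin-Lim is the vocoder


STAGES = [
    Stage("preprocess audio into spectrograms", "preprocess.py"),
    Stage("train Tacotron", "train.py"),
    Stage("prepare WaveNet training data", "wavenet_preprocess.py", wavenet_only=True),
    Stage("train WaveNet vocoder", "train.py", wavenet_only=True),
    Stage("synthesize waveforms", "synthesize.py"),
]


def plan(vocoder: str) -> List[Stage]:
    """Return the ordered stages for the chosen vocoder.

    Griffin-Lim is an iterative signal-processing method with no trainable
    parameters, so the WaveNet-specific stages are dropped for it.
    """
    if vocoder not in ("wavenet", "griffin_lim"):
        raise ValueError(f"unknown vocoder: {vocoder!r}")
    return [s for s in STAGES if vocoder == "wavenet" or not s.wavenet_only]


for stage in plan("griffin_lim"):
    print(f"{stage.script:<24}{stage.name}")
```

Choosing Griffin-Lim shortens the pipeline to three stages, which is a quick way to check Tacotron alignment and spectrogram quality before committing to WaveNet training.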

Categories

Text to Speech

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

2025-11-28