VITS is a foundational research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a well-known neural TTS architecture. Traditional two-stage systems train an acoustic model and a vocoder separately. VITS instead trains a single end-to-end model that maps text directly to a waveform, using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture generates audio in parallel, which makes inference fast, while achieving speech quality that rivals or surpasses many two-stage systems. The repository provides training and inference pipelines for two common datasets, LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and grapheme-to-phoneme (g2p) preprocessing, both of which are crucial for aligning text and speech in an end-to-end setup.
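
To make the conditional VAE part of the description more concrete, here is a minimal sketch of two standard building blocks of any Gaussian VAE: sampling a latent with the reparameterization trick, and the closed-form KL divergence between two diagonal Gaussians. It is illustrative only and is not taken from the repository; the function names `sample_latent` and `gaussian_kl` are hypothetical, and the sketch leaves out the normalizing flow that VITS applies to its prior.

```python
import math
import random


def sample_latent(mean, log_std, rng):
    """Reparameterization trick: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]


def gaussian_kl(mean_q, log_std_q, mean_p, log_std_p):
    """KL(q || p) for diagonal Gaussians, summed over dimensions.

    Per dimension:
        log(std_p / std_q) + (var_q + (mean_q - mean_p)^2) / (2 * var_p) - 1/2
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mean_q, log_std_q, mean_p, log_std_p):
        var_q = math.exp(2.0 * lq)
        var_p = math.exp(2.0 * lp)
        total += lp - lq + (var_q + (mq - mp) ** 2) / (2.0 * var_p) - 0.5
    return total


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    z = sample_latent([0.0, 1.0], [0.0, -1.0], rng)
    print("sampled latent:", z)
    print("KL:", gaussian_kl([1.0], [0.0], [0.0], [0.0]))
```

In a VAE-style objective, a KL term of this form pulls the posterior (conditioned on audio) toward the prior (conditioned on text), while the reparameterized sample keeps the sampling step differentiable.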

Features

  • End-to-end TTS model combining conditional VAE, normalizing flows, and adversarial training
  • Parallel waveform generation with naturalness comparable to or better than classic two-stage pipelines
  • Ready-made training recipes for the LJ Speech (single-speaker) and VCTK (multi-speaker) datasets
  • Monotonic alignment search implementation and phoneme preprocessing scripts
  • PyTorch-based code suitable for research, modification, and experimental extensions
  • Widely adopted baseline architecture for many derivative and improved TTS systems
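
The monotonic alignment search listed above can be sketched as a small dynamic program. The version below is a simplified pure-Python illustration, not the repository's implementation. It assumes a score matrix `log_p[i][j]` holding the log-likelihood of audio frame `j` under text token `i`, and it finds the best alignment in which every frame maps to one token, token indices never decrease, and no token is skipped. The names `monotonic_alignment_search` and `durations_from_path` are hypothetical.

```python
def monotonic_alignment_search(log_p):
    """Return, for each frame, the index of the text token it aligns to.

    log_p: list of rows, log_p[i][j] = score of frame j under token i.
    The path starts at token 0, ends at the last token, and moves to the
    same token or the next token at each frame.
    """
    n_text = len(log_p)
    n_frames = len(log_p[0])
    if n_frames < n_text:
        raise ValueError("need at least one frame per text token")

    neg_inf = float("-inf")
    # q[i][j] = best total score of any valid path ending at (token i, frame j)
    q = [[neg_inf] * n_frames for _ in range(n_text)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_text)):  # token i is unreachable before frame i
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            q[i][j] = log_p[i][j] + max(stay, advance)

    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if i > 0 and (i == j or q[i - 1][j - 1] >= q[i][j - 1]):
            i -= 1
    return path


def durations_from_path(path, n_text):
    """Count how many frames were assigned to each text token."""
    durations = [0] * n_text
    for token in path:
        durations[token] += 1
    return durations


if __name__ == "__main__":
    scores = [[0.0, 0.0, -5.0, -5.0],
              [-5.0, -5.0, 0.0, 0.0]]
    path = monotonic_alignment_search(scores)
    print(path, durations_from_path(path, len(scores)))
```

The per-token frame counts derived from such a path are the kind of duration targets an end-to-end TTS model can learn from without an external aligner.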

Categories

Text to Speech

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

2025-11-28