HiFi-GAN is a GAN-based neural vocoder that generates high-fidelity speech waveforms from mel spectrograms efficiently. It pairs a generator architecture tailored to the periodic structure of speech with a set of discriminators that examine the waveform at different scales and periods to better capture naturalness. The model aims to balance sample quality against generation speed: it outperforms many earlier GAN vocoders while running far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN achieved mean opinion scores close to those of human recordings while synthesizing 22.05 kHz audio about 168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades some quality for higher speed and runs more than 13× faster than real time on CPU, making it suitable for deployment without a powerful GPU.
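
To make the "faster than real time" figures concrete, the following minimal sketch converts a speedup factor into output samples per second at the 22.05 kHz rate quoted above. The function names are hypothetical and chosen for illustration; they are not part of the HiFi-GAN codebase.

```python
SAMPLE_RATE_HZ = 22_050  # sample rate quoted in the description above


def speedup_over_real_time(audio_seconds, wall_seconds):
    """Seconds of audio produced per second of compute (hypothetical helper)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def samples_per_second(speedup, sample_rate_hz=SAMPLE_RATE_HZ):
    """Waveform samples generated per wall-clock second at a given speedup."""
    return speedup * sample_rate_hz


print(samples_per_second(168))  # 3704400 samples/s at the quoted GPU speedup
print(samples_per_second(13))   # 286650 samples/s at the quoted CPU speedup
```

To measure a speedup on your own hardware, time one synthesis call and pass the duration of the generated audio and the elapsed wall-clock time to `speedup_over_real_time`.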

Features

  • High-fidelity neural vocoder that converts mel spectrograms to waveforms using a GAN architecture
  • Multi-period and multi-scale discriminators to better capture periodicity and overall speech realism
  • Fast inference: about 168× faster than real time on an NVIDIA V100 GPU, and more than 13× faster than real time on CPU with the smaller configuration
  • Multiple generator configurations (v1, v2, v3) to balance quality, speed, and model size
  • Compatible with TTS front ends that output mel spectrograms, such as Tacotron2 and Glow-TTS, to build complete text-to-speech pipelines
  • Open-source implementation with pretrained models and scripts for training, evaluation, and inference
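
As a rough illustration of the multi-period idea in the feature list: according to the HiFi-GAN paper, each period discriminator folds the 1D waveform into a 2D grid of width p, so that samples spaced p apart line up in the same column, using the periods 2, 3, 5, 7 and 11. The sketch below shows only that folding step in plain Python, with reflect padding when the length is not a multiple of p. The function name `fold_by_period` is hypothetical, and this is a simplified sketch rather than the project's implementation, which operates on batched tensors.

```python
PERIODS = (2, 3, 5, 7, 11)  # periods reported in the HiFi-GAN paper


def fold_by_period(waveform, period):
    """Fold a 1D sequence into rows of length `period` (hypothetical helper).

    Column j of the result holds samples j, j + period, j + 2 * period, ...
    """
    samples = list(waveform)
    if period < 1 or len(samples) < period:
        raise ValueError("need 1 <= period <= len(waveform)")
    remainder = len(samples) % period
    if remainder:
        n_pad = period - remainder
        # Reflect padding: mirror the tail without repeating the last sample.
        samples += samples[-2:-n_pad - 2:-1]
    return [samples[i:i + period] for i in range(0, len(samples), period)]


grid = fold_by_period([0, 1, 2, 3, 4, 5, 6], period=3)
print(grid)  # [[0, 1, 2], [3, 4, 5], [6, 5, 4]]
```

In the example, the first column reads 0, 3, 6: samples exactly one period apart. A 2D convolution over such a grid can then compare samples at that spacing directly, which is how a periodic pattern in the waveform becomes visible to the discriminator.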

Categories

Text to Speech

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software

Registered

2025-11-28