real-time vocoder free download

Showing 16 open source projects for "real-time vocoder"

1

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder.

Downloads: 11 This Week

Last Update: 2026-03-09
See Project
2

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper

WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS:...

Downloads: 2 This Week

Last Update: 2025-11-28
See Project
3

Nyquist

Nyquist is a language for sound synthesis and music composition.

Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.

3 Reviews

Downloads: 54 This Week

Last Update: 2025-03-31
See Project
4

Parallel WaveGAN

Unofficial Parallel WaveGAN

Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. ...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
5

Lyra

A Very Low-Bitrate Codec for Speech Compression

Lyra is a neural audio codec designed to deliver intelligible, natural-sounding speech at extremely low bitrates, making real-time communication viable on constrained networks. It replaces hand-engineered codecs with learned models that capture speech characteristics more efficiently and reconstruct waveforms with a neural vocoder. The system targets mobile-class hardware, balancing latency and quality so it can run in real time on phones. Its architecture is resilient to packet loss and jitter through framing strategies and error concealment, helping conversations remain understandable under adverse conditions. ...

Downloads: 0 This Week

Last Update: 2025-10-09
See Project
6

radio_vocoder_FFT

a vocoder + equalizer + FFT effects version of radio_chung

radio_vocoder_FFT is a vocoder + linear equalizer(s) + FFT effect(s) version of radio_chung, a free internet radio player for stream URLs and audio files by generic path (*, mp3, *name*.ogg, wav, ...). It includes DSP effects (baxandall, resonance, automod, decay, flat, noisered, speed, feedback) and is built with bass.dll, gui_chung, FFTdll.dll (fast Fourier transform), and FreeBASIC. It offers high-quality small pitch shifting for radio URLs. Added features: record, playrec, save as MP3, feedback, anticlick.

Downloads: 2 This Week

Last Update: 2022-01-19
See Project
7

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. ...

1 Review

Downloads: 3 This Week

Last Update: 2023-03-23
See Project
8

VoiceFixer

General Speech Restoration

VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...

Downloads: 7 This Week

Last Update: 2025-11-28
See Project
9

TensorFlowTTS

Real-Time State-of-the-art Speech Synthesis for TensorFlow 2

...It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron‑2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband‑MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization aware training and pruning — which allow models to run faster than real time and to be deployable on mobile or embedded platforms. The library supports multiple languages (English, French, Korean, Chinese, German, etc.) and is relatively easy to adapt to new languages. With integrated vocoder + mel-spectrogram generation pipelines, pre-trained models, and fairly flexible architecture, TensorFlowTTS is a great off-the-shelf and extensible TTS engine for applications ranging from voice assistants to content generation or accessibility tools.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
10

HiFi-GAN

Generative Adversarial Networks for Efficient and High Fidelity Speech

...In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
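The speed figures above can be turned into concrete throughput numbers. The following is a minimal sketch of that arithmetic only, not part of HiFi-GAN itself; the function names are hypothetical, and the speedup values are the approximate ones quoted in the description (about 168× on GPU, more than 13× on CPU).

```python
# Illustrative arithmetic only; function names are hypothetical, not HiFi-GAN APIs.

SAMPLE_RATE_HZ = 22_050  # 22.05 kHz audio, as quoted in the description
GPU_SPEEDUP = 168        # approximate "faster than real time" factor on a V100
CPU_SPEEDUP = 13         # lower bound quoted for the smaller configuration on CPU


def samples_per_second(sample_rate_hz: int, speedup: float) -> float:
    """Audio samples generated per second of wall-clock compute."""
    return sample_rate_hz * speedup


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / speedup


gpu_rate = samples_per_second(SAMPLE_RATE_HZ, GPU_SPEEDUP)
cpu_rate = samples_per_second(SAMPLE_RATE_HZ, CPU_SPEEDUP)

print(f"GPU: {gpu_rate:,.0f} samples/s")
print(f"CPU: {cpu_rate:,.0f} samples/s")
print(f"10 s of audio on GPU: {synthesis_time(10.0, GPU_SPEEDUP) * 1000:.1f} ms")
```

At these rates, the GPU figure corresponds to roughly 3.7 million samples per second, which is why a vocoder this fast leaves most of a real-time latency budget for the rest of a TTS pipeline.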

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
11

phasevocoder

phase vocoder for time scaling and pitch transposition etc.

phasevocoder: phase vocoder for time scaling and pitch transposition etc. Copyright (c) 2008-2020 by Klaus Michael Indlekofer. All rights reserved. Note: Special restrictions apply. See disclaimers below and within the distribution. (We are not affiliated in any way with companies/persons mentioned on this page. All brand names and trademarks are property of their respective owners.)

Downloads: 0 This Week

Last Update: 2020-06-04
See Project
12

Tacotron-2

DeepMind's Tacotron-2 TensorFlow implementation

...It includes directory layouts and logging directories for multiple datasets such as LJSpeech and M-AILABS en_US/en_UK, making it easier to adapt to new English corpora. Separate log trees track mel-spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve over time.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
13

WaoN

WaoN is a Wave-to-Notes transcriber (it converts an audio file into a MIDI file). It also includes utility tools such as gWaoN, a graphical visualization of the spectra, and a phase vocoder for time-stretching and pitch-shifting.

1 Review

Downloads: 8 This Week

Last Update: 2018-10-06
See Project
14

dvoc

A configurable, real-time vocoder using ALSA.

Downloads: 0 This Week

Last Update: 2013-03-25
See Project
15

Vintage Vocoder

Vintage Vocoder is a real-time audio effect, available as a VST and DXI plug-in for PC/Mac. Originally a commercial product published by Sonicism Digital Audio Solutions in 2002. This software was used for the robot voices and sound effects in the computer game Freelancer.

1 Review

Downloads: 1 This Week

Last Update: 2013-03-08
See Project
16

Sculptor: the Sound-warping tool

Sculptor is a phase-vocoder-based package with real-time capabilities. You can use it to fiddle with sound files in the frequency domain, changing pitch and duration independently.
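One common way to change pitch and duration independently with a phase vocoder is to time-stretch first and then resample. The sketch below shows only the factor arithmetic for that approach; it is an assumption for illustration, not Sculptor's documented method, and the function names are hypothetical.

```python
# Illustrative sketch; this is a generic stretch-then-resample plan,
# not taken from Sculptor's source. All names here are hypothetical.


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def pitch_shift_plan(semitones: float, duration_scale: float = 1.0):
    """Return (stretch_factor, resample_factor).

    stretch_factor: how much the phase vocoder lengthens the signal
        (pitch unchanged at this step).
    resample_factor: playback-speed factor applied afterwards; it divides
        the duration and multiplies the pitch by the same amount.
    The final duration is stretch_factor / resample_factor = duration_scale.
    """
    ratio = semitones_to_ratio(semitones)
    stretch_factor = ratio * duration_scale
    resample_factor = ratio
    return stretch_factor, resample_factor


# One octave up, same duration: stretch to 2x length, then play 2x faster.
print(pitch_shift_plan(12))
# No pitch change, 1.5x longer: pure time stretch, no resampling needed.
print(pitch_shift_plan(0, 1.5))
```

Because the stretch step leaves pitch alone and the resample step changes pitch and duration together, the two factors can be chosen to hit any combination of target pitch and target length.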

Downloads: 0 This Week

Last Update: 2013-02-19
See Project