Showing 16 open source projects for "real-time vocoder"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Real-Time Voice Cloning

    Real-Time Voice Cloning

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS:...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Nyquist

    Nyquist

    Nyquist is a language for sound synthesis and music composition.

    Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.
    Leader badge
    Downloads: 55 This Week
    Last Update:
    See Project
  • 4
    Parallel WaveGAN

    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    Lyra

    Lyra

    A Very Low-Bitrate Codec for Speech Compression

    lyra is a neural audio codec designed to deliver intelligible, natural-sounding speech at extremely low bitrates, making real-time communication viable on constrained networks. It replaces hand-engineered codecs with learned models that capture speech characteristics more efficiently and reconstruct waveforms with a neural vocoder. The system targets mobile-class hardware, balancing latency and quality so it can run in real-time on phones. Its architecture is resilient to packet loss and jitter through framing strategies and error concealment, helping conversations remain understandable under adverse conditions. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    radio_vocoder_FFT

    radio_vocoder_FFT

    a vocoder + equalizer + FFT effects version of radio_chung

    radio vocoder chung is a vocoder + linear equalizer(s) + FFT effect(s) version of radio chung free internet web radio stream url and audio file generic path player ( * ,mp3,*name*.ogg,wav,...) with dsp(s) (baxandall , resonance , automod , decay , flat , noisered , speed , feedback ) using bass.dll , gui_chung , FFTdll.dll fft fast fourier transform and freebasic .high quality small pitch shift shifting for radio url . added record, playrec, save as MP3 , feedback , anticlick .
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Mocking Bird

    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    VoiceFixer

    VoiceFixer

    General Speech Restoration

    VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    TensorFlowTTS

    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

    ...It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron‑2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband‑MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization aware training and pruning — which allow models to run faster than real time and to be deployable on mobile or embedded platforms. The library supports multiple languages (English, French, Korean, Chinese, German, etc.) and is relatively easy to adapt to new languages. With integrated vocoder + mel-spectrogram generation pipelines, pre-trained models, and fairly flexible architecture, TensorFlowTTS is a great off-the-shelf and extensible TTS engine for applications ranging from voice assistants to content generation or accessibility tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • 10
    HiFi-GAN

    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    phasevocoder

    phasevocoder

    phase vocoder for time scaling and pitch transposition etc.

    phasevocoder: phase vocoder for time scaling and pitch transposition etc. Copyright (c) 2008-2020 by Klaus Michael Indlekofer. All rights reserved. Note: Special restrictions apply. See disclaimers below and within the distribution. (We are not affiliated in any way with companies/persons mentioned on this page. All brand names and trademarks are property of their respective owners.)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Tacotron-2

    Tacotron-2

    DeepMind's Tacotron-2 Tensorflow implementation

    ...It includes directory layouts and logging directories for multiple datasets such as LJSpeech and M-AILABS en_US/en_UK, making it easier to adapt to new English corpora. Separate log trees track mel-spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    WaoN is a Wave-to-Notes transcriber (converts audio file into midi file) and some utility tools such as gWaoN, graphical visualization of the spectra, and phase vocoder for time-stretching and pitch-shifting.
    Leader badge
    Downloads: 11 This Week
    Last Update:
    See Project
  • 14
    A configurable, real-time vocoder using ALSA.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Vintage Vocoder real-time audio effect - VST and DXI plug-in for PC/MAC. Originally a commercial product published by Sonicism Digital Audio Solutions in 2002. This software was used for the robot voices and sound effects in the computer game Freelancer.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    Sculptor is a phase-vocoder-based package with real-time capabilites. You can use it to fiddle with soundfiles in the frequency domain, changing pitch and duration independently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB