Showing 96 open source projects for "waveform"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Text to Waveform

    Text to Waveform

    Create synth presets from words

    Convert words to waveforms you can load into a synthesizer oscillator to create synth presets. Have fun turning your name, your friends' names, your city name, your pet's name, your team's name into synth presets you can use to produce a track.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    garysfm

    garysfm

    An advanced file manager with qss themes and iso and folder previews

    garysfm which stands for Gary's File Manager is a file manager with some advanced features. Those features include bulk renaming and folder image previews. I has rather advanced search functions, tab browsing with persistence between launches. It remembers your folder sorting and view options in icon view. It also remembers your active tabs between sessions. It has progress dialog while doing large operations like copying large files, and folders with many files. python version works on...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    LoopAuditioneer

    LoopAuditioneer

    Software for loop and cue handling in .wav files.

    Since 2024-02-28 development of LoopAuditioneer has moved to GitHub. Please go to https://github.com/GrandOrgue/LoopAuditioneer for recent updates and releases.
    Leader badge
    Downloads: 61 This Week
    Last Update:
    See Project
  • 4
    Parallel WaveGAN

    Parallel WaveGAN

    Unofficial Parallel WaveGAN

    Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 5
    Quick Subtitles

    Quick Subtitles

    HTML5 Based Subtitle Creation Tool

    Quick Subtitles in an HTML5 based solution for rapid creation and syncing of subtitles while playing your video. It is designed around the concept that you should minimize the need to take your hands off the keyboard while performing data entry.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Demucs

    Demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation—extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The...
    Downloads: 96 This Week
    Last Update:
    See Project
  • 7
    VALL-E

    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    audio-diffusion-pytorch

    audio-diffusion-pytorch

    Audio generation using diffusion models, in PyTorch

    A fully featured audio diffusion library, for PyTorch. Includes models for unconditional audio generation, text-conditional audio generation, diffusion autoencoding, upsampling, and vocoding. The provided models are waveform-based, however, the U-Net (built using a-unet), DiffusionModel, diffusion method, and diffusion samplers are both generic to any dimension and highly customizable to work on other formats. Note: no pre-trained models are provided here, this library is meant for research purposes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Security Log Generator

    Security Log Generator

    Generates logs of typical formats that would often be found in a SOC

    Generates logs of typical formats that would often be found in a SOC. As of 31st January 2023, it supports IDS, Web Access and Endpoint log formats. Can generate a specific number of events in a linear fashion or use a waveform to add 'bumpiness' to your data. The code is modular and extensible, adding additional formats can be done with relative ease.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    WaveformSeekBar

    WaveformSeekBar

    Android Waveform SeekBar library

    Android Waveform SeekBar library.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    EnCodec

    EnCodec

    State-of-the-art deep learning based audio codec

    ...Unlike traditional codecs (like MP3 or Opus), Encodec uses a learned quantizer and decoder to reconstruct complex waveforms with remarkable accuracy at bitrates as low as 1.5 kbps. It employs a convolutional encoder–decoder architecture trained with perceptual loss functions that optimize for human auditory quality rather than raw waveform distance. The model can operate in real time and supports variable bandwidths, bitrates, and multi-band audio. Encodec has applications in speech and music compression, generative modeling, and efficient data transmission for communication systems. The repository includes pretrained checkpoints, PyTorch inference code, and examples for integrating Encodec as a module in downstream generative or streaming systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    DiffSinger

    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    DiffSinger is an open-source PyTorch implementation of a diffusion-based acoustic model for singing-voice synthesis (SVS) and also text-to-speech (TTS) in a related variant. The core idea is to view generation of a sung voice (mel-spectrogram) as a diffusion process: starting from noise, the model iteratively “denoises” while being conditioned on a music score (lyrics, pitch, musical timing). This avoids some of the typical problems of prior SVS models — like over-smoothing or unstable GAN...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 13
    WaveRNN

    WaveRNN

    WaveRNN Vocoder + TTS

    WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio with a compact recurrent neural network that can generate high-quality waveforms more efficiently than many traditional autoregressive models. The repository includes scripts and code for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    VoiceFixer

    VoiceFixer

    General Speech Restoration

    ...The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that reconstructs a high-quality waveform from those features. Unlike many single-purpose noise reduction tools, VoiceFixer targets a “general speech restoration” problem (GSR), capable of handling multiple types of distortions at once, which makes it suitable for old recordings, phone-call audio, amateur voice recordings, or archival media. Evaluations show that VoiceFixer significantly improves both objective and subjective audio quality compared to baseline speech-enhancement methods.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    TensorFlowTTS

    TensorFlowTTS

    Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

    TensorFlowTTS is a state-of-the-art, open-source speech synthesis library built on TensorFlow 2. It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron‑2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband‑MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization aware training and pruning — which allow models to run faster than real time and to be deployable on mobile or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    VITS

    VITS

    Conditional Variational Autoencoder with Adversarial Learning

    VITS is a foundational research implementation of “VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech,” a well-known neural TTS architecture. Unlike traditional two-stage systems that separately train an acoustic model and a vocoder, VITS trains an end-to-end model that maps text directly to waveform using a conditional variational autoencoder combined with normalizing flows and adversarial training. This architecture enables parallel generation (fast inference) while achieving speech quality that rivals or surpasses many two-stage systems. The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Denoiser

    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    ...It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output. The implementation includes data augmentation techniques applied to the raw waveforms (e.g. noise mixing, reverberation) to improve model robustness and generalization to diverse noise types. The project supports both offline denoising (batch inference) and live audio processing (e.g. via loopback audio interfaces), making it practical for real-time use in calls or recording. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18

    pyrpl

    PyRPL turns your Red Pitaya into a powerful analog feedback device.

    The Red Pitaya is a commercial, affordable FPGA board with fast analog inputs and outputs. This makes it useful for quantum optics experiments, in particular as a digital feedback controller for analog systems. Based on the open source software provided by the board manufacturer, PyRPL (Python RedPitaya Lockbox) implements many devices that are needed for optics experiments with the Red Pitaya. PyRPL implements various digital signal processing (DSP) modules (see features below). It allows...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 19
    HiFi-GAN

    HiFi-GAN

    Generative Adversarial Networks for Efficient and High Fidelity Speech

    ...It introduces a generator architecture tailored to model the periodic structure of speech and a set of discriminators that focus on different scales and periods of the waveform to better capture naturalness. The model targets a sweet spot between sample quality and generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20

    Timewave Synthesizer

    A binary waveform synthesizer based on Timewave Zero

    A binary waveform synthesizer derived from and based upon Timewave Zero theory by Terence McKenna. His theory proposes that the structure of time is a complex wave with a scalar potential, and the King Wen sequence of the I Ching is the 64 hexagram code from which the waveform of time is derived; that the King Wen sequence maps the linear progression of the human states of mind, which ultimately substantiates this waveform.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    WaveDrom

    WaveDrom

    Digital timing diagram rendering engine

    ...WaveDrom draws your Timing Diagram or Waveform from a simple textual description. It comes with description language, a rendering engine and an editor. WaveDrom editor works in the browser or can be installed on your system.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    GLava

    GLava

    GLava - OpenGL audio spectrum visualizer

    GLava is a real-time OpenGL audio spectrum visualizer for Linux desktops that renders directly onto your desktop background. Designed for users who enjoy live audio visualizations integrated with their environment, it supports shader-based effects and customization for creating stunning reactive visuals. GLava works with PulseAudio and ALSA and can be embedded seamlessly with tiling window managers or traditional desktop setups. Its configuration is flexible, allowing the community to design...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 23
    Tacotron-2

    Tacotron-2

    DeepMind's Tacotron-2 Tensorflow implementation

    ...The repository is structured as a full training pipeline: dataset preparation, preprocessing into spectrograms, Tacotron training, WaveNet (or Griffin-Lim) vocoder training, and final waveform synthesis. It includes directory layouts and logging directories for multiple datasets such as LJSpeech and M-AILABS en_US/en_UK, making it easier to adapt to new English corpora. Separate log trees track mel-spectrograms, attention plots, evaluation audio, and vocoder outputs, so you can inspect how alignment and audio quality evolve over time.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    DC-TTS

    DC-TTS

    TensorFlow Implementation of DC-TTS: yet another text-to-speech model

    ...The model is split into two networks: Text2Mel, which maps text to mel-spectrograms, and SSRN (spectrogram super-resolution network), which converts low-resolution mel-spectrograms into high-resolution magnitude spectrograms suitable for waveform synthesis. Training scripts, data loaders, and hyperparameter configurations are provided to reproduce results on several datasets, including LJ Speech for English, a Korean single-speaker dataset, and audiobook data from Nick Offerman and Kate Winslet.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    SGA Conversion and Analysis Tool

    Convert SGA format multimedia files to PNG and AVI format

    This program is a utility for viewing, analyzing, and converting the data in SGA format multimedia files used by Digital Pictures in their games for the Sega Mega CD, Sega Super 32X, Sega Saturn, 3DO video game systems, and Windows PCs into formats that can be read by media player software on modern PCs.
    Downloads: 2 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB