Showing 17 open source projects for "speech decoder"

View related business solutions
  • 99.99% Uptime for MySQL and PostgreSQL on Google Cloud Icon
    99.99% Uptime for MySQL and PostgreSQL on Google Cloud

    Enterprise Plus edition delivers sub-second maintenance downtime and 2x read/write performance. Built for critical apps.

    Cloud SQL Enterprise Plus gives you a 99.99% availability SLA with near-zero downtime maintenance—typically under 10 seconds. Get 2x better read/write performance, intelligent data caching, and 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server with built-in vector search for gen AI apps. New customers get $300 in free credit.
    Try Cloud SQL Free
  • $300 in Free Credit for Your Google Cloud Projects Icon
    $300 in Free Credit for Your Google Cloud Projects

    Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

    Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    ...These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 59 This Week
    Last Update:
    See Project
  • 2
    FireRedASR

    FireRedASR

    Open-source industrial-grade ASR models

    FireRedASR is an industrial-grade family of open-source automatic speech recognition models designed to provide high-precision speech-to-text performance across languages including Mandarin, English, and various Chinese dialects, achieving new state-of-the-art benchmarks on public test sets. The project includes multiple model variants to meet different application needs, such as high-accuracy end-to-end interaction using an encoder-adapter-LLM framework and efficient real-time recognition using attention-based encoder-decoder architectures, giving developers flexibility in balancing performance and resource constraints. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build AI Apps with Gemini 3 on Vertex AI Icon
    Build AI Apps with Gemini 3 on Vertex AI

    Access Google’s most capable multimodal models. Train, test, and deploy AI with 200+ foundation models on one platform.

    Vertex AI gives developers access to Gemini 3—Google’s most advanced reasoning and coding model—plus 200+ foundation models including Claude, Llama, and Gemma. Build generative AI apps with Vertex AI Studio, customize with fine-tuning, and deploy to production with enterprise-grade MLOps. New customers get $300 in free credits.
    Try Vertex AI Free
  • 5
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    opencore-amr

    Audio codecs extracted from Android Open Source Project

    Library of OpenCORE Framework implementation of Adaptive Multi Rate Narrowband and Wideband (AMR-NB and AMR-WB) speech codec. Library of VisualOn implementation of Adaptive Multi Rate Wideband (AMR-WB) encoder and Advanced Audio Coding (AAC) encoder. Modified library of Fraunhofer AAC decoder and encoder.
    Leader badge
    Downloads: 6,073 This Week
    Last Update:
    See Project
  • 8
    Lyra

    Lyra

    A Very Low-Bitrate Codec for Speech Compression

    lyra is a neural audio codec designed to deliver intelligible, natural-sounding speech at extremely low bitrates, making real-time communication viable on constrained networks. It replaces hand-engineered codecs with learned models that capture speech characteristics more efficiently and reconstruct waveforms with a neural vocoder. The system targets mobile-class hardware, balancing latency and quality so it can run in real-time on phones.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    EnCodec

    EnCodec

    State-of-the-art deep learning based audio codec

    Encodec is a neural audio codec developed by Meta for high-fidelity, low-bitrate audio compression using end-to-end deep learning. Unlike traditional codecs (like MP3 or Opus), Encodec uses a learned quantizer and decoder to reconstruct complex waveforms with remarkable accuracy at bitrates as low as 1.5 kbps. It employs a convolutional encoder–decoder architecture trained with perceptual loss functions that optimize for human auditory quality rather than raw waveform distance. The model can operate in real time and supports variable bandwidths, bitrates, and multi-band audio. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    DiffSinger

    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    ...The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.
    Downloads: 52 This Week
    Last Update:
    See Project
  • 11
    Denoiser

    Denoiser

    Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

    Denoiser is a real-time speech enhancement model operating directly on raw waveforms, designed to clean noisy audio while running efficiently on CPU. It uses a causal encoder-decoder architecture with skip connections, optimized with losses defined both in the time domain and frequency domain to better suppress noise while preserving speech. Unlike models that operate on spectrograms alone, this design enables lower latency and coherent waveform output.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    wav2letter++

    wav2letter++

    Facebook AI research's automatic speech recognition toolkit

    First, install Flashlight (using the 0.3 branch is required) with the ASR application. This repository includes recipes to reproduce the following research papers as well as pre-trained models. All results reproduction must use Flashlight <= 0.3.2 for exact reproducibility. At least one of LZMA, BZip2, or Z is required for LM compression with KenLM. It is highly recommended to build KenLM with position-independent code (-fPIC) enabled, to enable python compatibility. After installing, run...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    OpenSeq2Seq

    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    OpenSeq2Seq is a TensorFlow-based toolkit for efficient experimentation with sequence-to-sequence models across speech and NLP tasks. Its core goal is to give researchers a flexible, modular framework for building and training encoder–decoder architectures while fully leveraging distributed and mixed-precision training. The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    JuliusModels

    JuliusModels

    Open source speech models for Julius in English and other languages.

    Open source speech models for Julius speech decoder. Its aim is to give access a wider community of speech recognition enthusiasts to quality models, which they can use in their own projects on different OS platforms (Unix, Windows, etc...) All of the models are based on HTK modelling software and data sets available freely on the Internet.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15

    Distant Speech Recognition

    Beamforming and Speech Recognition Toolkit

    BTK contains C++ and Python libraries that implement speech processing and microphone array techniques such as speech feature extraction, speech enhancement, speaker tracking, beamforming, dereverberation and echo cancellation algorithms. The Millennium ASR provides C++ and python libraries for automatic speech recognition. The Millennium ASR implements a weighted finite state transducer (WFST) decoder, training and adaptation methods.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Digital Speech Decoder and xMBE codec library - To decode various xMBE based, QPSK, C4FM modes such as P25 Phase 2 (TDMA), MotoTRBO, ProVoice, NexEDGE and others..
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    JSpeex is a Java port of the Speex speech codec (Open Source/Free Software patent-free audio compression format designed for speech). It provides both the decoder and the encoder in pure Java, as well as a JavaSound SPI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB