Showing 33 open source projects for "model-builder"

  • 1

    OpenVINO AI Plugins for Audacity

    A set of AI-enabled effects, generators, and analyzers for Audacity

    A set of AI-enabled effects, generators, and analyzers for Audacity. These AI features run 100% locally on your PC; no internet connection is necessary. OpenVINO™ is used to run AI models on supported accelerators found on the user's system, such as CPU, GPU, and NPU.
    Downloads: 142 This Week
  • 2

    NovaSR

    A lightning fast audio upsampler

    NovaSR is an extremely lightweight and high-performance audio upsampling model that transforms low-quality 16 kHz audio into clearer, high-fidelity 48 kHz audio with remarkable speed and efficiency. At only about 50 KB in size, the model is orders of magnitude smaller than typical audio super-resolution networks, yet it achieves high quality and realtime performance thanks to its compact architecture and efficient convolutional design.
    Downloads: 0 This Week
  • 3

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...
    Downloads: 1 This Week
  • 4

    Moonshine Voice

    Fast and accurate automatic speech recognition (ASR) for edge devices

    ...Moonshine supports multiple platforms including mobile, desktop, and embedded systems, and provides example projects to accelerate integration into real-world products. The toolkit also includes specialized model variants, including monolingual options that improve accuracy for specific languages. Overall, Moonshine serves developers building privacy-conscious, on-device voice interfaces that demand high performance with minimal resource overhead.
    Downloads: 0 This Week
  • 5

    Moshi

    A speech-text foundation model for real time dialogue

    Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps in a fully streaming manner (latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs such as SpeechTokenizer (50 Hz, 4 kbps) or SemantiCodec (50 Hz, 1.3 kbps).
    Downloads: 0 This Week
  • 6

    lbat

    lmms projects builder and tagger

    Wrapper atop "lmms render": keeps rendered tracks up to date with lmms projects. Creates EBU R128 compliant tracks.
    Downloads: 0 This Week
  • 7

    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...
    Downloads: 35 This Week
  • 8

    CommBusTerm

    "CommBusTerm"

    "CommBusTerm" is based on a "Raspberry Pi 4 Model B" or "Raspberry Pi 5", and the "kmicommbusmediator" interface, which provides connectivity to the "PPG Communication Bus" which is used by PPG synthesizers such as the "Wave 2.2", "Wave 2.3", "EVU".
    Hardware:
    * "kmicommbusmediator": https://sourceforge.net/projects/kmicommbusmediator/
    * "kmiwaveram": https://sourceforge.net/projects/kmi-wave-ram/
    Software:
    * "cbmedimonitor" control tool
    * "WaveProgEdit" sound-program editor
    * "WaveBackup" backups
    * "waveaddsynth" for additive synthesis
    * "wavetransientutil" for transient sounds
    Copyright (C) 2022-2026 by Klaus Michael Indlekofer. ...
    Downloads: 0 This Week
  • 9

    kmicommbusmediator

    "KMI Comm. Bus Mediator"

    "KMI Comm. Bus Mediator" ("kmicommbusmediator")
    The project "kmicommbusmediator" provides connectivity for the "Raspberry Pi 4 Model B" and "Raspberry Pi 5" to the "PPG Communication Bus" which is used by PPG devices such as the "Wave 2.2", "Wave 2.3", "EVU", "PRK", "Waveterm A", "Waveterm B".
    "kmicommbusmediator" contains hardware and software components:
    * "kmicommbusmediator-HW": "Comm. Bus" hardware-interface
    * "CBMEDIBIOS": firmware for "kmicommbusmediator-HW"
    * "cbmedimonitor": control tool for "Raspberry Pi"
    Utilities: "wtafloppylist", "wtbfloppylist", ... ...
    Downloads: 0 This Week
  • 10

    Ultimate Media Downloader

    An Open source media downloader for downloading videos and audios

    ...Whether you're downloading a single YouTube video, extracting audio from Spotify playlists, archiving TikTok content, or batch-processing entire music libraries, UMD handles it all with elegance and efficiency.
    It consists of:
    1. Unified Interface: One command, 1000+ platforms. No tool shopping, no mental model switching.
    2. Production-Ready, Zero Friction Installation: Most users go from hearing about the tool to downloading content in under 5 minutes.
    3. Active Maintenance: Codeberg hosting (after GitHub suspension) demonstrates commitment to long-term availability.
    Downloads: 0 This Week
  • 11

    NAM-Runner

    Batch file to install and run NAM (neural-amp-modeler) easily.

    A Windows 10 batch file that installs and runs the NAM model trainer (neural-amp-modeler) by Steven Atkinson right into the GUI application. Fully automated. Custom one-time installation of everything you need to train neural network models of guitar amps and more for the NAM VST plugin, no Conda required. Runs as a launcher afterwards. Portable installation. New PyTorch includes CUDA runtime for fast Nvidia GPU support.
    Downloads: 6 This Week
  • 12

    EnCodec

    State-of-the-art deep learning based audio codec

    ...It employs a convolutional encoder–decoder architecture trained with perceptual loss functions that optimize for human auditory quality rather than raw waveform distance. The model can operate in real time and supports variable bandwidths, bitrates, and multi-band audio. EnCodec has applications in speech and music compression, generative modeling, and efficient data transmission for communication systems. The repository includes pretrained checkpoints, PyTorch inference code, and examples for integrating EnCodec as a module in downstream generative or streaming systems.
    Downloads: 1 This Week
  • 13

    Coqui STT

    The deep learning toolkit for speech-to-text

    Coqui STT is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. Coqui STT is battle-tested in both production and research. Multiple possible transcripts, each with an associated confidence score. Experience the immediacy of script-to-performance. With Coqui text-to-speech, production times go from months to minutes. With Coqui, the post is a pleasure. Effortlessly clone the voices of your talent and have the clone handle the problems...
    Downloads: 3 This Week
  • 14

    MIDI Simplified 1.6

    MIDI Devices, Files and Sequencing Components for Delphi 10.x VCL/FMX

    Control MIDI devices with 32/64bit VCL/FMX components for Delphi 10 supporting native Windows and macOS. Support for iOS (Beta) and Android (output only). Packages in beta for C++ Builder and Lazarus for easy build and install. MIDI (Musical Instrument Digital Interface) is a technical standard that describes a communications protocol, digital interface, and electrical connectors that connect a wide variety of electronic musical instruments, computers, and related audio devices for playing, editing and recording music (Wikipedia). ...
    Downloads: 1 This Week
  • 15

    VAD

    Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM

    This repository is a voice activity detection (VAD) toolkit that implements multiple models (DNN, bDNN, LSTM, ACAM) for detecting speech versus non-speech in audio. It also provides a recorded dataset in varied real-world settings (e.g. bus stop, construction site, park, room) with ground truth labeling. Acoustic feature extraction (multi-resolution cochleagram, MRCG). Post-processing modules (e.g. smoothing, thresholds). The toolkit supports both MATLAB and Python/TensorFlow components (for...
    Downloads: 1 This Week
  • 16

    DeepSpeech

    Open source embedded speech-to-text engine

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. A pre-trained English model is available for use and can be downloaded following the instructions in the usage docs. If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech releases page.
    Downloads: 3 This Week
  • 17

    XZVoice

    Free and open source text-to-speech software

    ...Technically, multi-level rhythmic pauses are taken into account to achieve a natural synthesis rhythm, and acoustic and linguistic parameters are used together to build multiple automatic prediction models based on deep learning. Because the pronunciation model is trained on massive audio data, the synthetic voice is realistic, full, cadenced, and expressive, and the MOS score has reached the professional level in the industry.
    Downloads: 0 This Week
  • 18

    TTS

    Deep learning for text to speech

    ...TTS comes with pre-trained models, tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. Released models in PyTorch, TensorFlow and TFLite. Tools to curate Text2Speech datasets under dataset_analysis. Demo server for model testing. Notebooks for extensive model benchmarking. Modular (but not too much) code base enabling easy testing for new ideas. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings efficiently. Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). ...
    Downloads: 2 This Week
  • 19

    Carnatic Music Guru / JRaaga

    Carnatic Music Guru - JRaaga

    VISIT THIS PAGE AS MORE FEATURES ARE BEING ADDED. If you have downloaded 2.03 or above, use Help>Check Updates to download the latest version. If that does not update, download CMGUpdater from here: update/CMGUpdater.jar. Copy it to the update/ folder of your JRaaga installation path, then try Help>Check for updates again. If it still does not work, delete the existing installation, download the latest version, and try again. Carnatic Music Guru is a tutor/player/lesson generator. YOU NEED Java Runtime...
    Downloads: 1 This Week
  • 20

    SoundJS

    JavaScript library for working with audio

    SoundJS is a library to make working with audio on the web easier. It provides a consistent API for playing audio in different browsers, including using a target plugin model to provide an easy way to provide additional audio plugins like a Flash fallback (included, but must be used separately from the combined/minified version). A mechanism has been provided for easily tying in audio preloading to PreloadJS. The core API for playing sounds. Call createjs.Sound.play(sound, ...options), and a sound instance is created that can be used to control the audio, and dispatches events when it is complete, loops, or is interrupted. ...
    Downloads: 0 This Week
  • 21

    RuneAudio

    Free and open source Hi-Fi music player for embedded hardware

    RuneAudio is free and open source software that turns embedded hardware into Hi-Fi music players. We want to make a cheap, low-consumption and silent mini-PC perform as a high-fidelity digital source. RuneAudio features a custom-built Linux distribution (RuneOS) and a web player (RuneUI) that allows remote control of playback and setup options from multiple devices (desktop PC, netbook, tablet, smartphone).
    Downloads: 45 This Week
  • 22

    FlashWavRecorder

    Simple flash file for recording audio and saving as a WAV

    FlashWavRecorder is a Flash-based tool that enables recording audio from a user's microphone and exporting it as a WAV file in real time. It uses Flash Player to access audio input and communicate with JavaScript to control recording sessions from the browser. This was especially useful before native browser audio APIs became widely supported and remains relevant in legacy systems requiring Flash-based audio capture.
    Downloads: 0 This Week
  • 23

    HMM Speech Recognition in Matlab

    A speech recognition system using Matlab/Simulink/Stateflow.

    This project provides a hidden Markov model speech recognition system built with Matlab/Simulink/Stateflow.
    Downloads: 0 This Week
  • 24

    piPlayer

    piPlayer (Personal Interactive Player) is an OSGi multimedia player

    ...It is an OSGi-based application that plays personalised local and remote multimedia content. piPlayer introduces a solution for a convergent audiovisual service through a service gateway. An OSGi-compliant multimedia player has been developed, taking the open source philosophy as its working model and adding interactive and personalized value through a service gateway.
    Downloads: 1 This Week
  • 25
    Audio Analysis/Resynthesis the way Darwin would have done it if he were only into computer music. Using a genetic algorithm to evolve a sinusoidal/noise based sound model, create variations as the audio chromosome of a sound's family tree progresses.
    Downloads: 0 This Week