Search Results for "audio processing" - Page 4

Showing 326 open source projects for "audio processing"

  • 1
    Orpheus TTS

    Towards Human-Sounding Speech

    ...It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 0 This Week
  • 2
    Spring AI Alibaba Examples

    Spring AI Alibaba examples for building and testing AI apps

    ...It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.
    Downloads: 1 This Week
  • 3
    MediaPipe Solutions

    Cross-platform, customizable ML solutions

    MediaPipe is an open-source framework developed by Google for building cross-platform machine learning pipelines that process audio, video, and other streaming data in real time. The system provides developers with tools and reusable components that allow them to combine multiple machine learning models with preprocessing and postprocessing logic into efficient perception pipelines. These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web...
    Downloads: 0 This Week
  • 4
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    ...It is designed to integrate with other tools and libraries and provide stable playback or media-processing pipelines, while remaining open-source so that users can inspect, extend, and adapt it.
    Downloads: 0 This Week
  • 5
    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...
    Downloads: 13 This Week
  • 6
    YoutubeExplode

    Abstraction layer over YouTube's internal API

    ...Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. Developers can use it to access details such as titles, authors, durations, captions, and available media formats, as well as to download audio or video streams for further processing. The library is designed to be intuitive and cross-platform through .NET Standard compatibility, making it suitable for desktop tools, automation pipelines, and media utilities.
    Downloads: 2 This Week
  • 7
    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 1 This Week
  • 8
    LiveKit Agents

    Framework for building realtime multimodal voice AI agents apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...
    Downloads: 4 This Week
  • 9
    MiniMax Skills

    Development skills for AI coding agents

    MiniMax skills is a modular system designed to provide structured development capabilities for AI coding agents, enabling them to perform complex engineering tasks with guided workflows and domain-specific knowledge. It defines a set of reusable “skills” that encapsulate best practices, architectural patterns, and step-by-step processes for building applications across multiple platforms. These skills can be integrated into AI tools to improve the quality and consistency of generated code,...
    Downloads: 2 This Week
  • 10
    AutoSubs

    Instantly generate AI-powered subtitles on your device

    ...Users can customize subtitle styling, adjust timing, and export results in multiple formats, making it suitable for content creators, filmmakers, and editors. AutoSubs is designed with performance in mind, offering efficient processing through a Rust-based backend and supporting multiple operating systems including Windows, macOS, and Linux.
    Downloads: 2 This Week
  • 11
    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 3 This Week
  • 12
    LittleJS

    The Tiny JavaScript Game Engine That Can!

    LittleJS is a super lightweight 2D JavaScript game engine with fast WebGL rendering. It is designed to be small, simple, and easy to use for various applications, from game jams to commercial releases. This engine has everything necessary to make high-quality games, including fast rendering, physics, particles, sound effects, music, keyboard/mouse/gamepad input handling, update/render loop, and debug tools. It is recommended that you start by copying the LittleJS Starter Project. This file is...
    Downloads: 2 This Week
  • 13
    Docling

    Get your documents ready for gen AI

    ...Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. Overall, Docling serves as a comprehensive preprocessing layer for AI applications that require reliable, structured access to complex document data.
    Downloads: 0 This Week
  • 14
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 0 This Week
  • 15

    Freeverb3_vst

    Freeverb3 DSP VST effect plugins

    The Freeverb3VST is a package of VST DSP effect plugins utilizing the Freeverb3 signal processing library. Many types of audio processing effects including high quality reverbs and impulse response convolution processors are available.
    Downloads: 24 This Week
  • 16
    Streamer-Sales

    LLM Large Model of Selling Anchor

    Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...
    Downloads: 0 This Week
  • 17
    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    ...IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are automatically downloaded when needed, and there is an online demo instance hosted on GPU that anyone can try.
    Downloads: 0 This Week
  • 18
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 19
    Flutter Rust Bridge

    Rust binding generator, feature-rich, but seamless and simple

    ...The project supports passing complex types, handling async operations and streams, and integrating with Flutter across mobile and desktop targets. By leaning on Rust’s memory safety and zero-cost abstractions, it enables compute-heavy tasks—parsing, crypto, image/audio processing, and more—without sacrificing Flutter’s developer experience. Build scripts and templates streamline packaging and distribution so the Rust side fits cleanly into CI and multi-platform releases. In practice, teams gain a maintainable way to share one performant Rust core across multiple Flutter apps while keeping the UI reactive and fast.
    Downloads: 0 This Week
  • 20

    Equalizer APO

    A system-wide equalizer for Windows 7 / 8 / 8.1 / 10 / 11

    Equalizer APO is a parametric / graphic equalizer for Windows. It is implemented as an Audio Processing Object (APO) for the system effect infrastructure introduced with Windows Vista.

    Requirements:
    - Windows Vista or later (Windows 7 - 11 have been tested)
    - CPU architecture x64 (64 bit), x86 (32 bit) or ARM64 (on Windows 10/11)
    - applications must not bypass the system effect infrastructure (APIs like ASIO or WASAPI exclusive mode can't be used)

    Equalizer APO can be used in conjunction with Room EQ Wizard (http://www.roomeqwizard.com/), because it can read its filter text file format. ...
    Downloads: 89,622 This Week
  • 21
    LameXP

    Multi-Format Audio-Encoder Front-end

    LameXP is a free multi-format audio file converter that supports a variety of output formats, including MP3, AAC/MP4, Ogg Vorbis, Opus, as well as FLAC, and an even higher number of input formats. It also supports batch processing and can utilize multiple processor cores.
    Downloads: 294 This Week
  • 22
    OM-SoX

    Multichannel audio manipulation and batch processing for OpenMusic.

    OM-SoX is a library for multichannel audio manipulation and functional batch processing for OpenMusic, a visual programming environment based on Common Lisp / CLOS. It uses Sound eXchange (SoX) as its sound processing kernel (http://sox.sourceforge.net), © Chris Bagwell and SoX Contributors. NOTE: OM-SoX releases are not compatible with OpenMusic version 6.19. Please instead download and use a snapshot of the "development" branch of the source code.
    Downloads: 0 This Week
  • 23
    FFmpeg Batch AV Converter

    Free all in one audio/video ffmpeg batch encoder

    FFmpeg Batch AV Converter is a free universal audio and video encoder for Windows and Linux (via Wine) that lets you use the full potential of the ffmpeg command line with a few mouse clicks, in a convenient GUI with drag and drop and progress information. Some fancy wizards make things easy for non-experts. Thanks to its multi-file encoding feature, it may be the fastest a/v batch encoder available, since it maximizes system resources usage by launching as many simultaneous processes up to...
    Downloads: 2,921 This Week
  • 24
    Nyquist

    Nyquist is a language for sound synthesis and music composition.

    Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.
    Downloads: 36 This Week
  • 25
    RadioCaster

    Stream audio from your PC to the internet with ease

    ...RadioCaster is compatible with ASIO sound cards and allows capturing audio playback from your PC, line input, or even rebroadcasting internet streams. With built-in sound processing tools such as an equalizer, compressor, and support for VST and Winamp DSP plugins, you can achieve professional-quality broadcasts. The software also includes a built-in broadcasting server, metadata support, and streaming statistics.
    Downloads: 22 This Week