Search Results for "audio processing" - Page 5

Showing 308 open source projects for "audio processing"

  • 1
    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...
    Downloads: 4 This Week
  • 2
    Scanopy

    Clean network diagrams. One-time setup, zero upkeep.

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 3 This Week
  • 3
    ezytdl

    Advanced electron-based frontend for yt-dlp

    ...Its interface is designed for accessibility while still offering advanced configuration options. Overall, it provides a streamlined solution for downloading and processing online media content.
    Downloads: 1 This Week
  • 4
    YoutubeExplode

    Abstraction layer over YouTube's internal API

    ...Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. Developers can use it to access details such as titles, authors, durations, captions, and available media formats, as well as to download audio or video streams for further processing. The library is designed to be intuitive and cross-platform through .NET Standard compatibility, making it suitable for desktop tools, automation pipelines, and media utilities.
    Downloads: 1 This Week
  • 5
    handbrake-js

    Video encoding / transcoding / converting for node.js

    handbrake-js is a Node.js wrapper around the HandBrake CLI tool, designed to bring powerful video transcoding capabilities into JavaScript applications. It provides a programmatic interface for converting video files from nearly any format into modern codecs using HandBrake’s encoding engine. The library allows developers to build automation workflows for media processing without directly interacting with command-line tools. It supports multiple output formats such as MP4 and MKV and...
    Downloads: 0 This Week
  • 6
    Docling

    Get your documents ready for gen AI

    ...Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. Overall, Docling serves as a comprehensive preprocessing layer for AI applications that require reliable, structured access to complex document data.
    Downloads: 0 This Week
  • 7
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 0 This Week
  • 8
    LiveKit Agents

    Framework for building realtime multimodal voice AI agents apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...
    Downloads: 1 This Week
  • 9
    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 1 This Week
  • 10
    LittleJS

    The Tiny JavaScript Game Engine That Can!

    LittleJS is a super lightweight 2D JavaScript game engine with fast WebGL rendering. It is designed to be small, simple, and easy to use for various applications, from game jams to commercial releases. This engine has everything necessary to make high-quality games, including fast rendering, physics, particles, sound effects, music, keyboard/mouse/gamepad input handling, update/render loop, and debug tools. It is recommended that you start by copying the LittleJS Starter Project. This file is...
    Downloads: 0 This Week
  • 11
    Streamer-Sales

    Large language model for live-stream sales hosts

    Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...
    Downloads: 0 This Week
  • 12
    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    ...IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are automatically downloaded when needed, and there is an online demo instance hosted on GPU that anyone can try.
    Downloads: 0 This Week
  • 13
    Flutter Rust Bridge

    Rust binding generator, feature-rich, but seamless and simple

    ...The project supports passing complex types, handling async operations and streams, and integrating with Flutter across mobile and desktop targets. By leaning on Rust’s memory safety and zero-cost abstractions, it enables compute-heavy tasks—parsing, crypto, image/audio processing, and more—without sacrificing Flutter’s developer experience. Build scripts and templates streamline packaging and distribution so the Rust side fits cleanly into CI and multi-platform releases. In practice, teams gain a maintainable way to share one performant Rust core across multiple Flutter apps while keeping the UI reactive and fast.
    Downloads: 1 This Week
  • 14
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 15
    OM-SoX

    Multichannel audio manipulation and batch processing for OpenMusic.

    OM-SoX is a library for multichannel audio manipulation and functional batch processing for OpenMusic, a visual programming environment based on Common Lisp / CLOS. It uses SoX (Sound eXchange) as its sound processing kernel (http://sox.sourceforge.net), © Chris Bagwell and SoX Contributors. NOTE: OM-SoX releases are not compatible with OpenMusic version 6.19. Please instead download and use a snapshot of the "development" branch of the source code.
    Downloads: 1 This Week
  • 16
    AudioEnhancerMAX

    Open-source AI audio processing suite: 100% local, free, no limits.

    AudioEnhancerMAX is the open-source AI-powered audio media center for podcasters, creators, and professionals. It runs 100% locally on your hardware — no cloud, no subscriptions, no limits. Features: 16+ intelligent audio filters (AI noise removal, filler word detection, breath reduction, studio sound, audio super-resolution), Gemma 4 AI dynamic parameter tuning, distributed edge computing across Android smartphones, real-time system monitoring for Apple Silicon. Built with FastAPI,...
    Downloads: 12 This Week
  • 17
    Nyquist

    Nyquist is a language for sound synthesis and music composition.

    Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.
    Downloads: 26 This Week
  • 18
    Bootleg Sound Processor

    Software for processing audio files.

    Software for processing audio files. The files "Batch Processor.py" and "Duplicate remover.py" are meant to be used with the output of Bootleg Text Slicer (https://github.com/Northstrix/bootleg-text-slicer) placed into the "Unprocessed" folder, while "Single file processor.py" can be used with standalone files from arbitrary locations. GitHub repository: https://github.com/Northstrix/bootleg-sound-processor Made using Google AI Studio (https://aistudio.google.com/) and Perplexity (https://www.perplexity.ai/)
    Downloads: 0 This Week
  • 19
    ApraPipes

    A pipeline framework for developing video and image processing apps

    ApraPipes is a C++ multimedia processing framework designed for building high-performance video/audio processing pipelines with GPU acceleration. It provides a modular, declarative architecture for creating complex media processing workflows that span camera capture, encoding/decoding, computer vision, AI operations, and output to files, streams, or displays.
    Downloads: 0 This Week
  • 20
    WildMidi is a MIDI processing library and a MIDI player using the GUS patch set.
    Downloads: 5 This Week
  • 21
    runabc

    Runabc is a user interface supporting abc music notation software

    Runabc is a graphical user interface to the abcMIDI, abc2svg and abcm2ps packages, which normally run in a command window. In addition, it contains numerous tools for editing, processing, and analyzing abc and MIDI files. Runabc has been included in the SourceForge abc music project. It is now becoming a separate project of its own.
    Downloads: 1 This Week
  • 22
    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...
    Downloads: 20 This Week
  • 23
    MATLAB Deep Learning Model Hub

    Discover pretrained models for deep learning in MATLAB

    Discover pretrained models for deep learning in MATLAB. Pretrained image classification networks have already learned to extract powerful and informative features from natural images. Use them as a starting point to learn a new task using transfer learning. Inputs are RGB images; the output is the predicted label and score.
    Downloads: 0 This Week
  • 24
    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 1 This Week
  • 25
    Snowmix

    Video mixer for mixing live and recorded video and audio feeds

    New version 0.5.2.2 Released May 15th 2026. Snowmix is a Swiss army knife tool for mixing live and recorded video and audio feeds. It supports 2D and 3D clipping, scaling and transparent overlay of video, png graphics and text. It supports animation of video, images and texts through native commands changing scale, placement, transparency and rotation. Animation and actions can also be controlled through native scripting and an embedded Tcl and/or Python interpreter. Snowmix is designed...
    Downloads: 31 This Week