Search Results for "audio processing" - Page 5

Showing 374 open source projects for "audio processing"

  • 1
    clip-js

    online video editor built with nextjs, remotion and ffmpeg

    clip-js is a browser-based video editor built with modern web technologies such as Next.js and Remotion, designed to provide real-time editing and rendering directly in the browser. It enables users to create and edit video compositions using a timeline interface, combining video, audio, images, and text layers into a single project. The system uses a WebAssembly port of FFmpeg to perform high-quality rendering, allowing export of videos without relying on server-side processing. It includes interactive controls for trimming, splitting, and arranging media elements with precise timing. The editor supports dynamic adjustments such as opacity, positioning, and layering to fine-tune compositions. ...
    Downloads: 0 This Week
  • 2
    GuitarPedal

    Linus learns analog circuits

    ...It doubles as a teaching aid for musicians who code, showing how buffers, sampling rates, and numerical stability affect tone. While not a full multi-FX suite, it offers a compact sandbox for experimenting with guitar processing on modest hardware.
    Downloads: 0 This Week
  • 3
    ffmpeg_develop_doc

    2023, the latest audio and video learning materials, projects

    ffmpeg_develop_doc is a curated repository that aggregates a comprehensive collection of learning resources related to FFmpeg and multimedia development. It includes command references, technical articles, academic papers, tutorials, and example projects covering audio and video processing concepts. The repository is structured as a knowledge base, offering materials on encoding, decoding, streaming protocols, and real-time media systems. It also contains interview preparation resources and practical case studies, making it useful for both learning and professional development. In addition to documentation, it links to open-source projects and implementation examples that demonstrate real-world usage of FFmpeg. ...
    Downloads: 0 This Week
  • 4
    Open Vision Agents by Stream

    Build Vision Agents quickly with any model or video provider

    ...It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.
    Downloads: 2 This Week
  • 5
    MediaPipe Solutions

    Cross-platform, customizable ML solutions

    MediaPipe is an open-source framework developed by Google for building cross-platform machine learning pipelines that process audio, video, and other streaming data in real time. The system provides developers with tools and reusable components that allow them to combine multiple machine learning models with preprocessing and postprocessing logic into efficient perception pipelines. These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web...
    Downloads: 0 This Week
  • 6
    Dolphin

    Document Image Parsing via Heterogeneous Anchor Prompting

    ...It is designed to integrate with other tools and libraries and provide stable playback or media-processing pipelines, while remaining open-source so that users can inspect, extend, and adapt it.
    Downloads: 0 This Week
  • 7
    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...
    Downloads: 4 This Week
  • 8
    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 3 This Week
  • 9
    ezytdl

    Advanced electron-based frontend for yt-dlp

    ...Its interface is designed for accessibility while still offering advanced configuration options. Overall, it provides a streamlined solution for downloading and processing online media content.
    Downloads: 1 This Week
  • 10
    YoutubeExplode

    Abstraction layer over YouTube's internal API

    ...Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. Developers can use it to access details such as titles, authors, durations, captions, and available media formats, as well as to download audio or video streams for further processing. The library is designed to be intuitive and cross-platform through .NET Standard compatibility, making it suitable for desktop tools, automation pipelines, and media utilities.
    Downloads: 1 This Week
  • 11
    handbrake-js

    Video encoding / transcoding / converting for node.js

    handbrake-js is a Node.js wrapper around the HandBrake CLI tool, designed to bring powerful video transcoding capabilities into JavaScript applications. It provides a programmatic interface for converting video files from nearly any format into modern codecs using HandBrake’s encoding engine. The library allows developers to build automation workflows for media processing without directly interacting with command-line tools. It supports multiple output formats such as MP4 and MKV and...
    Downloads: 0 This Week
  • 12
    Docling

    Get your documents ready for gen AI

    ...Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. Overall, Docling serves as a comprehensive preprocessing layer for AI applications that require reliable, structured access to complex document data.
    Downloads: 0 This Week
  • 13
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 0 This Week
  • 14
    LiveKit Agents

    Framework for building realtime multimodal voice AI agents apps

    LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...
    Downloads: 1 This Week
  • 15
    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...
    Downloads: 1 This Week
  • 16
    LittleJS

    The Tiny JavaScript Game Engine That Can!

    LittleJS is a super lightweight 2D JavaScript game engine with fast WebGL rendering. It is designed to be small, simple, and easy to use for various applications, from game jams to commercial releases. This engine has everything necessary to make high-quality games, including fast rendering, physics, particles, sound effects, music, keyboard/mouse/gamepad input handling, update/render loop, and debug tools. It is recommended that you start by copying the LittleJS Starter Project. This file is...
    Downloads: 0 This Week
  • 17
    Streamer-Sales

    LLM Large Model of Selling Anchor

    Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...
    Downloads: 0 This Week
  • 18
    IMS Toucan

    Controllable and fast Text-to-Speech for over 7000 languages

    ...IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are automatically downloaded when needed, and there is an online demo instance hosted on GPU that anyone can try.
    Downloads: 0 This Week
  • 19
    Flutter Rust Bridge

    Rust binding generator, feature-rich, but seamless and simple

    ...The project supports passing complex types, handling async operations and streams, and integrating with Flutter across mobile and desktop targets. By leaning on Rust’s memory safety and zero-cost abstractions, it enables compute-heavy tasks—parsing, crypto, image/audio processing, and more—without sacrificing Flutter’s developer experience. Build scripts and templates streamline packaging and distribution so the Rust side fits cleanly into CI and multi-platform releases. In practice, teams gain a maintainable way to share one performant Rust core across multiple Flutter apps while keeping the UI reactive and fast.
    Downloads: 1 This Week
  • 20
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 21
    OM-SoX

    Multichannel audio manipulation and batch processing for OpenMusic.

    OM-SoX is a library for multichannel audio manipulation and functional batch processing for OpenMusic, a visual programming environment based on Common Lisp / CLOS. It uses SoX (Sound eXchange, http://sox.sourceforge.net, © Chris Bagwell and SoX Contributors) as its sound-processing kernel. Note: OM-SoX releases are not compatible with OpenMusic version 6.19; instead, download and use a snapshot of the "development" branch of the source code.
    Downloads: 1 This Week
  • 22
    FFmpeg Batch AV Converter

    Free all in one audio/video ffmpeg batch encoder

    FFmpeg Batch AV Converter is a free universal audio and video encoder for Windows and Linux (via Wine) that lets you use the full potential of the ffmpeg command line with a few mouse clicks, in a convenient GUI with drag and drop and progress information. Wizards make things easy for non-experts. Thanks to its multi-file encoding feature, it may be the fastest a/v batch encoder available, since it maximizes system resource usage by launching as many simultaneous processes up to...
    Downloads: 1,216 This Week
  • 23
    AudioEnhancerMAX

    Open-source AI audio processing suite: 100% local, free, no limits.

    AudioEnhancerMAX is the open-source AI-powered audio media center for podcasters, creators, and professionals. It runs 100% locally on your hardware — no cloud, no subscriptions, no limits. Features: 16+ intelligent audio filters (AI noise removal, filler word detection, breath reduction, studio sound, audio super-resolution), Gemma 4 AI dynamic parameter tuning, distributed edge computing across Android smartphones, real-time system monitoring for Apple Silicon. Built with FastAPI,...
    Downloads: 12 This Week
  • 24
    Encoder of Death
    Encoder of Death is a video/audio file encoding app. Batch Processing: Encode multiple files simultaneously in a queue. Audio/Video Conversion: Convert between video formats or extract audio from video. Format Support: Audio: MP3, WAV, FLAC, AIFF, AAC, M4A, OGG. Video: MP4, MKV, AVI, MOV, WebM, FLV, WMV, MPEG, MPG, M4V, 3GP. Update 1-27-25: addressed the issue of FFmpeg not bundling with the executables.
    Downloads: 13 This Week
  • 25
    Nyquist

    Nyquist is a language for sound synthesis and music composition.

    Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.
    Downloads: 26 This Week