Search Results for "audio processing" - Page 2

Showing 374 open source projects for "audio processing"

  • 1
    TRIBE v2

    A multimodal model for brain response prediction

    ...TRIBE v2 allows researchers to simulate and analyze brain activity without requiring direct human experiments. Overall, it provides a powerful tool for studying perception, cognition, and multimodal processing in the brain.
    Downloads: 15 This Week
  • 2
    ReClip

    Download videos from almost any website

    ...Users can paste multiple URLs at once, select output formats such as MP4 or MP3, and choose quality settings before downloading. The system also includes features like automatic URL deduplication and batch processing to improve usability.
    Downloads: 169 This Week
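Automatic URL deduplication for a pasted batch can be as simple as normalizing each URL and keeping only its first occurrence. The sketch below is an illustration, not ReClip's implementation; the normalization rule (lower-case scheme and host, drop the fragment and any trailing slash) is an assumption about what counts as "the same" URL.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal.

    Assumed rule (illustrative only): lower-case scheme and host, strip a
    trailing slash from the path, and drop the #fragment.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_urls(urls: list[str]) -> list[str]:
    """Return the URLs in their original order with duplicates removed."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        if not url.strip():
            continue  # skip blank lines in a pasted batch
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            result.append(url.strip())
    return result


if __name__ == "__main__":
    pasted = """https://example.com/watch?v=abc
https://EXAMPLE.com/watch?v=abc#t=10

https://example.com/watch?v=xyz
"""
    print(dedupe_urls(pasted.splitlines()))
```

Order is preserved so the download queue matches what the user pasted.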
  • 3
    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is a comprehensive Gradio WebUI for transcription, translation, and text-to-speech. It installs with one click and uses Miniconda to create a virtual environment that runs completely separately from the Windows system (fully portable). It supports real-time transcription and translation as well as batch mode.
    Downloads: 46 This Week
  • 4
    cloud-morph

    Decentralize, Self-host Cloud Gaming/Application

    cloud-morph is a cloud-based media processing service that enables real-time video and audio transformation using FFmpeg in scalable environments. It is designed to run as a backend service that processes media streams or files and applies transformations such as transcoding, filtering, and format conversion. The system supports API-driven workflows, allowing integration into web applications or automation pipelines.
    Downloads: 0 This Week
  • 5
    OmniTools

    Self-hosted collection of powerful web-based tools for everyday tasks

    ...It’s designed to replace the random assortment of “free online tools” people use for quick tasks, while avoiding ads, tracking, and the need to upload sensitive files to unknown servers. A key design choice is that file processing happens entirely on the client side, meaning your data stays in your browser instead of being sent to the backend. The tool catalog spans both technical and non-technical needs, including image, video, audio, PDF, text, date/time, math, and data format utilities like JSON/CSV/XML helpers. It’s also packaged for straightforward self-hosting, with a lightweight Docker image and simple run commands, so it can be deployed quickly on a homelab or internal network.
    Downloads: 10 This Week
  • 6
    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. ...
    Downloads: 17 This Week
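To see why the 7.5 Hz frame rate matters for long-form generation, count frames: sequence length grows with duration times frame rate. The arithmetic below covers the 90-minute maximum quoted above; the 50 Hz rate it compares against is a hypothetical assumption chosen for illustration, not a number from the VibeVoice project.

```python
import math


def frames_for(duration_s: float, frame_rate_hz: float) -> int:
    """Tokenizer frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


NINETY_MINUTES_S = 90 * 60  # 5,400 s, the maximum length quoted above

vibevoice_frames = frames_for(NINETY_MINUTES_S, 7.5)
# Hypothetical comparison: a tokenizer running at 50 frames per second.
# 50 Hz is an illustrative assumption, not a figure from VibeVoice.
comparison_frames = frames_for(NINETY_MINUTES_S, 50)

print(vibevoice_frames)   # 40500
print(comparison_frames)  # 270000
print(f"{comparison_frames / vibevoice_frames:.2f}x fewer frames at 7.5 Hz")  # 6.67x
```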
  • 7
    mediasoup

    Cutting Edge WebRTC Video Conferencing

    mediasoup is a Node.js library that provides a cutting-edge WebRTC server capable of handling real-time communications with efficient media routing and processing.
    Downloads: 0 This Week
  • 8
    MediaDevices

    Go implementation of the MediaDevices API

    mediadevices is a Go library developed by the Pion WebRTC team that enables real-time access to audio and video devices for building native Go applications involving media streaming and conferencing. It provides a cross-platform, unified API for capturing and manipulating media streams and is often used in combination with Pion WebRTC for peer-to-peer communications. Its support for device enumeration, media constraints, and frame processing makes it a powerful building block for custom voice and video solutions in Go.
    Downloads: 0 This Week
  • 9
    Bilidown

    Bilibili video parsing and download tool with 8K video support

    bilidown is a command-line tool designed to download video and audio content from the Bilibili platform with high flexibility and control. It supports downloading single videos, playlists, and series, allowing users to archive content efficiently. The tool integrates FFmpeg to merge audio and video streams when necessary, ensuring compatibility and high-quality output. It provides options for selecting resolution, format, and output structure, giving users control over the download process. ...
    Downloads: 25 This Week
  • 10
    SuperCollider

    Audio server, programming language, and IDE for sound synthesis

    SuperCollider is a platform for audio synthesis and algorithmic composition, used by musicians, artists, and researchers working with sound. It is free and open source software available for Windows, macOS, and Linux. scsynth, a real-time audio server, forms the core of the platform. It features 400+ unit generators (“UGens”) for analysis, synthesis, and processing.
    Downloads: 3 This Week
  • 11
    bfxr

    Flash + AIR sound effects generator. Based on Sfxr.

    ...Its purpose is to enable users, especially game developers and sound designers, to quickly generate retro, 8-bit/“chiptune” style sound effects (“bleeps”, “booms”, “zaps”, etc.) without deep knowledge of audio signal processing. It offers an interactive GUI through which you can tweak many parameters (oscillators, envelopes, filters, etc.) to sculpt custom sound effects; you can preview in real time, export, and iterate. The project includes libraries, HTML templates, and both ActionScript and JavaScript code. It has been well-received (over a thousand stars), but as of 2025, it has been superseded by a newer version called bfxr2, which is a JavaScript reworking of the original.
    Downloads: 22 This Week
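The core idea behind bfxr, tweaking oscillator, pitch, and envelope parameters to sculpt a short effect, can be sketched in a few lines. This is a simplified sfxr-style generator written for illustration, not bfxr's ActionScript/JavaScript code; the parameter names and the 44.1 kHz sample rate are assumptions.

```python
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (assumed)


def square_blip(freq_start: float, freq_end: float, attack_s: float,
                decay_s: float, volume: float = 0.5) -> list[float]:
    """Square wave with a linear pitch slide and a linear attack/decay envelope.

    Illustrative parameters only; they do not mirror bfxr's actual controls.
    """
    total = int((attack_s + decay_s) * SAMPLE_RATE)
    attack_n = int(attack_s * SAMPLE_RATE)
    phase = 0.0
    samples = []
    for i in range(total):
        progress = i / max(total - 1, 1)                        # 0.0 .. 1.0
        freq = freq_start + (freq_end - freq_start) * progress  # pitch slide
        phase = (phase + freq / SAMPLE_RATE) % 1.0              # phase in cycles
        osc = 1.0 if phase < 0.5 else -1.0                      # 50% duty-cycle square
        if i < attack_n:
            env = i / attack_n                                  # attack: ramp up from 0
        else:
            env = 1.0 - (i - attack_n) / max(total - attack_n, 1)  # decay: ramp down
        samples.append(osc * env * volume)
    return samples


def write_wav(path: str, samples: list[float]) -> None:
    """Write samples in [-1, 1] as mono 16-bit PCM using only the stdlib."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


if __name__ == "__main__":
    # A short rising "zap": 880 Hz sliding up to 1760 Hz over 0.15 s.
    write_wav("zap.wav", square_blip(880, 1760, attack_s=0.01, decay_s=0.14))
```

Changing the slide direction, envelope lengths, or waveform is how the "bleeps", "booms", and "zaps" described above differ from one another.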
  • 12
    Membrane Core

    The core of the Membrane Framework, a multimedia processing framework

    membrane_core is the foundation of the Membrane multimedia framework for Elixir, providing the abstractions and runtime needed to build real-time audio and video pipelines. It models media processing as a graph of lightweight, supervised OTP processes—elements connected by links—so work is isolated, fault-tolerant, and easy to scale or reconfigure at runtime. The core defines a clear lifecycle and callback API for elements, plus concepts like buffers, events, and capabilities/format negotiation to keep components interoperable and type-safe. ...
    Downloads: 0 This Week
  • 13
    Videomass

    Videomass is a free, open source and cross-platform GUI for FFmpeg

    Videomass is a free, open-source graphical interface for FFmpeg designed to make advanced video and audio processing accessible to both beginners and experienced users. Built in Python using wxPython, it provides a cross-platform environment for managing encoding, conversion, and editing tasks through a visual interface. The software supports multitasking operations, allowing users to process multiple media files simultaneously. It offers extensive configuration options while also providing presets to simplify common workflows. ...
    Downloads: 2 This Week
  • 14
    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool that generates audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. It automates the pipeline of reading a digital book or document, converting its text into speech via a TTS engine, and packaging the result into an audiobook format along with timestamped captions or subtitles aligned to the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 4 This Week
  • 15
    Overtone

    Collaborative programmable music

    Overtone is an open-source audio environment designed to explore new musical ideas, from synthesis and sampling to instrument building, live coding, and collaborative jamming. We combine the powerful SuperCollider audio engine with Clojure, a state-of-the-art Lisp, to create an intoxicating interactive sonic experience. Synchronize your visuals and noise with ease.
    Downloads: 0 This Week
  • 16
    Youwee

    A beautiful, cross-platform downloader for YouTube, TikTok, Instagram

    Youwee is a modern cross-platform media downloader built with Tauri and React that supports downloading content from over 1800 websites including YouTube, TikTok, and Instagram. It provides a polished graphical interface that simplifies media downloading while leveraging powerful tools like yt-dlp and FFmpeg under the hood. The application supports advanced features such as batch downloads, playlist management, and extraction of audio or subtitles. It also integrates AI capabilities,...
    Downloads: 25 This Week
  • 17
    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for...
    Downloads: 25 This Week
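Voice activity detection is what lets WhisperX cut long audio into speech segments that can then be transcribed in batches. WhisperX uses a learned VAD model; the sketch below swaps in a much cruder energy-threshold detector purely to show the idea, and its frame size, threshold, and minimum pause length are illustrative assumptions.

```python
import math


def detect_speech(samples: list[float], sample_rate: int, frame_ms: int = 30,
                  threshold: float = 0.02,
                  min_gap_frames: int = 5) -> list[tuple[float, float]]:
    """Return (start_s, end_s) regions whose frame RMS exceeds `threshold`.

    Crude energy-based VAD for illustration; the defaults are assumptions.
    A pause shorter than `min_gap_frames` frames does not split a segment.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)

    # 1. Classify each frame as voiced or unvoiced by its RMS energy.
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        voiced.append(rms > threshold)

    # 2. Merge voiced frames into segments, closing one after a long pause.
    segments = []
    seg_start = None
    silence_run = 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_gap_frames:
                segments.append((seg_start, i - silence_run + 1))
                seg_start = None
                silence_run = 0
    if seg_start is not None:
        segments.append((seg_start, len(voiced) - silence_run))

    # 3. Convert frame indices to seconds, clamped to the audio length.
    duration = len(samples) / sample_rate
    return [(s * frame_len / sample_rate, min(e * frame_len / sample_rate, duration))
            for s, e in segments]
```

Each returned region could be transcribed independently, which is what makes batching possible; word-level timestamps additionally need the forced-alignment step, which this sketch does not cover.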
  • 18
    OpenAI Go

    The official Go library for the OpenAI API

    ...It enables developers to integrate OpenAI’s models and features into Go applications with a clean and idiomatic interface. The library provides support for a wide range of API endpoints including chat completions, assistants, embeddings, image generation, audio processing, and batch jobs. It includes built-in tools for handling authentication, managing API requests, and parsing structured responses. The repository also offers examples to help developers quickly set up projects and test different API calls. Designed for reliability and ease of use, it is maintained to stay aligned with the evolving OpenAI API specifications.
    Downloads: 1 This Week
  • 19
    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity. By analyzing the underlying audio content rather than relying on external metadata services, the system can organize large personal music libraries and generate curated playlists for different moods or listening contexts. ...
    Downloads: 2 This Week
  • 20
    pyVideoTrans

    Translate videos from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 17 This Week
  • 21
    Restreamer

    Complete streaming server solution for self-hosting

    ...The system runs easily in Docker environments and can be deployed on devices ranging from servers to Raspberry Pi, offering flexibility and scalability. It includes features like an HTML5 player, API access, snapshot generation, and audio processing, allowing integration into custom applications. With full control over video data and no licensing costs, Restreamer is designed as a powerful yet accessible streaming solution for individuals and teams.
    Downloads: 2 This Week
  • 22
    ffmpeg.wasm

    FFmpeg for browser, powered by WebAssembly

    ffmpeg.wasm is a pure WebAssembly (and JavaScript/TypeScript) port of FFmpeg that enables in-browser media recording, conversion, and streaming, letting developers perform video/audio processing entirely client-side without server uploads. FFmpeg and its codecs are transpiled to WebAssembly via Emscripten. It supports both single-threaded and multi-threaded cores using web workers and is written in TypeScript for an improved developer experience.
    Downloads: 10 This Week
  • 23
    audioFlux

    A library for audio and music analysis, feature extraction

    audioFlux is a deep learning tool library for audio and music analysis and feature extraction. It can be used for deep learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations.
    Downloads: 0 This Week
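Time-frequency analysis of the kind audioFlux provides typically starts from a short-time Fourier transform (STFT): slice the signal into overlapping windowed frames, take the spectrum of each, then derive frequency-domain features from those spectra. The pure-Python sketch below shows that pipeline with one such feature, the spectral centroid. It is not audioFlux's API; the frame length, hop size, Hann window, and naive DFT are illustrative choices (a real library would use an FFT).

```python
import cmath
import math


def stft_magnitudes(samples: list[float], frame_len: int = 256,
                    hop: int = 128) -> list[list[float]]:
    """Magnitude spectra of Hann-windowed frames (a short-time Fourier transform).

    Uses a naive O(N^2) DFT per frame for clarity.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(frame_len)]
        # Keep bins 0..N/2 only: for real input the remaining bins mirror these.
        spectrum = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def spectral_centroid(spectrum: list[float], sample_rate: int, frame_len: int) -> float:
    """Magnitude-weighted mean frequency (Hz) of one frame's spectrum."""
    total = sum(spectrum)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len  # frequency spacing between DFT bins
    return sum(k * bin_hz * m for k, m in enumerate(spectrum)) / total


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1024)]  # 1 kHz sine
    print([round(spectral_centroid(s, sr, 256)) for s in stft_magnitudes(tone)])
```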
  • 24
    AudioBookConverter

    Improved AudioBookConverter based on freeipodsoftware release

    AudioBookConverter is a lightweight desktop application designed to convert and organize audiobook files into optimized formats such as M4B for playback on modern devices. It allows users to combine multiple audio files into a single audiobook while preserving chapters and metadata for seamless listening. The software supports a wide range of input formats including MP3, FLAC, and AAC, and provides flexible output options for different devices. It includes intelligent artwork handling,...
    Downloads: 33 This Week
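Preserving chapters when several files are merged into one audiobook comes down to turning each file's duration into a running offset. A minimal sketch, assuming one chapter per input file and durations already known in milliseconds; the `Chapter` type and the (title, duration) input shape are hypothetical, not AudioBookConverter's code.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    title: str
    start_ms: int
    end_ms: int


def chapters_from_files(files: list[tuple[str, int]]) -> list[Chapter]:
    """Turn (title, duration_ms) pairs, in playback order, into chapter marks."""
    chapters = []
    offset = 0
    for title, duration_ms in files:
        chapters.append(Chapter(title, offset, offset + duration_ms))
        offset += duration_ms  # the next chapter starts where this file ends
    return chapters


def format_ms(ms: int) -> str:
    """Format milliseconds as HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    parts = [("Chapter 1", 1_254_300), ("Chapter 2", 987_650), ("Chapter 3", 1_432_010)]
    for ch in chapters_from_files(parts):
        print(f"{format_ms(ch.start_ms)}  {ch.title}")
```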
  • 25
    Oboe

    Oboe is a C++ library that makes it easy to build high-performance audio apps on Android

    oboe is a C++ library for building high-performance audio apps on Android, providing a unified, low-latency API over AAudio and OpenSL ES. It abstracts device and API-version differences so developers can focus on audio processing instead of platform quirks. The library emphasizes minimal latency and glitch-free playback/recording via tuned buffer strategies and callback-driven I/O. It supports features like floating-point audio, channel configuration, sample-rate negotiation, and stream sharing to match device capabilities. ...
    Downloads: 0 This Week
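The link between buffer size and latency that Oboe's tuned buffer strategies work with is simple arithmetic: a buffer of N frames at sample rate f_s holds N / f_s seconds of audio. The 192-frame burst and 48 kHz rate below are hypothetical values for illustration, not figures Oboe guarantees on any device.

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Milliseconds of audio held by a buffer of `frames` frames."""
    return 1000.0 * frames / sample_rate_hz


if __name__ == "__main__":
    burst = 192           # hypothetical frames per hardware burst
    sample_rate = 48_000  # hypothetical device sample rate
    for bursts in (1, 2, 4):
        frames = burst * bursts
        print(f"{bursts} burst(s): {frames} frames = "
              f"{buffer_latency_ms(frames, sample_rate):.1f} ms")
```

Each extra burst of buffering adds one burst's worth of latency but gives the audio callback more slack before an underrun causes a glitch; balancing those two is the trade-off described above.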