Search Results for "audio processing" - Page 3

Showing 308 open source projects for "audio processing"

  • 1
    ScreenPipe

    Open-source AI app store powered by 24/7 desktop history

    Screenpipe is an AI app store powered by continuous desktop history recording. It operates entirely locally, offering developers a platform to build, distribute, and monetize AI applications that leverage comprehensive contextual data from users' desktop activities.
    Downloads: 35 This Week
  • 2
    AudioKit

    Swift audio synthesis, processing, & analysis platform

    AudioKit is an entire audio development ecosystem of code repositories, packages, libraries, algorithms, applications, playgrounds, tests, and scripts, built and used by a community of audio programmers, app developers, engineers, researchers, scientists, musicians, gamers, and people new to programming. An important goal for AudioKit is to allow it to grow and be maintainable by a handful of volunteers. For this reason we have extensive tests that are run whenever changes are made to any...
    Downloads: 0 This Week
  • 3
    LiveAvatar

    Streaming Real-time Audio-Driven Avatar Generation

    LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. The project co-designs algorithms and system optimizations, such as block-wise autoregressive processing and fast sampling strategies, to deliver real-time frame rates (e.g., ~45 FPS on appropriate GPU clusters) while handling non-stop generation without quality degradation. ...
    Downloads: 0 This Week
  • 4
    VERT.sh

    The next-generation file converter

    VERT is a modern, privacy-focused file conversion platform that leverages WebAssembly to perform conversions entirely on the user’s device rather than relying on cloud-based processing. Built with Svelte and TypeScript, it provides a clean and responsive interface for converting a wide variety of file types, including images, audio, video, and documents. One of its defining characteristics is its local-first approach, which eliminates the need to upload files to external servers, thereby improving both privacy and performance. ...
    Downloads: 2 This Week
  • 5
    LTX-Video

    Official repository for LTX-Video

    LTX-Video is Lightricks' open video generation model, a diffusion transformer (DiT) designed to generate high-quality video fast enough for real-time use. It supports text-to-video and image-to-video generation, as well as keyframe-based animation, video extension, and video-to-video transformation.
    Downloads: 2 This Week
  • 6
    PyAV

    Pythonic bindings for FFmpeg's libraries

    ...While powerful, it requires a solid understanding of FFmpeg concepts, as it prioritizes flexibility and control over abstraction. Overall, PyAV is a robust tool for developers building advanced video and audio processing systems in Python.
    Downloads: 0 This Week
  • 7
    Scriberr

    Self-hosted AI audio transcription

    Scriberr is a self-hosted AI-powered transcription platform designed to convert audio and video into highly accurate text while prioritizing privacy and local processing. Unlike cloud-based transcription services, Scriberr runs entirely on the user’s machine, ensuring that sensitive recordings are never sent to third-party servers and remain fully under user control. It leverages modern speech recognition models such as Whisper and other advanced architectures to deliver precise transcripts with word-level timing and speaker identification. ...
    Downloads: 3 This Week
  • 8
    miniaudio

    Audio playback and capture library written in C

    miniaudio is written in C with no dependencies except the standard library and should compile cleanly on all major compilers without the need to install any additional development packages. All major desktop and mobile platforms are supported. miniaudio gives you complete flexibility. With the low-level API, just initialize a connection to the device and send or receive raw audio data. The modular design of miniaudio allows you to use the low-level API without compromising your ability to...
    Downloads: 2 This Week
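To make "send or receive raw audio data" concrete, here is a minimal Python sketch of the callback model used by low-level audio APIs such as miniaudio's: the device repeatedly asks for a given number of frames, and the application fills a buffer of interleaved samples. This is not miniaudio's API (miniaudio's callback is a C function handed an output buffer and a frame count); the `SineSource` class, its `data_callback` method, and the 48 kHz stereo signed 16-bit format are assumptions chosen for illustration.

```python
import math
import struct

SAMPLE_RATE = 48_000  # assumed device sample rate (Hz)
CHANNELS = 2          # assumed stereo output


class SineSource:
    """Hypothetical playback source that fills raw PCM buffers with a sine tone."""

    def __init__(self, frequency: float = 440.0, amplitude: float = 0.25) -> None:
        self.frequency = frequency
        self.amplitude = amplitude
        self.phase = 0.0  # kept between callbacks so the tone stays continuous

    def data_callback(self, frame_count: int) -> bytes:
        """Return `frame_count` frames of little-endian signed 16-bit samples."""
        out = bytearray()
        step = 2 * math.pi * self.frequency / SAMPLE_RATE
        for _ in range(frame_count):
            sample = int(self.amplitude * 32767 * math.sin(self.phase))
            # One frame holds one sample per channel, interleaved (L, R).
            out += struct.pack("<" + "h" * CHANNELS, *([sample] * CHANNELS))
            self.phase = (self.phase + step) % (2 * math.pi)
        return bytes(out)
```

Keeping `phase` on the object is the key design point: each callback continues the waveform where the previous buffer ended, so consecutive buffers join without clicks.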
  • 9
    JamTools

    JamTools is a cross-platform suite of desktop utilities

    JamTools is a multifunctional desktop utility suite designed to provide a collection of tools for productivity, media processing, and system enhancements within a single application. It integrates various features such as file management, multimedia handling, and system utilities into a unified interface. The project emphasizes ease of use while offering advanced functionality for handling common tasks efficiently. It includes support for media-related operations, often leveraging FFmpeg for processing video and audio content. ...
    Downloads: 0 This Week
  • 10
    Verticals v3

    Automated YouTube Shorts pipeline

    ...The pipeline emphasizes automation, allowing users to produce short-form content at scale with minimal manual intervention. It integrates FFmpeg and other media processing tools to handle video transformations, resizing, and encoding. The system also supports adding overlays, captions, and audio enhancements to improve engagement. Designed for creators and developers, it enables repeatable workflows for generating social media content efficiently. Its modular structure allows customization of each stage in the pipeline, making it adaptable to different content strategies.
    Downloads: 0 This Week
  • 11
    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps.
    Downloads: 18 This Week
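VideoCaptioner's description mentions turning transcribed speech into subtitle text with timestamps. As a rough sketch (not VideoCaptioner's code), this is how timed segments from a speech-to-text engine could be written out as SubRip (`.srt`) subtitles, where each cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text, with a blank line between cues. The `Segment` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed span; times are in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Serialize segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    return "\n".join(cues)
```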
  • 12
    KrillinAI

    Video translation and dubbing tool powered by LLMs

    ...It integrates several stages of the pipeline: video acquisition (either from local files or remote via download tools), speech recognition (ASR), subtitle segmentation and alignment, machine translation (with context-aware translation to preserve semantics), and voice cloning + text-to-speech (TTS) to produce dubbed audio tracks. KrillinAI supports both landscape and portrait videos, which makes it suitable for a wide range of platforms — from YouTube to TikTok or other vertical-video sites — and ensures correct formatting and layout for the final video. The tool offers “one-click” workflows and desktop versions, lowering the barrier for users who may not be familiar with video editing or audio processing pipelines.
    Downloads: 8 This Week
  • 13
    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline. Internally, it leverages pretrained...
    Downloads: 0 This Week
  • 14
    Music-bot

    A complete, downloadable codebase for a Discord music bot

    Music-bot is a Discord bot designed to stream and manage music playback within voice channels, providing users with an interactive audio experience. It supports playing music from various online sources, including streaming platforms and direct URLs. The bot includes queue management features that allow users to add, remove, and reorder tracks during playback. It integrates audio processing tools to ensure smooth streaming and consistent playback quality. Music-bot also supports commands for controlling playback, such as pause, resume, skip, and volume adjustment. ...
    Downloads: 2 This Week
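The queue operations Music-bot describes (add, remove, reorder, and skip tracks) map naturally onto a double-ended queue. A minimal sketch, assuming tracks are plain strings and positions are zero-based; `TrackQueue` and its methods are hypothetical names, not Music-bot's implementation.

```python
from collections import deque
from typing import List, Optional


class TrackQueue:
    """Hypothetical playback queue supporting add, remove, reorder, and skip."""

    def __init__(self) -> None:
        self._tracks = deque()

    def add(self, track: str) -> None:
        self._tracks.append(track)  # new tracks go to the end of the queue

    def remove(self, index: int) -> str:
        """Remove and return the track at `index`."""
        track = self._tracks[index]
        del self._tracks[index]
        return track

    def move(self, src: int, dst: int) -> None:
        """Reorder: take the track at `src` and reinsert it at `dst`."""
        self._tracks.insert(dst, self.remove(src))

    def skip(self) -> Optional[str]:
        """Pop the next track to play, or return None if the queue is empty."""
        return self._tracks.popleft() if self._tracks else None

    def upcoming(self) -> List[str]:
        return list(self._tracks)
```

A `deque` keeps taking the next track from the front at O(1), while the occasional reorder or removal by index stays simple.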
  • 15
    FFmate

    FFmate is a modern and powerful automation layer

    FFmate is a graphical utility designed to simplify the use of FFmpeg by providing an intuitive interface for building and executing multimedia processing commands. It allows users to perform tasks such as transcoding, trimming, and format conversion without needing to memorize command-line syntax. The tool dynamically generates FFmpeg commands based on user input, making complex workflows more accessible. It supports a wide range of audio and video formats, enabling flexible media processing. FFmate is designed for both beginners and advanced users, offering a balance between simplicity and customization. ...
    Downloads: 0 This Week
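FFmate is described as generating FFmpeg commands from user input. A minimal sketch of that idea, assuming a hypothetical `build_ffmpeg_command` helper (not part of FFmate): form choices become an argument list built from standard FFmpeg options (`-y` to overwrite, `-i` for the input, `-ss`/`-to` for trim points, `-vn` to drop video, `-c:a`/`-b:a` for audio codec and bitrate).

```python
import shlex
from typing import List, Optional


def build_ffmpeg_command(
    src: str,
    dst: str,
    *,
    start: Optional[str] = None,
    end: Optional[str] = None,
    drop_video: bool = False,
    audio_codec: Optional[str] = None,
    audio_bitrate: Optional[str] = None,
    overwrite: bool = True,
) -> List[str]:
    """Translate UI choices into an FFmpeg argument list (hypothetical helper)."""
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")  # overwrite the output file without prompting
    cmd += ["-i", src]
    # Placed after -i, -ss/-to act as output options that trim the result.
    if start is not None:
        cmd += ["-ss", start]
    if end is not None:
        cmd += ["-to", end]
    if drop_video:
        cmd.append("-vn")  # audio-only output
    if audio_codec is not None:
        cmd += ["-c:a", audio_codec]
    if audio_bitrate is not None:
        cmd += ["-b:a", audio_bitrate]
    cmd.append(dst)
    return cmd


if __name__ == "__main__":
    args = build_ffmpeg_command("talk.mp4", "talk.m4a", start="00:00:10",
                                end="00:01:00", drop_video=True,
                                audio_codec="aac", audio_bitrate="192k")
    print(shlex.join(args))  # shell-quoted preview of the generated command
```

Returning a list rather than a single string lets the arguments go straight to `subprocess.run` without shell-quoting problems; `shlex.join` is used only to display the equivalent command line.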
  • 16
    SALMONN family

    A suite of advanced multi-modal LLMs

    SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...
    Downloads: 0 This Week
  • 17
    video-use

    Edit videos with Claude Code

    ...Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. ...
    Downloads: 24 This Week
  • 18
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. ...
    Downloads: 0 This Week
  • 19
    ffmpeg-commander

    A web-based GUI for quickly generating common FFmpeg command lines

    ...The interface is inspired by tools like HandBrake, aiming to lower the barrier to entry for FFmpeg usage. Overall, it acts as a bridge between ease of use and powerful multimedia processing capabilities.
    Downloads: 4 This Week
  • 20
    MoviePy

    Video editing with Python

    MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF. MoviePy is an open source software originally written by Zulko and released under the MIT licence. It works on Windows, Mac, and Linux, with Python 2 or Python 3. The code is hosted on GitHub, where...
    Downloads: 17 This Week
  • 21
    FastRTC

    The python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat, or collaborative audio/video tools. ...
    Downloads: 1 This Week
  • 22
    OpenAI .NET

    The official .NET library for the OpenAI API

    OpenAI .NET is the official client library for calling the OpenAI REST API from C# and other .NET languages, with first-class support for modern .NET patterns. It provides strongly typed clients across API areas (chat, audio, images, embeddings, moderations, batches, files, models, vector stores, responses, realtime, assistants) and works with .NET Standard 2.0 while the examples use .NET 8. You install it via NuGet and authenticate with an API key, ideally through environment variables or...
    Downloads: 1 This Week
  • 23
    fooyin

    A customisable music player

    ...It provides a modular interface that can be built from scratch or adapted from preset layouts, allowing users to tailor the experience to their workflow. The player supports a wide range of audio formats and includes advanced playback features such as gapless playback, ReplayGain, and DSP processing. It integrates a powerful plugin system that enables extensions for widgets, decoders, metadata handling, and external services. fooyin also includes a scripting language called FooScript, which allows users to customize interface behavior, automate playlists, and control display logic. ...
    Downloads: 5 This Week
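ReplayGain, one of fooyin's playback features, stores a loudness correction per track or album in decibels; at playback the player converts it to a linear factor, 10^(gain/20), and multiplies every sample by it. A minimal sketch of that arithmetic, assuming floating-point samples in [-1.0, 1.0], an optional stored peak value used for clipping prevention, and hypothetical function names; it is not fooyin's DSP code.

```python
from typing import List, Optional


def replaygain_scale(gain_db: float, peak: Optional[float] = None,
                     preamp_db: float = 0.0, prevent_clipping: bool = True) -> float:
    """Convert a ReplayGain value in dB (plus optional preamp) to a linear factor."""
    scale = 10 ** ((gain_db + preamp_db) / 20)
    # If the loudest sample would exceed full scale after gain, cap the factor.
    if prevent_clipping and peak is not None and peak * scale > 1.0:
        scale = 1.0 / peak
    return scale


def apply_gain(samples: List[float], scale: float) -> List[float]:
    """Multiply each sample by the linear gain factor."""
    return [s * scale for s in samples]
```

For example, a track tagged at -6 dB is scaled by about 0.5, roughly halving its amplitude, while a +6 dB boost on a track whose peak is 0.9 is capped at 1/0.9 so the loudest sample lands exactly at full scale.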
  • 24
    Suno AI API

    Call suno.ai's music generation AI through an API

    Suno API is an unofficial open-source interface that enables developers to programmatically interact with Suno’s AI music generation platform, allowing automated creation of songs, lyrics, and audio content through API calls. It replicates the behavior of Suno’s web-based creation tools by reverse engineering internal endpoints and exposing them through a developer-friendly interface built with Python and FastAPI. The system supports asynchronous processing, enabling efficient handling of multiple generation requests and making it suitable for scalable applications and automation pipelines. ...
    Downloads: 7 This Week
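Suno API's asynchronous processing is what lets one service juggle several generation requests at once. A minimal asyncio sketch of that pattern, assuming a hypothetical `generate_song` coroutine that stands in for one request (a real client would await an HTTP call there; this sketch does not contact Suno) and a semaphore that caps how many requests run concurrently.

```python
import asyncio
from typing import Dict, List


async def generate_song(prompt: str) -> Dict[str, str]:
    """Hypothetical stand-in for one generation request."""
    await asyncio.sleep(0.01)  # simulates waiting on the remote service
    return {"prompt": prompt, "status": "complete"}


async def generate_batch(prompts: List[str], max_concurrency: int = 2) -> List[Dict[str, str]]:
    """Run many requests concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(prompt: str) -> Dict[str, str]:
        async with semaphore:  # waits here while the limit is reached
            return await generate_song(prompt)

    return await asyncio.gather(*(limited(p) for p in prompts))
```

`asyncio.gather` returns results in the order the prompts were given, even though the underlying requests may finish in any order.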
  • 25
    Instill Core

    Instill Core is a full-stack AI infrastructure tool for data

    Instill Core is an open-source, full-stack AI infrastructure platform designed to orchestrate data pipelines, machine learning models, and unstructured data processing into a unified, production-ready system. It provides an end-to-end solution that enables developers to build, deploy, and manage AI-powered applications without needing to manually stitch together multiple tools across the data and model lifecycle. The platform focuses heavily on handling unstructured data such as documents, images, audio, and video, transforming them into AI-ready formats through integrated ETL pipelines and processing workflows. ...
    Downloads: 0 This Week