Page 3 | audio processing free download

Vidi2

Large Multimodal Models for Video Understanding and Editing

...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.

Downloads: 1 This Week

Last Update: 2026-03-04

Streamer-Sales

Large language model for live-stream sales hosts

Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...

Downloads: 6 This Week

Last Update: 2026-03-05

LiveKit Agents

Framework for building realtime multimodal voice AI agent apps

LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...

Downloads: 2 This Week

Last Update: 2 days ago

AudioCraft

AudioCraft is a library for audio processing and generation

...It also contains training code and recipes, so researchers can fine-tune on custom data or explore new objectives without building infrastructure from scratch. Example notebooks, CLI tools, and audio utilities help with prompt design, conditioning on reference audio, and post-processing to produce ready-to-share outputs.

Downloads: 8 This Week

Last Update: 2025-10-13

IMS Toucan

Controllable and fast Text-to-Speech for over 7000 languages

...IMS-Toucan ships with several ready-to-run scripts, including GUIs for interactive demos, prosody override tools, zero-shot language embedding injection, and text-to-audio file generation. Pretrained models are downloaded automatically when needed, and there is an online demo instance hosted on a GPU that anyone can try.

Downloads: 0 This Week

Last Update: 2025-11-28

Jina

Build cross-modal and multimodal applications on the cloud

Jina is a framework that lets anyone build cross-modal and multimodal applications on the cloud. It turns a proof of concept (PoC) into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types, such as text, images, audio, video, 3D meshes, and PDFs, with Jina AI's DocArray. Polyglot gateway that supports gRPC, WebSockets, HTTP,...

Downloads: 0 This Week

Last Update: 2024-11-12

Insanely Fast Whisper

An opinionated CLI to transcribe audio files with Whisper on-device

Insanely Fast Whisper is a high-performance command-line tool designed to dramatically accelerate speech-to-text transcription using OpenAI’s Whisper models on local hardware. It leverages modern optimizations such as batch processing, mixed precision, and advanced attention mechanisms like Flash Attention to significantly reduce inference time while maintaining high transcription accuracy. The project is built on top of the Transformers ecosystem and integrates with libraries such as Optimum to maximize GPU efficiency. It is specifically engineered for environments with CUDA-enabled GPUs or Apple Silicon devices, allowing users to process hours of audio in minutes or even seconds depending on hardware capabilities. ...

Downloads: 1 This Week

Last Update: 2026-03-26

h2oGPT

Private chat with a local GPT over documents, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and...

Downloads: 1 This Week

Last Update: 2025-02-22

Internet DJ Console

A feature packed DJ console and internet radio client for Linux users

Conceived as an internet radio client for Shoutcast/Icecast and a DJ console, IDJC has two main media players, a background track player, effects buttons, a crossfader, WebM, AAC, Ogg, and MP3 streaming, stream automation timers, aux input, and voice and VoIP integration. Supported media file formats include MP3, Ogg, FLAC, WMA, WAV, and M4A, along with M3U, XSPF, and PLS playlists and cue sheets. It also provides IRC track and station announcements and uses the JACK Audio Connection Kit to provide a flexible audio chain. This list of features is by no...

32 Reviews

Downloads: 6 This Week

Last Update: 2026-01-10

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 4 This Week

Last Update: 2025-03-19

Grabbit

Free Windows app to download videos from YouTube, TikTok, Instagram an

Grabbit is a desktop video downloader for Windows. Download videos and audio from YouTube, TikTok, Instagram, Facebook, Twitter and more — up to 4K/8K quality. Includes a Chrome Extension to grab videos directly from your browser. No cloud, no account needed. Free tier available. Pro plan from $8.99/month.

Downloads: 5 This Week

Last Update: 2026-06-29

Pybris

B language compiler written in Python targeting RISVM

Pybris is a compiler for the B programming language, written in Python using Pyparsing. The compiler emits a variant of Bitmario RISVM assembly. The practical goal of the project is to offer a friendlier way to develop digital signal processing (DSP) effects for the Competent Audio library than writing RISVM assembly by hand. Pybris is written for Python 2.7 but has also been tested with Python 3.8.10.

Downloads: 0 This Week

Last Update: 2024-07-08

MLT Multimedia Framework

A multimedia authoring and processing framework and a video playout server for television broadcasting.

17 Reviews

Downloads: 8 This Week

Last Update: 2026-06-25

DiffRhythm

Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation

DiffRhythm is an open-source, diffusion-based model designed to generate full-length songs. Focused on music creation, it combines advanced AI techniques to produce coherent and creative audio compositions. The model uses a latent diffusion architecture, which makes it capable of producing high-quality, long-form music. It is available on Hugging Face, where users can try a demo or download the model for further use. DiffRhythm offers tools for both training and inference, and its...

1 Review

Downloads: 6 This Week

Last Update: 2025-03-06

Ultimate Media Downloader

An Open source media downloader for downloading videos and audios

...Built with Python and powered by industry-standard extraction engines, it delivers enterprise-level capabilities with consumer-friendly simplicity. Whether you're downloading a single YouTube video, extracting audio from Spotify playlists, archiving TikTok content, or batch-processing entire music libraries, UMD handles it all with elegance and efficiency. Key features:

1. Unified interface: one command, 1000+ platforms. No tool shopping, no mental-model switching.
2. Production-ready, zero-friction installation: most users go from hearing about the tool to downloading content in under 5 minutes.
3. ...

Downloads: 0 This Week

Last Update: 2026-06-09

OmniPull

Just pull anything

OmniPull is a powerful, cross-platform download manager built with Python and PySide6. It provides a modern, intuitive interface for managing downloads with advanced features like multi-threading, queue management, and media extraction.

Downloads: 4 This Week

Last Update: 2026-06-11

VCClient

Software that uses AI to perform real-time voice conversion

VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. It provides both a...

Downloads: 42 This Week

Last Update: 2026-03-23

vocal-separate

An extremely simple tool for separating vocals and background music

...Users can drag and drop an audio or video file onto the interface to begin separation, choosing between two, four, or five stems, which allows isolating specific components like vocals, bass, drums, or piano depending on the chosen model. After processing, the tool outputs separate WAV files for each extracted stem, making it easy to export and use in audio editing or remix software.

Downloads: 1 This Week

Last Update: 2026-02-17

Demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation: extracting individual instrument or vocal tracks from a mixed audio file. The system is based on a U-Net-like convolutional architecture combined with recurrent and transformer elements to capture both short-term and long-term temporal structure. Rather than relying on spectrograms alone, it processes the raw waveform directly (in the hybrid version, alongside a spectrogram branch), allowing for higher-quality reconstruction and fewer artifacts in separated tracks. The...

Downloads: 103 This Week

Last Update: 2025-10-12

MahaKurawa.My.ID MP4 VA Extract

MahaKurawa.My.ID MP4 VA Extract is a tool to extract MP4 file content

MahaKurawa.My.ID MP4 VA Extract is a tool to extract the video and audio content of MP4 files. It can also extract content from MKV files and single SSA subtitle files. The software does not convert any video or audio stream; it extracts them as-is, which is the specific purpose the tool was made for. "MahaKurawa.My.ID MP4 VA Extract v.1.0.3.1" is available for free at https://www.mahakurawa.my.id.

Downloads: 0 This Week

Last Update: 2023-12-14

find-similar

User-friendly library to find similar objects

The mission of the FindSimilar project is to provide a powerful and versatile open source library that empowers developers to efficiently find similar objects and perform comparisons across a variety of data types. Whether dealing with texts, images, audio, or more, our project aims to simplify the process of identifying similarities and enhancing decision-making. https://github.com/findsimilar/find-similar - GitHub repo http://demo.findsimilar.org/ - Demo project and...

1 Review

Downloads: 0 This Week

Last Update: 2023-11-12

auto-subtitle

Automatically generate and overlay subtitles for any video

auto-subtitle is a Python-based command-line tool that automatically generates and overlays subtitles on video files using AI-driven speech recognition. It combines FFmpeg with OpenAI’s Whisper model to transcribe spoken audio into text and synchronize it with video playback. The tool processes video input, extracts audio, and produces subtitle files that can be either exported separately or burned directly into the final video output. It supports multiple transcription models with varying...

Downloads: 13 This Week

Last Update: 2026-04-24

Riffusion

Real-time music generation using Stable Diffusion AI techniques

...Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.

Downloads: 1 This Week

Last Update: 2026-03-18

Automatic YouTube subtitle generation

Using OpenAI's Whisper to automatically generate YouTube subtitles

Automatic YouTube subtitle generation is a command-line tool that combines YouTube downloading capabilities with AI-powered transcription using Whisper models. It allows users to download videos or audio from YouTube and automatically generate subtitles or transcripts. The tool processes media locally, extracting audio and applying speech recognition to produce accurate text outputs. It supports multiple languages and can handle different Whisper model sizes, balancing performance and accuracy. yt-whisperc is designed for automation, enabling batch processing of multiple videos for transcription workflows. ...

Downloads: 0 This Week

Last Update: 2026-04-24

Piano transcription

Task of transcribing piano recordings into MIDI files

Piano transcription is an open-source, high-resolution piano transcription system by ByteDance that converts raw audio recordings of piano performances into symbolic MIDI files, detecting note onsets, offsets, pitch, velocity, and even pedal usage. The system is implemented in Python (PyTorch) and can accurately transcribe polyphonic piano recordings, even with complex passages and pedal techniques, making it suitable for classical piano music. By using this transcription tool,...

Downloads: 4 This Week

Last Update: 2025-12-02

Search Results for "audio processing" - Page 3

Showing 102 open source projects for "audio processing"
