Page 3 | audio processing free download

Dolphin

Document Image Parsing via Heterogeneous Anchor Prompting

...It is designed to integrate with other tools and libraries and provide stable playback or media-processing pipelines, while remaining open-source so that users can inspect, extend, and adapt it.

Downloads: 0 This Week

Last Update: 2026-03-25

See Project

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

...It focuses on combining video understanding models, such as YOLO- and Roboflow-based detectors, with real-time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra-low-latency edge network so agents can join sessions quickly and maintain very low audio and video latency while processing frames and generating responses. Developers work with an agent abstraction that connects video edge providers, LLMs, and processors into pipelines, making it easier to orchestrate tasks like object detection, pose estimation, and conversational guidance. The project includes SDKs for React, Android, iOS, Flutter, React Native, and Unity, enabling integration into a wide variety of client environments such as mobile apps, web apps, and games.

Downloads: 1 This Week

Last Update: 4 days ago

See Project

Scanopy

Clean network diagrams. One-time setup, zero upkeep

Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...

Downloads: 3 This Week

Last Update: 2026-04-28

See Project

YoutubeExplode

Abstraction layer over YouTube's internal API

...Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. Developers can use it to access details such as titles, authors, durations, captions, and available media formats, as well as to download audio or video streams for further processing. The library is designed to be intuitive and cross-platform through .NET Standard compatibility, making it suitable for desktop tools, automation pipelines, and media utilities.

Downloads: 1 This Week

Last Update: 2026-04-22

See Project

LiveKit Agents

Framework for building realtime multimodal voice AI agent apps

LiveKit Agents is an open source framework designed for building realtime AI agents that can participate as programmable entities within communication sessions. It enables developers to create conversational and multimodal agents capable of processing voice, audio, and other inputs in realtime environments. These agents can join LiveKit rooms as participants and interact with users or systems through speech, text, and other modalities. LiveKit Agents provides libraries and tooling that allow developers to combine speech-to-text, large language models, and text-to-speech services to build interactive AI experiences. ...

Downloads: 2 This Week

Last Update: 3 days ago

See Project

Vidi2

Large Multimodal Models for Video Understanding and Editing

...The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.

Downloads: 0 This Week

Last Update: 2026-03-04

See Project

Flutter Rust Bridge

Rust binding generator, feature-rich, but seamless and simple

...The project supports passing complex types, handling async operations and streams, and integrating with Flutter across mobile and desktop targets. By leaning on Rust’s memory safety and zero-cost abstractions, it enables compute-heavy tasks—parsing, crypto, image/audio processing, and more—without sacrificing Flutter’s developer experience. Build scripts and templates streamline packaging and distribution so the Rust side fits cleanly into CI and multi-platform releases. In practice, teams gain a maintainable way to share one performant Rust core across multiple Flutter apps while keeping the UI reactive and fast.

Downloads: 2 This Week

Last Update: 2026-03-29

See Project

Streamer-Sales

LLM Large Model of Selling Anchor

Streamer-Sales is an open-source large language model system designed specifically for e-commerce live streaming and automated product promotion. The project focuses on generating persuasive product descriptions and live presentation scripts that mimic the style of professional online sales hosts. By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make...

Downloads: 0 This Week

Last Update: 2026-03-05

See Project

Nyquist

Nyquist is a language for sound synthesis and music composition.

Nyquist is a language for sound synthesis and music composition. It is implemented in C and C++ and runs on Win32, OSX, and Linux. Nyquist combines a powerful functional programming style with efficient signal-processing primitives. Nyquist is also embedded as a scripting language in Audacity.

3 Reviews

Downloads: 20 This Week

Last Update: 2025-03-31

See Project

MATLAB Deep Learning Model Hub

Discover pretrained models for deep learning in MATLAB

Discover pretrained models for deep learning in MATLAB. Pretrained image classification networks have already learned to extract powerful and informative features from natural images. Use them as a starting point to learn a new task using transfer learning. Inputs are RGB images; the output is the predicted label and score.

Downloads: 1 This Week

Last Update: 2024-10-11

See Project

WildMidi Midi Library and Player

WildMidi is a MIDI processing library and a MIDI player using the GUS patch set.

2 Reviews

Downloads: 5 This Week

Last Update: 2026-03-12

See Project

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 5 This Week

Last Update: 2025-03-19

See Project

Drumstick Libraries

MIDI libraries for Qt/C++

Drumstick is a tool to play music. This is a set of C++ MIDI libraries using Qt5 objects, idioms and style. It contains a C++ wrapper around the ALSA library sequencer interface; ALSA sequencer provides software support for MIDI technology on Linux. A complementary library provides classes for processing SMF (Standard MIDI Files: .MID/.KAR) and Cakewalk (.WRK) file formats. A multiplatform realtime MIDI I/O library is also provided.

Downloads: 12 This Week

Last Update: 2026-04-18

See Project

Snowmix

Video mixer for mixing live and recorded video and audio feeds

New version 0.5.2.2 Released May 15th 2026. Snowmix is a Swiss army knife tool for mixing live and recorded video and audio feeds. It supports 2D and 3D clipping, scaling and transparent overlay of video, PNG graphics and text. It supports animation of video, images and text through native commands changing scale, placement, transparency and rotation. Animation and actions can also be controlled through native scripting and an embedded Tcl and/or Python interpreter. Snowmix is designed...

10 Reviews

Downloads: 34 This Week

Last Update: 2026-05-15

See Project

MLT Multimedia Framework

A multimedia authoring and processing framework and a video playout server for television broadcasting.

17 Reviews

Downloads: 18 This Week

Last Update: 2026-04-22

See Project

DiffRhythm

Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation

DiffRhythm is an open-source, diffusion-based model designed to generate full-length songs. Focused on music creation, it combines advanced AI techniques to produce coherent and creative audio compositions. The model utilizes a latent diffusion architecture, making it capable of producing high-quality, long-form music. It can be accessed on Hugging Face, where users can interact with a demo or download the model for further use. DiffRhythm offers tools for both training and inference, and its...

1 Review

Downloads: 2 This Week

Last Update: 2025-03-06

See Project

Data Crow

The ultimate cataloguer

Data Crow allows you to use the standard movie & video (divx, xvid, DVD, Blu-ray, etc), book (and eBooks), images, board games, comic books, games & software, music (mp3 and other music files) cataloguing modules. Besides these modules, which you can change to fit your requirements, you can create new modules (want to catalogue your stamps, equipment, or anything else?). The GUI is skinnable. Reporting (using JasperReports and their community edition JasperSoft Developer Studio), loan...

57 Reviews

Downloads: 381 This Week

Last Update: 6 days ago

See Project

Meqaris

Booking/reservation of meeting rooms/equipment with e-mail invitations

Meqaris (Meeting Equipment and Room Invitation System) is a system that allows booking meeting/conference rooms and other equipment or resources (like mobile whiteboards, projectors or conference audio/video sets) by using the same type of e-mail invitations that are used to invite participants to meetings. Especially useful in corporate environments, but can be used for anything in general, even by individual users. Simply add "resource participants" to the recipient list (just like...

Downloads: 0 This Week

Last Update: 2025-05-14

See Project

ekho

Chinese text-to-speech engine

ekho is a project with relatively sparse documentation, but from the repository it appears to be a small-scale tool for audio processing and playback, possibly with features for speech synthesis or manipulation. The repo includes scripts and configuration files suggesting interactions with media/audio handling libraries. Because of limited README detail, it seems targeted at users comfortable reading and modifying code, rather than end users expecting polished UIs. ...

Downloads: 11 This Week

Last Update: 2025-11-28

See Project

Piper

A distributed workflow engine

Piper is a multimedia-focused tool designed to simplify audio and video processing workflows through streamlined command execution. It acts as a wrapper around FFmpeg-like utilities, enabling users to build pipelines for media transformation with reduced complexity. The project emphasizes automation and reproducibility, allowing consistent handling of media tasks across environments. It supports chaining operations such as encoding, filtering, and conversion in a structured manner. ...

Downloads: 0 This Week

Last Update: 2026-04-27

See Project

Speech Signal Processing Toolkit (SPTK)

SPTK is a suite of speech signal processing tools for UNIX environments, e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR synthesis filter, LSP synthesis filter, vector quantization techniques, and other extended versions of them.

9 Reviews

Downloads: 25 This Week

Last Update: 2023-05-10

See Project

QMDemo

Functional modules and demos from day-to-day Qt development

QMDemo is an Android demonstration project that showcases multimedia playback and processing capabilities using native and Java-based components. It is designed as a learning tool for developers exploring video playback, decoding, and rendering pipelines on mobile devices. The project includes examples of handling media streams, managing buffers, and synchronizing audio and video output. It demonstrates integration with multimedia libraries and frameworks to achieve efficient playback performance. ...

Downloads: 0 This Week

Last Update: 2026-04-27

See Project

Common Resource Grep - crgrep

Common Resource Grep

CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. Here you...

3 Reviews

Downloads: 2 This Week

Last Update: 2023-04-23

See Project

Riffusion

Real-time music generation using stable diffusion techniques AI

...Riffusion (hobby) serves as the core implementation for audio and image processing, providing essential building blocks for generating music from text prompts. It includes both developer-oriented tools and user-facing components such as a command-line interface and an interactive Streamlit application for experimentation. Additionally, it can run as a Flask server to expose model inference through an API, enabling integration with other applications or services.

Downloads: 4 This Week

Last Update: 2026-03-18

See Project

Glicol

Graph-oriented live coding language and music/audio DSP library

Glicol is a graph-oriented live coding language and audio engine designed for real-time music creation and digital signal processing, written entirely in Rust. It introduces a unique paradigm where audio synthesis and sequencing are represented as interconnected nodes, allowing developers and musicians to construct complex sound pipelines through declarative code. The language is designed to be accessible to beginners while still offering powerful capabilities for advanced users, enabling both quick experimentation and precise control over audio generation. ...

Downloads: 0 This Week

Last Update: 2026-04-08

See Project

Search Results for "audio processing" - Page 3

200 projects for "audio processing" with 1 filter applied.
