audio analysis free download

Showing 41 open source projects for "audio analysis"

Filters: Artificial Intelligence, Mac

1

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It accepts various audio inputs (speech, sounds, and so on) and produces text responses. It supports two modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio plus text instructions). Both base and instruction-tuned models are available.

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
2

pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification

pyAudioAnalysis is an open-source Python library designed for audio signal analysis, machine learning, and music information retrieval tasks. The project provides a collection of tools that allow developers to extract meaningful features from audio files and use those features for classification, segmentation, and analysis. The library supports multiple audio processing workflows, including feature extraction from raw audio signals, training of machine learning models, and automatic audio segmentation. ...

Downloads: 1 This Week

Last Update: 2026-03-10
See Project
3

Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

Qwen-Audio is a large audio-language model developed by Alibaba Cloud. It accepts various types of audio input (speech, natural sounds, music, singing) along with text input, and outputs text. An instruction-tuned version, Qwen-Audio-Chat, supports multi-round conversation, audio plus text input, creative tasks, and reasoning over audio. The model uses multi-task training over more than 30 audio tasks and achieves strong performance across multiple benchmarks...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
4

AudioMuse-AI

AudioMuse-AI is an Open Source Dockerized environment

...AudioMuse-AI integrates with several popular self-hosted music servers including Jellyfin, Navidrome, and Emby, allowing users to extend existing media servers with advanced AI-powered recommendation capabilities. The system uses machine learning and audio analysis tools such as Librosa and ONNX models to extract features directly from audio tracks.

Downloads: 10 This Week

Last Update: 1 day ago
See Project
5

NeuralNote

Audio Plugin for Audio to MIDI transcription using deep learning

NeuralNote is an open-source audio software tool designed to convert recorded audio into MIDI data using modern machine learning techniques. The software functions as an audio plugin that can be used inside digital audio workstations as well as a standalone application for music production and analysis. Its main purpose is to perform audio-to-MIDI transcription, allowing musicians to record a performance and automatically transform it into editable MIDI notes. ...

Downloads: 91 This Week

Last Update: 2026-03-12
See Project
6

audioFlux

A library for audio and music analysis, feature extraction

audioFlux is a deep learning tool library for audio and music analysis and feature extraction. It can also be used for pattern recognition, signal processing, bioinformatics, statistics, finance, and more. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations.

Downloads: 0 This Week

Last Update: 2024-08-09
See Project
7

Ultravox

Fast multimodal LLM for real-time voice interaction and AI apps

...Ultravox is optimized for low latency, achieving fast response times suitable for interactive voice agents and real-time applications. It supports use cases such as conversational AI agents, speech-to-speech translation, and analysis of spoken audio content. Ultravox also includes tooling and configuration systems for training, evaluation, and dataset integration.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
8

MediaPipe Solutions

Cross-platform, customizable ML solutions

...MediaPipe is widely used in computer vision and multimedia applications such as hand tracking, face detection, pose estimation, object recognition, and gesture analysis. The framework includes prebuilt solutions that developers can quickly integrate into applications as well as lower-level APIs that allow custom pipeline construction.

Downloads: 1 This Week

Last Update: 2026-04-23
See Project
9

SALMONN family

A suite of advanced multi-modal LLMs

SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...

Downloads: 0 This Week

Last Update: 2026-04-20
See Project
10

Docling

Get your documents ready for gen AI

...The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. Overall, Docling serves as a comprehensive preprocessing layer for AI applications that require reliable, structured access to complex document data.

Downloads: 2 This Week

Last Update: 2 days ago
See Project
11

Vidi2

Large Multimodal Models for Video Understanding and Editing

...Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.

Downloads: 1 This Week

Last Update: 2026-03-04
See Project
12

h2oGPT

Private chat with local GPT with document, images, video, etc.

h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and Mixtral, making it a flexible tool for anyone needing advanced document analysis and AI-driven conversation in a secure, local setup.

Downloads: 2 This Week

Last Update: 2025-02-22
See Project
13

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...

Downloads: 0 This Week

Last Update: 2025-05-15
See Project
14

Amphion

Toolkit for audio, music, and speech generation

Amphion is a toolkit from OpenMMLab dedicated to audio, music, and speech generation, aimed at both reproducible research and helping newcomers get started in generative audio. It provides standardized implementations and recipes for classic and state-of-the-art generative models in audio, including TTS, music generation, and voice conversion. A distinctive feature of Amphion is its emphasis on visualization: it offers interactive visualizations of model architectures and generation...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
15

AI File Sorter

Local AI file organization with categorization and rename suggestions

...It can also analyze document text to improve categorization and renaming. Supported formats include PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, and common text files. For supported audio and video files, AI File Sorter can read embedded metadata (such as ID3, Vorbis, and MP4 tags) to suggest normalized names like year_artist_album_title.ext. AI analysis runs read-only, and all suggestions must be reviewed before being applied. AI File Sorter can run fully offline using local models like Mistral or LLaMA, so files and metadata stay on your device unless you configure a remote endpoint.

Downloads: 499 This Week

Last Update: 2026-04-07
See Project
16

SPPAS

SPPAS - the automatic annotation and analyses of speech

SPPAS is a scientific software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage in Aix-en-Provence, France. It is available for free, with open source code, and no other package is as simple for linguists to use for the automatic annotation of speech, the analysis of any kind of annotated data, and the conversion of annotated files. SPPAS can automatically produce speech annotations from a recorded speech sound and its orthographic...

Downloads: 12 This Week

Last Update: 2026-04-06
See Project
17

Qwen Chat

An AI assistant for everyone, powered by the Qwen series models

Qwen Chat is a versatile AI assistant powered by the advanced Qwen series models, designed for creativity, collaboration, and problem-solving. It excels at deep reasoning and cognitive tasks, helping users solve complex problems in math, science, coding, and more. The AI supports creative writing by generating narratives, characters, and plot ideas, blending imagination with logical coherence. Qwen Chat’s web search feature delivers fast, accurate, and real-time answers sourced from...

1 Review

Downloads: 54 This Week

Last Update: 2025-07-14
See Project
18

AudioMuse-AI

AudioMuse-AI is an open-source, Dockerized environment that brings automatic playlist generation to your self-hosted music library. Using tools such as Librosa and ONNX, it performs sonic analysis on your audio files locally, allowing you to curate playlists for any mood or occasion without relying on external APIs. Deploy it on your local machine with Docker Compose or Podman, or scale it in a Kubernetes cluster (AMD64 and ARM64 are supported). It integrates with the APIs of major music servers such as Jellyfin, Navidrome, LMS, Lyrion, and Emby. ...

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
19

NodeTool

Visual AI Workflow Builder

NodeTool is an open‑source, visual AI workflow builder that lets you connect nodes for text, images, audio, video, data, and automation—then run them locally or on the cloud. Build multi‑step agents, RAG systems, and creative media pipelines without coding, inspect execution in real time, and deploy anywhere: home server, private VPC, RunPod, or Cloud Run. With a local‑first design, NodeTool keeps models and data under your control while still supporting providers like OpenAI, Anthropic,...

Downloads: 2 This Week

Last Update: 2026-01-20
See Project
20

vocal-separate

An extremely simple tool for separating vocals and background music

vocal-separate is a simple but effective audio processing application that isolates vocals and instrumental tracks from music and video files using stem-based source separation models, enabling tasks such as karaoke creation, remixing, and music analysis. Built as a localized web-based tool, it runs entirely on the user’s machine without requiring an internet connection, emphasizing privacy and convenience for creative work.

Downloads: 3 This Week

Last Update: 2026-02-17
See Project
21

sourcesinc

Source code from the Research Institute for Signals, Systems and Computational Intelligence http://fich.unl.edu.ar/sinc

Downloads: 8 This Week

Last Update: 2023-12-05
See Project
22

Cheetah

AI macOS app for real-time coding interview coaching assistance

Cheetah is an AI-powered macOS application designed to assist users during software engineering interview practice through real-time coaching capabilities. It integrates audio transcription and AI-generated responses to help users navigate technical interview questions as they happen. Cheetah uses a local speech-to-text engine based on Whisper to capture and transcribe conversations in real time, enabling it to understand interviewer prompts. It then leverages language models to generate...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
23

audioFlux

A library for audio and music analysis, feature extraction.

audioFlux is a deep learning tool library for audio and music analysis and feature extraction. It supports dozens of time-frequency analysis transformation methods and hundreds of corresponding time-domain and frequency-domain feature combinations. The extracted features can be fed to deep learning networks for training and used to study audio tasks such as classification, separation, music information retrieval (MIR), and ASR.

Downloads: 0 This Week

Last Update: 2023-03-22
See Project
24

Piano transcription

Task of transcribing piano recordings into MIDI files

...By using this transcription tool, users can transform live performance audio (or recordings) into editable, machine-readable MIDI — enabling tasks such as analysis, editing, remixing, or generation of piano music. The authors used this system to build a large-scale classical piano MIDI dataset (see next project), but as a standalone tool it enables researchers, musicians, or hobbyists to transcribe their own piano recordings automatically.

Downloads: 3 This Week

Last Update: 2025-12-02
See Project
25

VoiceFixer

General Speech Restoration

VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording (with noise, clipping, low sampling rate, reverberation, or other artifacts), it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that extracts “clean” intermediate features from the degraded audio (e.g. through denoising, dereverberation, and upsampling), and then a neural vocoder-based synthesis stage that reconstructs a high-quality waveform from those features. ...

Downloads: 8 This Week

Last Update: 2025-11-28
See Project