ai audio enhance free download

Showing 159 open source projects for "ai audio enhance"


1

AudioMuse-AI

AudioMuse-AI is an Open Source Dockerized environment

AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity.

Downloads: 12 This Week

Last Update: 7 days ago
See Project
2

Kimi-Audio

Audio foundation model excelling in audio understanding

Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one...

Downloads: 3 This Week

Last Update: 2026-01-27
See Project
3

Step-Audio 2

Multi-modal large language model designed for audio understanding

...It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. Moreover, Step-Audio 2 supports tool calling and retrieval-augmented generation (RAG), allowing it to access external knowledge sources or audio/text databases, thus reducing hallucinations and improving coherence in complex dialogues.

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
4

Generative AI

Sample code and notebooks for Generative AI on Google Cloud

Generative AI is a comprehensive collection of code samples, notebooks, and demo applications designed to help developers build generative-AI workflows on the Vertex AI platform. It spans multiple modalities—text, image, audio, search (RAG/grounding) and more—showing how to integrate foundation models like the Gemini family into cloud projects.

Downloads: 2 This Week

Last Update: 3 days ago
See Project
5

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
6

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

Paperless-AI is an AI-powered extension designed to enhance document management within Paperless-ngx by automating analysis, classification, and organization tasks. It continuously monitors incoming documents and processes them using various AI backends, enabling automatic assignment of titles, tags, document types, and correspondents. It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. ...

Downloads: 1 This Week

Last Update: 2026-03-17
See Project
7

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
8

Step-Audio

Open-source framework for intelligent speech interaction

Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and...

Downloads: 0 This Week

Last Update: 2026-03-16
See Project
9

Fun Audio Chat

Large Audio Language Model built for natural interactions

Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...

Downloads: 0 This Week

Last Update: 2026-02-27
See Project
10

Step-Audio-EditX

LLM-based Reinforcement Learning audio edit model

Step-Audio-EditX is an open-source, 3-billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” via a dual-codebook tokenizer, combining a linguistic token stream and a semantic (prosody/emotion/style) token stream, thereby abstracting audio editing into high-level token operations. ...

Downloads: 0 This Week

Last Update: 2026-04-09
See Project
11

Agently

AI Agent Application Development Framework

Build AI-agent-native applications in very little code. Interact with AI agents in code using structured data and chained-call syntax. Enhance an AI agent with plugins instead of rebuilding a whole new agent. Agently is a development framework that helps developers build AI-agent-native applications quickly. You can use and build AI agents in your code in a simple way.

Downloads: 0 This Week

Last Update: 2026-03-28
See Project
12

Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and to output text. There is also an instruction-tuned version called Qwen-Audio-Chat, which supports multi-round conversational interaction, audio + text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 different audio tasks, and achieves strong multi-benchmark performance...

Downloads: 1 This Week

Last Update: 2025-09-23
See Project
13

LTX-2.3

Official Python inference and LoRA trainer package

LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while...

Downloads: 217 This Week

Last Update: 2026-04-23
See Project
14

AI YouTube Shorts Generator

A Python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...

Downloads: 13 This Week

Last Update: 3 days ago
See Project
15

Spring AI Alibaba Examples

Spring AI Alibaba examples for building and testing AI apps

...Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.

1 Review

Downloads: 2 This Week

Last Update: 2 days ago
See Project
16

Anthropic Cybersecurity Skills

754 structured cybersecurity skills for AI agents

Anthropic Cybersecurity Skills is a collection of structured prompts, tools, and workflows designed to enhance the cybersecurity capabilities of AI systems. It focuses on defining reusable “skills” that guide AI models in performing tasks such as vulnerability analysis, threat detection, and security auditing. The project is intended for experimentation and development of AI-assisted cybersecurity workflows, providing templates that can be adapted to different environments. ...

Downloads: 9 This Week

Last Update: 2026-04-22
See Project
17

MCP Agent

Build effective agents using Model Context Protocol

The MCP Agent is a framework that enables the construction of effective AI agents using the Model Context Protocol. It focuses on simple, composable patterns to build production-ready AI agents, facilitating seamless integration with various tools and services to enhance AI capabilities.

Downloads: 0 This Week

Last Update: 2025-05-09
See Project
18

AIMr

The best AI Aimbot for Fortnite, Valorant, CS2, R6, COD, Apex, & more

AIMr is an advanced AI aimbot designed to enhance gameplay by providing automated aiming assistance for games like Fortnite, Valorant, CS2, R6, COD, Apex, and more. Written in Python, it uses cutting-edge AI technologies to ensure undetected, efficient aimbot functionality with customizable features. The software includes various aiming enhancements, such as recoil control, silent aim, and prediction capabilities, aimed at making gameplay smoother and more competitive. ...

1 Review

Downloads: 312 This Week

Last Update: 2025-08-31
See Project
19

Edit Banana

Edit Banana: A framework for converting statistical figures

Edit Banana is an innovative web application designed to simplify image editing by merging intuitive user interfaces with powerful generative AI capabilities, enabling users to quickly enhance, manipulate, or transform photos without needing advanced design skills. It provides a smooth, browser-based experience where users can upload images, make precise edits such as background removal or inpainting, and apply stylistic transformations or corrections through AI prompts. The tool focuses on accessibility, giving hobbyists, content creators, and small teams a way to produce polished visuals without downloading heavyweight software or managing local compute resources. ...

Downloads: 29 This Week

Last Update: 3 days ago
See Project
20

memsearch

A Markdown-first memory system, a standalone library for any AI agent

...It integrates with vector databases like Milvus, enabling scalable storage and retrieval of large datasets. Memsearch is designed to be agent-friendly, making it easy to plug into existing AI workflows and enhance reasoning capabilities. Its markdown-first approach ensures transparency and portability of stored knowledge. Overall, it provides a robust foundation for building AI systems with persistent and intelligent memory.

Downloads: 5 This Week

Last Update: 3 days ago
See Project
21

DeepSeek-V3

Powerful AI language model (MoE) optimized for efficiency/performance

DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3...

1 Review

Downloads: 109 This Week

Last Update: 2025-07-09
See Project
22

Archon

The knowledge and task management backbone for AI coding assistants

Archon is an open-source “command center” designed to enhance AI coding assistant workflows by giving developers a centralized environment for knowledge management, context engineering, and task coordination across AI agents. It acts as a backend (including an MCP server) that allows different AI coding tools and assistants to share the same structured context, knowledge base, and task lists, improving consistency, productivity, and collaboration across multi-agent interactions. ...

Downloads: 3 This Week

Last Update: 4 days ago
See Project
23

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional...

Downloads: 2 This Week

Last Update: 2025-09-28
See Project
24

RA.Aid

Develop software autonomously

RA.Aid is an AI-powered assistant designed to enhance the efficiency of software development workflows. It integrates seamlessly with various development environments, providing intelligent code suggestions, automated documentation generation, and real-time error detection. By leveraging advanced machine learning models, RA.Aid aims to reduce development time and improve code quality.

Downloads: 1 This Week

Last Update: 2025-05-07
See Project
25

Claude Code Skills & Plugins

232+ Claude Code skills & agent plugins for Claude Code, Codex

Claude Skills is a repository that provides a collection of structured skill definitions designed to enhance the capabilities of Claude-based AI systems. Each skill encapsulates a specific capability, such as coding, analysis, or workflow execution, allowing the model to perform tasks more effectively. The project emphasizes modularity, enabling skills to be combined and reused across different contexts. It is designed to integrate seamlessly into AI workflows, providing a plug-and-play approach to extending functionality. ...

Downloads: 3 This Week

Last Update: 2026-04-23
See Project