Showing 159 open source projects for "ai audio enhance"

  • 1
    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    AudioMuse-AI is an open-source system designed to automatically generate playlists and analyze music libraries using artificial intelligence and audio signal processing techniques. The platform runs locally in a Dockerized environment and performs detailed sonic analysis on audio files to understand characteristics such as tempo, mood, and acoustic similarity.
    Downloads: 12 This Week
  • 2
    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one...
    Downloads: 3 This Week
  • 3
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    ...It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. Moreover, Step-Audio2 supports tool-calling and retrieval-augmented generation (RAG), allowing it to access external knowledge sources or audio/text databases, thus reducing hallucinations and improving coherence in complex dialogues.
    Downloads: 0 This Week
  • 4
    Generative AI

    Sample code and notebooks for Generative AI on Google Cloud

    Generative AI is a comprehensive collection of code samples, notebooks, and demo applications designed to help developers build generative-AI workflows on the Vertex AI platform. It spans multiple modalities—text, image, audio, search (RAG/grounding) and more—showing how to integrate foundation models like the Gemini family into cloud projects.
    Downloads: 2 This Week
  • 5
    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse.
    Downloads: 0 This Week
  • 6
    Paperless-AI

    AI-powered document analysis and tagging for Paperless-ngx

    Paperless-AI is an AI-powered extension designed to enhance document management within Paperless-ngx by automating analysis, classification, and organization tasks. It continuously monitors incoming documents and processes them using various AI backends, enabling automatic assignment of titles, tags, document types, and correspondents. It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. ...
    Downloads: 1 This Week
  • 7
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...
    Downloads: 0 This Week
  • 8
    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and...
    Downloads: 0 This Week
  • 9
    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...
    Downloads: 0 This Week
  • 10
    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. ...
    Downloads: 0 This Week
  • 11
    Agently

    AI Agent Application Development Framework

    Agently is a development framework that helps developers build AI-agent-native applications quickly and with very little code. It lets you interact with AI agents in code using structured data and a chained-call syntax, and enhance an agent with plugins instead of rebuilding a whole new agent.
    Downloads: 0 This Week
  • 12
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and to output text. An instruction-tuned version, Qwen-Audio-Chat, supports multi-round conversational interaction, audio + text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 audio tasks and achieves strong performance across multiple benchmarks...
    Downloads: 1 This Week
  • 13
    LTX-2.3

    Official Python inference and LoRA trainer package

    LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while...
    Downloads: 217 This Week
  • 14
    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. ...
    Downloads: 13 This Week
  • 15
    Spring AI Alibaba Examples

    Spring AI Alibaba examples for building and testing AI apps

    ...Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.
    Downloads: 2 This Week
  • 16
    Anthropic Cybersecurity Skills

    754 structured cybersecurity skills for AI agents

    Anthropic Cybersecurity Skills is a collection of structured prompts, tools, and workflows designed to enhance the cybersecurity capabilities of AI systems. It focuses on defining reusable “skills” that guide AI models in performing tasks such as vulnerability analysis, threat detection, and security auditing. The project is intended for experimentation and development of AI-assisted cybersecurity workflows, providing templates that can be adapted to different environments. ...
    Downloads: 9 This Week
  • 17
    MCP Agent

    Build effective agents using Model Context Protocol

    The MCP Agent is a framework that enables the construction of effective AI agents using the Model Context Protocol. It focuses on simple, composable patterns to build production-ready AI agents, facilitating seamless integration with various tools and services to enhance AI capabilities.
    Downloads: 0 This Week
  • 18
    AIMr

    The best AI Aimbot for Fortnite, Valorant, CS2, R6, COD, Apex, & more

    AIMr is an advanced AI aimbot designed to enhance gameplay by providing automated aiming assistance for games like Fortnite, Valorant, CS2, R6, COD, Apex, and more. Written in Python, it uses cutting-edge AI technologies to ensure undetected, efficient aimbot functionality with customizable features. The software includes various aiming enhancements, such as recoil control, silent aim, and prediction capabilities, aimed at making gameplay smoother and more competitive. ...
    Downloads: 312 This Week
  • 19
    Edit Banana

    Edit Banana: A framework for converting statistical figures

    Edit Banana is an innovative web application designed to simplify image editing by merging intuitive user interfaces with powerful generative AI capabilities, enabling users to quickly enhance, manipulate, or transform photos without needing advanced design skills. It provides a smooth, browser-based experience where users can upload images, make precise edits such as background removal or inpainting, and apply stylistic transformations or corrections through AI prompts. The tool focuses on accessibility, giving hobbyists, content creators, and small teams a way to produce polished visuals without downloading heavyweight software or managing local compute resources. ...
    Downloads: 29 This Week
  • 20
    memsearch

    A Markdown-first memory system, a standalone library for any AI agent

    ...It integrates with vector databases like Milvus, enabling scalable storage and retrieval of large datasets. Memsearch is designed to be agent-friendly, making it easy to plug into existing AI workflows and enhance reasoning capabilities. Its markdown-first approach ensures transparency and portability of stored knowledge. Overall, it provides a robust foundation for building AI systems with persistent and intelligent memory.
    Downloads: 5 This Week
  • 21
    DeepSeek-V3

    Powerful AI language model (MoE) optimized for efficiency/performance

    DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3...
    Downloads: 109 This Week
  • 22
    Archon

    The knowledge and task management backbone for AI coding assistants

    Archon is an open-source “command center” designed to enhance AI coding assistant workflows by giving developers a centralized environment for knowledge management, context engineering, and task coordination across AI agents. It acts as a backend (including an MCP server) that allows different AI coding tools and assistants to share the same structured context, knowledge base, and task lists, improving consistency, productivity, and collaboration across multi-agent interactions. ...
    Downloads: 3 This Week
  • 23
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional...
    Downloads: 2 This Week
  • 24
    RA.Aid

    Develop software autonomously

    RA.Aid is an AI-powered assistant designed to enhance the efficiency of software development workflows. It integrates seamlessly with various development environments, providing intelligent code suggestions, automated documentation generation, and real-time error detection. By leveraging advanced machine learning models, RA.Aid aims to reduce development time and improve code quality.
    Downloads: 1 This Week
  • 25
    Claude Code Skills & Plugins

    232+ Claude Code skills & agent plugins for Claude Code, Codex

    Claude Skills is a repository that provides a collection of structured skill definitions designed to enhance the capabilities of Claude-based AI systems. Each skill encapsulates a specific capability, such as coding, analysis, or workflow execution, allowing the model to perform tasks more effectively. The project emphasizes modularity, enabling skills to be combined and reused across different contexts. It is designed to integrate seamlessly into AI workflows, providing a plug-and-play approach to extending functionality. ...
    Downloads: 3 This Week