Search Results for "audio processing" - Page 4

Showing 374 open source projects for "audio processing"

View related business solutions
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    OpenAI .NET

    OpenAI .NET

    The official .NET library for the OpenAI API

    OpenAI .NET is the official client library for calling the OpenAI REST API from C# and other .NET languages, with first-class support for modern .NET patterns. It provides strongly typed clients across API areas (chat, audio, images, embeddings, moderations, batches, files, models, vector stores, responses, realtime, assistants) and works with .NET Standard 2.0 while the examples use .NET 8. You install it via NuGet and authenticate with an API key, ideally through environment variables or...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    video-use

    video-use

    Edit videos with Claude Code

    ...Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 3
    fooyin

    fooyin

    A customisable music player

    ...It provides a modular interface that can be built from scratch or adapted from preset layouts, allowing users to tailor the experience to their workflow. The player supports a wide range of audio formats and includes advanced playback features such as gapless playback, ReplayGain, and DSP processing. It integrates a powerful plugin system that enables extensions for widgets, decoders, metadata handling, and external services. fooyin also includes a scripting language called FooScript, which allows users to customize interface behavior, automate playlists, and control display logic. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    Instill Core

    Instill Core

    Instill Core is a full-stack AI infrastructure tool for data

    Instill Core is an open-source, full-stack AI infrastructure platform designed to orchestrate data pipelines, machine learning models, and unstructured data processing into a unified, production-ready system. It provides an end-to-end solution that enables developers to build, deploy, and manage AI-powered applications without needing to manually stitch together multiple tools across the data and model lifecycle. The platform focuses heavily on handling unstructured data such as documents, images, audio, and video, transforming them into AI-ready formats through integrated ETL pipelines and processing workflows. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    ...It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    OpenAI

    OpenAI

    Swift community driven package for OpenAI public API

    ...It simplifies the integration of AI capabilities into iOS, macOS, and other Swift-based applications by offering a clean abstraction over the underlying REST API, enabling developers to focus on functionality rather than low-level implementation details. The SDK supports a wide range of features including chat completions, embeddings, image generation, audio processing, and structured outputs, making it a comprehensive toolkit for building AI-powered applications. It also includes support for advanced features such as function calling, assistants, and tool integration through protocols like Model Context Protocol, enabling more complex and interactive AI workflows.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    stt

    stt

    Voice Recognition to Text Tool

    ...The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Suno AI API

    Suno AI API

    Use API to call the music generation AI of suno.ai

    Suno API is an unofficial open-source interface that enables developers to programmatically interact with Suno’s AI music generation platform, allowing automated creation of songs, lyrics, and audio content through API calls. It replicates the behavior of Suno’s web-based creation tools by reverse engineering internal endpoints and exposing them through a developer-friendly interface built with Python and FastAPI. The system supports asynchronous processing, enabling efficient handling of multiple generation requests and making it suitable for scalable applications and automation pipelines. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    learn-ffmpeg

    learn-ffmpeg

    Learn audio and video knowledge, organize materials

    learn-ffmpeg is an educational repository that provides a structured collection of tutorials, notes, and examples for mastering FFmpeg and multimedia processing concepts. It covers a wide range of topics, including encoding, decoding, transcoding, and streaming workflows. The repository is designed to guide users from basic command usage to advanced media pipeline development. It includes practical examples that demonstrate how to apply FFmpeg in real-world scenarios. The content is...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    Agili Hacker Podcast

    Agili Hacker Podcast

    AI tool that turns Hacker News posts into daily podcast updates

    Hacker Podcast is an AI-powered project that turns top Hacker News stories into a Chinese podcast. It automatically fetches trending posts each day, processes the content with AI, and generates concise summaries before converting them into audio. This creates a hands-free way to stay updated on tech, startups, and developer discussions without reading long threads. Hacker Podcast combines content aggregation, natural language processing, and text-to-speech to deliver clear and digestible updates. Users can listen through web interfaces or podcast platforms, while also accessing written summaries for deeper reading. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 12
    AnalysisAVP

    AnalysisAVP

    Encode decode, rgb yuv h264 aac flv mp4 rtmp

    AnalysisAVP is a comprehensive educational repository focused on audio and video technology concepts, providing structured knowledge across multimedia systems and processing pipelines. It covers foundational topics such as encoding, decoding, color formats like RGB and YUV, and widely used codecs including H.264 and AAC. The project also explores media container formats like MP4 and FLV, along with streaming protocols such as RTMP and WebRTC, offering a broad understanding of media transmission. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Markdownify MCP Server

    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    ...It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. By standardizing content into Markdown, it helps unify inputs across different sources for better processing and integration with AI tools and developer environments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    TADA

    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Note67

    Note67

    A private, local meeting notes assistant

    note67 is a private, local meeting notes assistant application that combines audio capture, transcription, and AI-powered summarization to help users document conversations and meetings on their own devices without relying on cloud services. Built with a cross-platform architecture using Rust (via Tauri) for backend logic and a TypeScript/React frontend, it prioritizes privacy by performing audio transcription locally with Whisper models and generating summaries with locally-hosted AI, eliminating the need to send sensitive meeting content to external servers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    NanoBoyAdvance

    NanoBoyAdvance

    A cycle-accurate Nintendo Game Boy Advance emulator

    NanoBoyAdvance is a cycle-accurate Game Boy Advance emulator that prioritizes precision and correctness in replicating original hardware behavior. It is designed to emulate the GBA at a very low level, including CPU timing, DMA operations, graphics processing, and memory behavior, ensuring that even edge cases and obscure hardware quirks are faithfully reproduced. The emulator achieves extremely high compatibility, passing multiple hardware test suites and accurately running games that rely on precise timing conditions. In addition to accuracy, it introduces enhancements such as a high-quality audio mixer that improves sound output without altering internal emulation behavior. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    SoX is the Swiss Army Knife of sound processing utilities. It can convert audio files to other popular audio file types and also apply sound effects and filters during the conversion.
    Leader badge
    Downloads: 19,821 This Week
    Last Update:
    See Project
  • 18
    Live API Web Console

    Live API Web Console

    A react-based starter app for using the Live API over websockets

    ...It ships with demo branches that show grounded search, function calling, and visualization—one example has the model calling a function that renders Vega/Altair graphs directly in the UI. Under the hood there’s an event-emitting WebSocket client, an audio in/out processing layer, and a minimal scaffolded view so you can focus on your app logic rather than wiring.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    txtai

    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Pipecat

    Pipecat

    Framework for building real-time voice and multimodal AI agents

    Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use. Pipecat allows applications to integrate multiple AI services and transports, enabling flexible deployment across different environments and communication channels. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    FFmpegCommand

    FFmpegCommand

    Command library suitable for Android. It implements audio and video

    FFmpegCommand is a graphical utility designed to simplify the generation and execution of FFmpeg commands for multimedia processing tasks. It provides an interface where users can configure parameters such as codecs, bitrates, and formats without manually writing command-line instructions. The tool dynamically builds FFmpeg commands based on user selections, making complex workflows more accessible. It supports common operations such as transcoding, trimming, and format conversion....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    BotSharp

    BotSharp

    AI Multi-Agent Framework in .NET

    Conversation as a platform (CaaP) is the future, so it's perfect that we're already offering the whole toolkits to our .NET developers using the BotSharp AI BOT Platform Builder to build a CaaP. It opens up as much learning power as possible for your own robots and precisely control every step of the AI processing pipeline. BotSharp is an open source machine learning framework for AI Bot platform builder. This project involves natural language understanding, computer vision and audio processing technologies, and aims to promote the development and application of intelligent robot assistants in information systems. Out-of-the-box machine learning algorithms allow ordinary programmers to develop artificial intelligence applications faster and easier. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    MediaPipe Solutions

    MediaPipe Solutions

    Cross-platform, customizable ML solutions

    MediaPipe is an open-source framework developed by Google for building cross-platform machine learning pipelines that process audio, video, and other streaming data in real time. The system provides developers with tools and reusable components that allow them to combine multiple machine learning models with preprocessing and postprocessing logic into efficient perception pipelines. These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    Spring AI Alibaba Examples

    Spring AI Alibaba Examples

    Spring AI Alibaba examples for building and testing AI apps

    ...It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.
    Downloads: 2 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB