Search Results for "audio processing" - Page 2

200 projects for "audio processing" with 1 filter applied:

  • 1
    Scriberr

    Self-hosted AI audio transcription

    Scriberr is a self-hosted AI-powered transcription platform designed to convert audio and video into highly accurate text while prioritizing privacy and local processing. Unlike cloud-based transcription services, Scriberr runs entirely on the user’s machine, ensuring that sensitive recordings are never sent to third-party servers and remain fully under user control. It leverages modern speech recognition models such as Whisper and other advanced architectures to deliver precise transcripts with word-level timing and speaker identification. ...
    Downloads: 4 This Week
  • 2
    VERT.sh

    The next-generation file converter

    VERT is a modern, privacy-focused file conversion platform that leverages WebAssembly to perform conversions entirely on the user’s device rather than relying on cloud-based processing. Built with Svelte and TypeScript, it provides a clean and responsive interface for converting a wide variety of file types, including images, audio, video, and documents. One of its defining characteristics is its local-first approach, which eliminates the need to upload files to external servers, thereby improving both privacy and performance. ...
    Downloads: 2 This Week
  • 3
    PyAV

    Pythonic bindings for FFmpeg's libraries

    ...While powerful, it requires a solid understanding of FFmpeg concepts, as it prioritizes flexibility and control over abstraction. Overall, PyAV is a robust tool for developers building advanced video and audio processing systems in Python.
    Downloads: 0 This Week
  • 4
    ffmpeg-commander

    A web-based GUI for quickly generating common FFmpeg command-line

    ...The interface is inspired by tools like HandBrake, aiming to lower the barrier to entry for FFmpeg usage. Overall, it acts as a bridge between ease of use and powerful multimedia processing capabilities.
    Downloads: 6 This Week
  • 5
    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline. Internally, it leverages pretrained...
    Downloads: 0 This Week
  • 6
    VideoCaptioner

    AI-powered tool for generating, optimizing, and translating subtitles

    VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps.
    Downloads: 15 This Week
  • 7
    FFmate

    FFmate is a modern and powerful automation layer

    FFmate is a graphical utility designed to simplify the use of FFmpeg by providing an intuitive interface for building and executing multimedia processing commands. It allows users to perform tasks such as transcoding, trimming, and format conversion without needing to memorize command-line syntax. The tool dynamically generates FFmpeg commands based on user input, making complex workflows more accessible. It supports a wide range of audio and video formats, enabling flexible media processing. FFmate is designed for both beginners and advanced users, offering a balance between simplicity and customization. ...
    Downloads: 0 This Week
  • 8
    SALMONN family

    A suite of advanced multi-modal LLMs

    SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...
    Downloads: 0 This Week
  • 9
    FastRTC

    The Python library for real-time communication

    FastRTC is a Python library designed to simplify real-time communication (RTC), especially for audio and video streaming applications. It abstracts away much of the complexity that typically comes with implementing WebRTC by providing a simple interface — e.g. a Stream class — that can be mounted within a web backend (for example a FastAPI application). This makes it particularly well suited for building real-time voice (or video) interfaces for applications such as AI assistants, live chat, or collaborative audio/video tools. ...
    Downloads: 1 This Week
  • 10
    Orpheus TTS

    Towards Human-Sounding Speech

    ...It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research preview, and includes data-processing scripts so users can train or finetune their own variants. Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 6 This Week
  • 11
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 35 This Week
  • 12
    FFmpegCommand

    Command library suitable for Android. It implements audio and video

    FFmpegCommand is a graphical utility designed to simplify the generation and execution of FFmpeg commands for multimedia processing tasks. It provides an interface where users can configure parameters such as codecs, bitrates, and formats without manually writing command-line instructions. The tool dynamically builds FFmpeg commands based on user selections, making complex workflows more accessible. It supports common operations such as transcoding, trimming, and format conversion....
    Downloads: 1 This Week
  • 13
    Agili Hacker Podcast

    AI tool that turns Hacker News posts into daily podcast updates

    Hacker Podcast is an AI-powered project that turns top Hacker News stories into a Chinese podcast. It automatically fetches trending posts each day, processes the content with AI, and generates concise summaries before converting them into audio. This creates a hands-free way to stay updated on tech, startups, and developer discussions without reading long threads. Hacker Podcast combines content aggregation, natural language processing, and text-to-speech to deliver clear and digestible updates. Users can listen through web interfaces or podcast platforms, while also accessing written summaries for deeper reading. ...
    Downloads: 2 This Week
  • 14
    AnalysisAVP

    Encoding and decoding: RGB, YUV, H.264, AAC, FLV, MP4, RTMP

    AnalysisAVP is a comprehensive educational repository focused on audio and video technology concepts, providing structured knowledge across multimedia systems and processing pipelines. It covers foundational topics such as encoding, decoding, color formats like RGB and YUV, and widely used codecs including H.264 and AAC. The project also explores media container formats like MP4 and FLV, along with streaming protocols such as RTMP and WebRTC, offering a broad understanding of media transmission. ...
    Downloads: 0 This Week
  • 15
    Music-bot

    Complete downloadable code for a cool Discord music bot

    Music-bot is a Discord bot designed to stream and manage music playback within voice channels, providing users with an interactive audio experience. It supports playing music from various online sources, including streaming platforms and direct URLs. The bot includes queue management features that allow users to add, remove, and reorder tracks during playback. It integrates audio processing tools to ensure smooth streaming and consistent playback quality. Music-bot also supports commands for controlling playback, such as pause, resume, skip, and volume adjustment. ...
    Downloads: 0 This Week
  • 16
    Markdownify MCP Server

    Convert files and web content into clean, usable Markdown easily

    ...It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. By standardizing content into Markdown, it helps unify inputs across different sources for better processing and integration with AI tools and developer environments.
    Downloads: 0 This Week
  • 17
    TADA

    Open Source Speech Language Model

    TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...
    Downloads: 0 This Week
  • 18
    NanoBoyAdvance

    A cycle-accurate Nintendo Game Boy Advance emulator

    NanoBoyAdvance is a cycle-accurate Game Boy Advance emulator that prioritizes precision and correctness in replicating original hardware behavior. It is designed to emulate the GBA at a very low level, including CPU timing, DMA operations, graphics processing, and memory behavior, ensuring that even edge cases and obscure hardware quirks are faithfully reproduced. The emulator achieves extremely high compatibility, passing multiple hardware test suites and accurately running games that rely on precise timing conditions. In addition to accuracy, it introduces enhancements such as a high-quality audio mixer that improves sound output without altering internal emulation behavior. ...
    Downloads: 5 This Week
  • 19
    Live API Web Console

    A React-based starter app for using the Live API over WebSockets

    ...It ships with demo branches that show grounded search, function calling, and visualization—one example has the model calling a function that renders Vega/Altair graphs directly in the UI. Under the hood there’s an event-emitting WebSocket client, an audio in/out processing layer, and a minimal scaffolded view so you can focus on your app logic rather than wiring.
    Downloads: 0 This Week
  • 20
    Pipecat

    Framework for building real-time voice and multimodal AI agents

    Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use. Pipecat allows applications to integrate multiple AI services and transports, enabling flexible deployment across different environments and communication channels. ...
    Downloads: 0 This Week
  • 21
    AutoSubs

    Instantly generate AI-powered subtitles on your device

    ...Users can customize subtitle styling, adjust timing, and export results in multiple formats, making it suitable for content creators, filmmakers, and editors. AutoSubs is designed with performance in mind, offering efficient processing through a Rust-based backend and supporting multiple operating systems including Windows, macOS, and Linux.
    Downloads: 12 This Week
  • 22
    MediaPipe Solutions

    Cross-platform, customizable ML solutions

    MediaPipe is an open-source framework developed by Google for building cross-platform machine learning pipelines that process audio, video, and other streaming data in real time. The system provides developers with tools and reusable components that allow them to combine multiple machine learning models with preprocessing and postprocessing logic into efficient perception pipelines. These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web...
    Downloads: 1 This Week
  • 23
    ffmpeg_develop_doc

    The latest audio and video learning materials and projects (2023)

    ffmpeg_develop_doc is a curated repository that aggregates a comprehensive collection of learning resources related to FFmpeg and multimedia development. It includes command references, technical articles, academic papers, tutorials, and example projects covering audio and video processing concepts. The repository is structured as a knowledge base, offering materials on encoding, decoding, streaming protocols, and real-time media systems. It also contains interview preparation resources and practical case studies, making it useful for both learning and professional development. In addition to documentation, it links to open-source projects and implementation examples that demonstrate real-world usage of FFmpeg. ...
    Downloads: 0 This Week
  • 24
    Spring AI Alibaba Examples

    Spring AI Alibaba examples for building and testing AI apps

    ...It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph workflows, and retrieval-augmented generation. The examples highlight how to integrate AI models, manage prompts, handle memory, and build multi-model or multi-agent workflows. Developers can explore individual project folders for detailed instructions and implementation guidance. Spring AI Alibaba Examples also supports experimentation through playground modules and encourages contributions to expand real-world AI use cases and improve development practices.
    Downloads: 2 This Week
  • 25
    clip-js

    Online video editor built with Next.js, Remotion, and FFmpeg

    clip-js is a browser-based video editor built with modern web technologies such as Next.js and Remotion, designed to provide real-time editing and rendering directly in the browser. It enables users to create and edit video compositions using a timeline interface, combining video, audio, images, and text layers into a single project. The system uses a WebAssembly port of FFmpeg to perform high-quality rendering, allowing export of videos without relying on server-side processing. It includes interactive controls for trimming, splitting, and arranging media elements with precise timing. The editor supports dynamic adjustments such as opacity, positioning, and layering to fine-tune compositions. ...
    Downloads: 0 This Week