Showing 6656 open source projects for "audio linux"

  • 1
    OuteTTS

    Interface for OuteTTS models

    OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face...
    Downloads: 1 This Week
  • 2
    Harmonoid

    Plays & manages your music library. Looks beautiful & juicy

    Plays & manages your music library. Looks beautiful & juicy. Playlists, visuals, synced lyrics, pitch shift, volume boost & more.
    Downloads: 3 This Week
  • 3
    Lidarr

    Looks and smells like Sonarr but made for music

    Lidarr is an open-source music collection manager tailored to automate the tracking, downloading, and organizing of music tracks and albums from Usenet, BitTorrent, or other sources. It continuously monitors RSS feeds for new releases from your favorite artists, automatically retrieves them, sorts files into your library, and ensures consistent naming and tagging so your collection stays tidy and accessible. The tool also supports quality upgrades: if a better version of a track becomes...
    Downloads: 4 This Week
  • 4
    Agili Hacker Podcast

    AI tool that turns Hacker News posts into daily podcast updates

    Hacker Podcast is an AI-powered project that turns top Hacker News stories into a Chinese podcast. It automatically fetches trending posts each day, processes the content with AI, and generates concise summaries before converting them into audio. This creates a hands-free way to stay updated on tech, startups, and developer discussions without reading long threads. Hacker Podcast combines content aggregation, natural language processing, and text-to-speech to deliver clear and digestible...
    Downloads: 7 This Week
  • 5
    YuE

    Open source AI model for generating full songs from lyrics prompts

    YuE is an open source project that provides a foundation model designed for full-song music generation using artificial intelligence. It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. YuE introduces a...
    Downloads: 7 This Week
  • 6
    Tagify

    Lightweight, efficient Tags input component in Vanilla JS

    Transforms an input field or a textarea into a Tags component, in an easy, customizable way, with great performance and a small code footprint, packed with features. Customizable HTML templates for the different areas of the component (wrapper, tags, dropdown, dropdown item, dropdown header, dropdown footer). Shows a suggestions list (flexible settings & styling) at full (component) width or next to the typed text (caret). Allows setting suggestions' aliases for easier fuzzy-searching....
    Downloads: 5 This Week
  • 7
    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each...
    Downloads: 9 This Week
  • 8
    Whisper-WebUI

    A web UI for easy subtitle generation using the Whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription...
    Downloads: 16 This Week
  • 9
    notebooklm-py

    Unofficial Python API and agentic skill for Google NotebookLM

    notebooklm-py is an unofficial Python API and agent-ready integration layer for Google NotebookLM that exposes NotebookLM functionality through code, the command line, and AI agent workflows. Its goal is to provide programmatic access not just to standard notebook operations, but also to many capabilities that are either limited or unavailable in the web interface, making it especially useful for automation and custom pipelines. The project covers notebook management, source ingestion,...
    Downloads: 15 This Week
  • 10
    Remotion

    Make videos programmatically with React

    Remotion is a cutting-edge library that lets developers create real videos programmatically using React components, transforming familiar UI paradigms into a flexible, code-driven video production workflow. Instead of traditional timeline editors, Remotion leverages HTML, CSS, and JavaScript to define video frames, animations, and transitions, which means developers can use states, props, loops, and component hierarchies to automate complex motion graphics. Because it integrates with the...
    Downloads: 15 This Week
  • 11
    Nextcloud Talk

    Video- & audio-conferencing app for Nextcloud

    Nextcloud Talk is the official chat, video and audio conferencing app for Nextcloud that allows users to chat, call and screenshare with multiple other users. Nextcloud offers better protection for your communication as it provides end-to-end encryption and keeps even metadata from leaking. You can have private, group, public or password-protected calls by simply inviting one person, a whole group, or sending a public link as an invitation to a call. It is also conveniently integrated with...
    Downloads: 15 This Week
  • 12
    Anime Player

    Video player for improving quality of hand-drawn images

    A video player that enhances the quality of hand-drawn images using Anime4K's high-performance scaling algorithm. The program is written in Python using the PySimpleGUI graphical user interface library, the mpv media player, and the Anime4K scaling algorithm. Anime Player is designed to play video and audio files and includes functions such as opening files, URLs and folders, setting image scaling parameters using the Anime4K algorithm, creating an...
    Downloads: 14 This Week
  • 13
    whatsapp-web.js

    WhatsApp library for NodeJS that connects through the browser app

    A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app. Programmatically control WhatsApp whether you're running user or business accounts. It uses Puppeteer to run a real instance of WhatsApp Web to avoid getting blocked. whatsapp-web.js connects to an official version of WhatsApp Web under the hood, reducing ban risks. The object-oriented approach makes it easy to get...
    Downloads: 14 This Week
  • 14
    Allegro

    The official Allegro 5 git repository. Pull requests welcome

    Allegro 5 is the latest major revision of the Allegro library, designed to take advantage of modern hardware, including hardware acceleration using 3D cards.
    Downloads: 19 This Week
  • 15
    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented...
    Downloads: 61 This Week
  • 16
    Mixxx

    Mixxx is Free DJ software that gives you everything you need

    Free and open source DJ software for Windows, macOS, and Linux. Mixxx integrates the tools DJs need to perform creative live mixes with digital music files. Whether you are a new DJ with just a laptop or an experienced turntablist, Mixxx can support your style and techniques of mixing. BPM and musical key detection help you find the perfect next track from your library. Use Sync Lock to match the tempo and beats of four songs for seamless mixing. Built-in mappings for DJ controller hardware...
    Downloads: 15 This Week
  • 17
    Shaka Player

    JavaScript player library / DASH & HLS client / MSE-EME player

    Shaka Player is an open-source JavaScript library for adaptive media. It plays adaptive media formats (such as DASH and HLS) in a browser, without using plugins or Flash. Instead, Shaka Player uses the open web standards MediaSource Extensions and Encrypted Media Extensions. Shaka Player also supports offline storage and playback of media using IndexedDB. Content can be stored on any browser. Storage of licenses depends on browser support. Our main goal is to make it as easy as possible to...
    Downloads: 25 This Week
  • 18
    Hyprnote

    Local-first AI Notepad for Private Meetings

    Hyprnote is an open-source, privacy-first AI notepad app designed for taking notes during meetings: it transcribes audio (microphone and system) and generates context-rich summaries using on-device AI models like Whisper and HyprLLM, all without any data leaving your machine. Listens to your meetings while you write. Crafts smart summaries based on your quick notes. Runs completely offline using open-source models like Whisper or HyprLLM. Use approved third-party...
    Downloads: 12 This Week
  • 19
    MediaDevices

    Go implementation of the MediaDevices API

    mediadevices is a Go library developed by the Pion WebRTC team that enables real-time access to audio and video devices for building native Go applications involving media streaming and conferencing. It provides a cross-platform, unified API for capturing and manipulating media streams and is often used in combination with Pion WebRTC for peer-to-peer communications. Its support for device enumeration, media constraints, and frame processing makes it a powerful building block for custom...
    Downloads: 4 This Week
  • 20
    comfyui-mixlab-nodes

    Workflow and speech recognition app

    comfyui-mixlab-nodes is a large collection of custom nodes for ComfyUI that turns workflows into interactive apps and adds real-time multimedia, LLM, and TTS capabilities. It introduces a “Workflow-to-APP” concept, where a ComfyUI graph can be transformed into a Web App through an AppInfo node, complete with categories, batch prompts, and editable configurations. The project also brings Real-time Design features like screen capture and floating video nodes, enabling creative pipelines that...
    Downloads: 6 This Week
  • 21
    RealtimeTTS

    Converts text to speech in realtime

    RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...
    Downloads: 6 This Week
  • 22
    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections,...
    Downloads: 0 This Week
  • 23
    Pipecat

    Framework for building real-time voice and multimodal AI agents

    Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use. Pipecat allows applications to integrate multiple AI...
    Downloads: 3 This Week
  • 24
    Amazon Chime SDK React Components

    Chime React Component Library with integrations with the Amazon SDK

    The Amazon Chime SDK makes it easy to add collaborative audio calling, video calling, and screen share features to web applications by using the same infrastructure services that power millions of Amazon Chime online meetings. The Amazon Chime SDK React Component Library supplies client-side state management and reusable UI components for common web interfaces used in audio and video conferencing applications, including: video tile grids, microphone activity indicators, and call controls....
    Downloads: 1 This Week
  • 25
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 33 This Week