Search Results for "audio linux" - Page 5

Sort By:

Showing 849 open source projects for "audio linux"

View related business solutions

Python Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
1

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...

Downloads: 9 This Week

Last Update: 2025-11-27
See Project
2

Pixeltable

Data Infrastructure providing an approach to multimodal AI workloads

Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a...

Downloads: 6 This Week

Last Update: 2 days ago
See Project
3

Orpheus TTS

Towards Human-Sounding Speech

Orpheus TTS is a state-of-the-art open-source text-to-speech system built on a Llama-3B backbone, treating speech synthesis as a large language model problem instead of a traditional TTS pipeline. It is designed to produce human-like speech with natural intonation, emotion, and rhythm, targeting quality comparable to or better than many closed-source systems. The project ships both pretrained and finetuned English models, as well as a family of multilingual models released as a research...

Downloads: 2 This Week

Last Update: 2025-12-05
See Project
4

clone-voice

A sound cloning tool with a web interface, using your voice

Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...

Downloads: 11 This Week

Last Update: 2025-11-28
See Project
Go from Code to Production URL in Seconds
Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.

Try it free
5

Spring AI Alibaba Examples

Spring AI Alibaba examples for building and testing AI apps

Spring AI Alibaba Examples provides a collection of example projects that demonstrate how to use Spring AI and Spring AI Alibaba across different scenarios, from basic setups to more advanced AI applications. It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph...

1 Review

Downloads: 8 This Week

Last Update: 3 days ago
See Project
6

HY-World 1.5

A Systematic Framework for Interactive World Modeling

HY-WorldPlay is a Hunyuan AI project focusing on immersive multimodal content generation and interaction within virtual worlds or simulated environments. It aims to empower AI agents with the capability to both understand and generate multimedia content — including text, audio, image, and potentially 3D or game-world elements — enabling lifelike dialogue, environmental interpretations, and responsive world behavior. The platform targets use cases in digital entertainment, game worlds,...

Downloads: 8 This Week

Last Update: 2026-03-24
See Project
7

OpenAI Python

The official Python library for the OpenAI API

The OpenAI Python library provides convenient access to the OpenAI REST API from any Python 3.7+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.

Downloads: 14 This Week

Last Update: 4 days ago
See Project
8

yt-dlp

A youtube-dl fork with additional features and fixes

yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keeping up to date with the original project

Downloads: 595 This Week

Last Update: 2026-03-17
See Project
9

VoxCPM

TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic...

Downloads: 12 This Week

Last Update: 4 days ago
See Project
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
10

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio...

Downloads: 9 This Week

Last Update: 5 days ago
See Project
11

TADA

Open Source Speech Language Model

TADA is an open-source speech-language modeling framework designed to unify spoken audio and text representations within a single generative architecture. The system focuses on aligning speech and text streams using a dual-alignment mechanism that synchronizes the acoustic signal with its textual representation. By modeling both modalities together, the framework allows developers to build systems capable of generating, understanding, and transforming speech and language simultaneously. This...

Downloads: 0 This Week

Last Update: 2026-03-24
See Project
12

Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models

Qwen3-ASR is an automatic speech recognition system in the QwenLM family, developed to convert spoken language into text with strong accuracy and real-time performance. As a specialized ASR variant of the broader Qwen language model ecosystem, it focuses on capturing reliable transcriptions from audio sources such as recordings, live streams, or conversational inputs while supporting low latency use cases. The architecture combines advanced neural acoustic modeling with context-aware...

Downloads: 3 This Week

Last Update: 2026-02-09
See Project
13

gTTS

Python library and CLI tool to interface with Google Translate

gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. It supports...

Downloads: 6 This Week

Last Update: 2025-11-28
See Project
14

Groq Python

The official Python Library for the Groq API

Groq Python is the official Python SDK for the Groq REST API, giving Python developers straightforward access to Groq’s LLM, chat, audio, and other AI services. Through this library, you can call Groq’s models from Python code — for example to request chat completions, code generation, transcription, or any supported endpoint — using idiomatic Python syntax. The SDK handles authentication (via environment variable or parameter), defines proper type-safe request/response data types, and...

Downloads: 8 This Week

Last Update: 2026-03-25
See Project
15

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the...

Downloads: 25 This Week

Last Update: 2026-03-27
See Project
16

pyglet

pyglet is a cross-platform windowing and multimedia library for Python

Pyglet is a cross-platform windowing and multimedia library for Python, intended for developing games and other visually rich applications. It supports windowing, input event handling, OpenGL graphics, loading images and videos, and playing sounds and music.

Downloads: 5 This Week

Last Update: 2026-04-05
See Project
17

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
18

txtai

Build AI-powered semantic search applications

txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...

Downloads: 6 This Week

Last Update: 2026-03-17
See Project
19

Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models

Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or...

Downloads: 33 This Week

Last Update: 2026-03-17
See Project
20

Diffusers

State-of-the-art diffusion models for image and audio generation

Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions. State-of-the-art diffusion pipelines that can be run in inference with just a...

Downloads: 4 This Week

Last Update: 2026-03-25
See Project
21

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries,...

Downloads: 40 This Week

Last Update: 2026-03-30
See Project
22

Sopro TTS

A lightweight text-to-speech model with zero-shot voice cloning

Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an...

Downloads: 2 This Week

Last Update: 2026-02-06
See Project
23

Adversarial Robustness Toolbox

Adversarial Robustness Toolbox (ART) - Python Library for ML security

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference. ART supports all popular machine learning frameworks (TensorFlow, Keras, PyTorch, MXNet, sci-kit-learn, XGBoost, LightGBM, CatBoost, GPy, etc.), all data types (images, tables, audio,...

Downloads: 9 This Week

Last Update: 2025-07-07
See Project
24

MoviePy

Video editing with Python

MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF. MoviePy is an open source software originally written by Zulko and released under the MIT licence. It works on Windows, Mac, and Linux, with Python 2 or Python 3. The code is hosted on Github, where...

Downloads: 20 This Week

Last Update: 2025-05-21
See Project
25

VideoCaptioner

AI-powered tool for generating, optimizing, and translating subtitles

VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps. After transcription, large language models...

Downloads: 19 This Week

Last Update: 2026-03-28
See Project