Search Results for "audio linux" - Page 5

3890 projects for "audio linux" with 1 filter applied: BSD

1

SimpleTuner

A general fine-tuning kit geared toward image/video/audio diffusion

SimpleTuner is an open-source toolkit designed to simplify the fine-tuning of modern diffusion models for generating images, video, and audio. The project focuses on providing a clear and understandable training environment for researchers, developers, and artists who want to customize generative AI models without navigating complex machine learning pipelines. It supports fine-tuning workflows for models such as Stable Diffusion variants and other diffusion architectures, enabling users to...

Downloads: 0 This Week

Last Update: 4 days ago
2

NanoBoyAdvance

A cycle-accurate Nintendo Game Boy Advance emulator

NanoBoyAdvance is a cycle-accurate Game Boy Advance emulator that prioritizes precision and correctness in replicating original hardware behavior. It is designed to emulate the GBA at a very low level, including CPU timing, DMA operations, graphics processing, and memory behavior, ensuring that even edge cases and obscure hardware quirks are faithfully reproduced. The emulator achieves extremely high compatibility, passing multiple hardware test suites and accurately running games that rely...

Downloads: 4 This Week

Last Update: 2026-04-07
3

Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models

Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or...

Downloads: 43 This Week

Last Update: 2026-03-17
4

Allegro

The official Allegro 5 git repository. Pull requests welcome

Allegro 5 is the latest major revision of the Allegro library, designed to take advantage of modern hardware, including hardware acceleration using 3D cards.

Downloads: 4 This Week

Last Update: 2026-02-09
5

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not...

Downloads: 0 This Week

Last Update: 2026-03-18
6

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different...

Downloads: 0 This Week

Last Update: 2026-03-05
7

GenAI Processors

GenAI Processors is a lightweight Python library

GenAI Processors is a lightweight Python library for building modular, asynchronous, and composable AI pipelines around Gemini. Its central abstraction is the Processor, a unit of work that consumes an asynchronous stream of parts (text, images, audio, JSON) and produces another stream, making it natural to chain operations and keep everything streaming end-to-end. Processors can be composed sequentially (to build multi-step flows) or in parallel (to fan-out work and merge results), which...
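The Processor idea described above (a unit of work that consumes one asynchronous stream of parts and produces another, composed sequentially or in parallel) can be sketched in plain asyncio. This is a hypothetical illustration of the concept, not the GenAI Processors API: the names `from_part_fn`, `chain`, `parallel`, and `collect`, and the use of plain strings as parts, are assumptions made for the example.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable

# Hypothetical types: a "part" is a plain string here, and a processor maps
# one async stream of parts to another async stream of parts.
Processor = Callable[[AsyncIterator[str]], AsyncIterator[str]]
PartFn = Callable[[str], Awaitable[str]]


def from_part_fn(fn: PartFn) -> Processor:
    """Lift a per-part coroutine into a streaming processor."""
    async def proc(stream: AsyncIterator[str]) -> AsyncIterator[str]:
        async for part in stream:
            yield await fn(part)
    return proc


def chain(*procs: Processor) -> Processor:
    """Sequential composition: each processor consumes the previous one's output."""
    def proc(stream: AsyncIterator[str]) -> AsyncIterator[str]:
        for p in procs:
            stream = p(stream)
        return stream
    return proc


def parallel(*fns: PartFn) -> Processor:
    """Fan each part out to several coroutines concurrently, then merge
    their results back into the stream in argument order."""
    async def proc(stream: AsyncIterator[str]) -> AsyncIterator[str]:
        async for part in stream:
            for result in await asyncio.gather(*(fn(part) for fn in fns)):
                yield result
    return proc


async def source(parts: Iterable[str]) -> AsyncIterator[str]:
    """Turn an ordinary iterable into an async stream of parts."""
    for part in parts:
        yield part


async def upper(s: str) -> str:
    await asyncio.sleep(0)  # stand-in for real async work, e.g. a model call
    return s.upper()


async def exclaim(s: str) -> str:
    await asyncio.sleep(0)
    return s + "!"


async def reverse(s: str) -> str:
    await asyncio.sleep(0)
    return s[::-1]


async def collect(proc: Processor, parts: Iterable[str]) -> list:
    """Run a processor over a list of parts and gather its output stream."""
    return [p async for p in proc(source(parts))]


pipeline = chain(from_part_fn(upper), parallel(exclaim, reverse))
print(asyncio.run(collect(pipeline, ["ab", "cd"])))
# ['AB!', 'BA', 'CD!', 'DC']
```

In this sketch `parallel` waits for every branch before emitting a part's results, which keeps the output order deterministic at the cost of waiting on the slowest branch for each part.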

Downloads: 0 This Week

Last Update: 2026-03-10
8

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries,...

Downloads: 36 This Week

Last Update: 2026-04-23
9

Spring AI Alibaba Examples

Spring AI Alibaba examples for building and testing AI apps

Spring AI Alibaba Examples provides a collection of example projects that demonstrate how to use Spring AI and Spring AI Alibaba across different scenarios, from basic setups to more advanced AI applications. It is designed to help developers understand core concepts, explore practical implementations, and follow best practices when building AI-powered systems using the Spring ecosystem. Each module focuses on a specific use case such as chat, image processing, audio handling, graph...

1 Review

Downloads: 3 This Week

Last Update: 7 days ago
10

gTTS

Python library and CLI tool to interface with Google Translate

gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. It supports...

Downloads: 3 This Week

Last Update: 2025-11-28
11

OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition,...

Downloads: 2 This Week

Last Update: 3 days ago
12

MediaPipe Solutions

Cross-platform, customizable ML solutions

MediaPipe is an open-source framework developed by Google for building cross-platform machine learning pipelines that process audio, video, and other streaming data in real time. The system provides developers with tools and reusable components that allow them to combine multiple machine learning models with preprocessing and postprocessing logic into efficient perception pipelines. These pipelines can run on a wide variety of platforms including mobile devices, desktop systems, web...

Downloads: 2 This Week

Last Update: 2026-04-23
13

LuxTTS

A high-quality rapid TTS voice cloning model

LuxTTS is an open-source text-to-speech (TTS) system focused on high-quality, rapid voice synthesis and voice cloning that runs efficiently on consumer hardware. It implements a lightweight architecture based on ZipVoice, together with optimized sampling techniques, so that it can generate speech at up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, while producing high-fidelity 48 kHz audio. The project supports...

Downloads: 2 This Week

Last Update: 2026-03-12
14

Portkey AI Gateway

A blazing fast AI Gateway with integrated guardrails

Portkey AI Gateway aims to offer a blazing fast, secure, and flexible gateway for interacting with a wide variety of models and enforcing guardrails. It presents a single, friendly API through which you can route to 200+ LLMs, while applying configurable input/output guardrails to enforce policies or restrict certain content. It supports automatic retries, fallbacks, load balancing across providers or keys, and request timeouts to avoid latency spikes. The gateway is multimodal: it can...
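Retries and fallbacks of the kind listed above can be sketched as a loop over providers. This is a hypothetical illustration of the general pattern, not Portkey's configuration format or API: providers are modelled as plain Python callables, and `call_with_fallback`, `make_flaky`, and the exponential backoff schedule are assumptions. Load balancing and request timeouts are left out.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical: a provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries."""


def call_with_fallback(prompt: str, providers: Sequence[Provider],
                       retries: int = 2, backoff: float = 0.0) -> str:
    """Try each provider in order; retry each up to `retries` extra times,
    sleeping backoff * 2**attempt between attempts, before falling back."""
    errors: List[Exception] = []
    for provider in providers:
        for attempt in range(retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:  # a real gateway would retry only transient errors
                errors.append(exc)
                if attempt < retries:
                    time.sleep(backoff * (2 ** attempt))
    raise AllProvidersFailed(errors)


def make_flaky(fail_times: int, name: str) -> Provider:
    """Hypothetical test provider that fails `fail_times` calls, then succeeds."""
    calls = 0

    def provider(prompt: str) -> str:
        nonlocal calls
        calls += 1
        if calls <= fail_times:
            raise RuntimeError(f"{name} unavailable")
        return f"{name}: {prompt}"
    return provider


primary = make_flaky(10, "primary")  # never recovers within the retry budget
backup = make_flaky(1, "backup")     # fails once, then succeeds
print(call_with_fallback("hello", [primary, backup]))
# backup: hello
```

Providers are tried in list order, so the first entry acts as the primary and later entries as fallbacks; `backoff` defaults to zero here only so the example runs instantly.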

Downloads: 2 This Week

Last Update: 2026-01-12
15

Network Audio System

Network transparent, client/server audio transport system

The Network Audio System is a network-transparent, client/server audio transport system. It can be described as the audio equivalent of an X server. This project is currently in maintenance mode. Patches are accepted; releases will be sporadic.

Downloads: 30 This Week

Last Update: 2025-03-19
16

clip-js

online video editor built with nextjs, remotion and ffmpeg

clip-js is a browser-based video editor built with modern web technologies such as Next.js and Remotion, designed to provide real-time editing and rendering directly in the browser. It enables users to create and edit video compositions using a timeline interface, combining video, audio, images, and text layers into a single project. The system uses a WebAssembly port of FFmpeg to perform high-quality rendering, allowing export of videos without relying on server-side processing. It includes...

Downloads: 1 This Week

Last Update: 3 days ago
17

Sapiens

High-resolution models for human tasks

Sapiens is a research framework from Meta AI focused on embodied intelligence and human-like multimodal learning, aiming to train agents that can perceive, reason, and act in complex environments. It integrates sensory inputs such as vision, audio, and proprioception into a unified learning architecture that allows agents to understand and adapt to their surroundings dynamically. The project emphasizes long-horizon reasoning and cross-modal grounding—connecting language, perception, and...

Downloads: 1 This Week

Last Update: 2025-10-07
18

Verticals v3

Automated YouTube Shorts pipeline

Verticals v3 is an automated content generation workflow designed to create and process YouTube Shorts videos programmatically. It combines multiple tools and scripts to handle tasks such as downloading source material, editing clips, adding subtitles, and formatting output for vertical video platforms. The pipeline emphasizes automation, allowing users to produce short-form content at scale with minimal manual intervention. It integrates FFmpeg and other media processing tools to handle...

Downloads: 10 This Week

Last Update: 6 days ago
19

Phaser HTML5 Game Framework

Phaser is a free and fast 2D game framework for making HTML5 games

Phaser is a popular open-source 2D game framework for making HTML5 games for desktop and mobile platforms. Built with JavaScript and powered by WebGL and Canvas, it offers a robust API for developing everything from arcade to platformer and puzzle games.

Downloads: 4 This Week

Last Update: 22 hours ago
20

Groq TypeScript / Node.js

The official Node.js / TypeScript library for the Groq API

Groq TypeScript / Node.js (also often referred to as “groq-sdk” on npm) is the official Node.js / TypeScript client library for Groq’s REST API, enabling JavaScript/TypeScript developers to integrate LLM and AI-powered services into web backends, serverless functions, or frontend apps. It exports strongly-typed interfaces for models, chat completions, file uploads (e.g. for audio transcription), and other endpoints, allowing for better type safety and developer experience when using Groq from...

Downloads: 0 This Week

Last Update: 2026-03-25
21

SALMONN family

A suite of advanced multi-modal LLMs

SALMONN is a family of advanced multi-modal large language models (LLMs) developed by ByteDance — designed to handle and integrate multiple data modalities (e.g. text, audio, video) rather than just plain text. The repository bundles different branches targeting specialized tasks (e.g. video-SALMONN, speech-quality assessment, general multimodal tasks), suggesting that the project is modular and extensible across domains. SALMONN aims to push the frontier of multi-modal AI by allowing models...

Downloads: 0 This Week

Last Update: 2026-04-20
22

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...

Downloads: 2 This Week

Last Update: 2025-11-27
23

AutoSubs

Instantly generate AI-powered subtitles on your device

...AutoSubs is designed with performance in mind, offering efficient processing through a Rust-based backend and supporting multiple operating systems including Windows, macOS, and Linux.

Downloads: 10 This Week

Last Update: 2 days ago
24

Sopro TTS

A lightweight text-to-speech model with zero-shot voice cloning

Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an...
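Real-time factor (RTF) is commonly defined as the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. A short sketch of the arithmetic, assuming that definition (the helper names are hypothetical): at an RTF of 0.25, ten seconds of speech would take about 2.5 seconds to generate, and a figure of 150 times real-time corresponds to an RTF of roughly 1/150.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Expected synthesis time for a clip of the given length at a given RTF."""
    return rtf * audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time; the inverse of RTF."""
    return 1.0 / rtf


print(synthesis_seconds(0.25, 10.0))  # 2.5
print(speedup(0.25))                  # 4.0
print(round(1 / 150, 4))              # 0.0067
```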

Downloads: 1 This Week

Last Update: 2026-02-06
25

LLM Tornado

The .NET library to build AI agents with 30+ built-in connectors

LLM Tornado is a provider-agnostic .NET SDK designed to build, orchestrate, and deploy AI agents and workflows with a strong focus on flexibility and integration. It provides a unified interface that connects to more than 30 AI providers and vector databases, allowing developers to switch between models and services without rewriting application logic. The framework introduces a powerful orchestration system based on graph-like structures, where agents, tasks, and transitions can be defined...

Downloads: 12 This Week

Last Update: 6 days ago