Search Results for "audio video streaming server source code"

Sort By:

Showing 103 open source projects for "audio video streaming server source code"

View related business solutions

Python Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

LiveAvatar

Streaming Real-time Audio-Driven Avatar Generation

LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions. ...

Downloads: 1 This Week

Last Update: 2026-01-30
See Project
2

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
3

Triton Inference Server

The Triton Inference Server provides an optimized cloud

...Triton delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. Provides Backend API that allows adding custom backends and pre/post-processing operations. Model pipelines using Ensembling or Business Logic Scripting (BLS). HTTP/REST and GRPC inference protocols based on the community-developed KServe protocol. A C API and Java API allow Triton to link directly into your application for edge and other in-process use cases.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
4

Dolphin

Document Image Parsing via Heterogeneous Anchor Prompting”

Dolphin — maintained by ByteDance — is a project aimed at providing a high-performance, robust, and extensible media or multimedia framework / player infrastructure (or possibly a streaming media solution), intended to meet modern demands for efficiency, flexibility, and integration in media-heavy applications. It seeks to combine performant media playback or handling (audio/video decoding, streaming, buffering) with a modular, developer-friendly API that allows easy embedding into larger applications or services. ...

Downloads: 0 This Week

Last Update: 2025-12-17
See Project
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
5

Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM

...It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.

Downloads: 3 This Week

Last Update: 2026-01-08
See Project
6

You-Get

Dumb downloader that scrapes the web

You-Get is a small command-line utility for downloading media (video, audio and images) from the Web when there are no other means to do so. It can download video and audio files from such popular web sites as YouTube, Twitter, Niconico, Vimeo, Flickr, Instagram and a whole lot more. You-Get is a great option for when you want to enjoy your favorite videos, audio or images from the internet without having to open any web browsers or get interrupted by ads. ...

Downloads: 1 This Week

Last Update: 2025-03-07
See Project
7

Selkies-GStreamer

Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop

selkies-gstreamer is a GStreamer-based media streaming component used in the Selkies project, a cloud-native platform designed for interactive desktop and application streaming. This module acts as a high-performance media pipeline that captures video, encodes it with low latency, and streams it via WebRTC to client browsers. It is optimized for GPU-accelerated encoding and integrates with Kubernetes-based deployments to enable scalable, real-time remote desktop sessions. This component...

Downloads: 3 This Week

Last Update: 2025-03-27
See Project
8

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model

...Low barriers to install, CLI, Server, and Streaming Server is available to quick-start your journey. We provide high-speed and ultra-lightweight models, and also cutting-edge technology. We provide production ready streaming asr and streaming tts system. Our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.

Downloads: 2 This Week

Last Update: 2025-03-04
See Project
9

MoviePy

Video editing with Python

MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF. MoviePy is an open source software originally written by Zulko and released under the MIT licence. It works on Windows, Mac, and Linux, with Python 2 or Python 3. The code is hosted on Github, where...

Downloads: 32 This Week

Last Update: 2025-05-21
See Project
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
10

Swing Music

Swing Music is a beautiful, self-hosted music player

Swing Music is a beautiful, self-hosted music player and streaming server that lets you bring your personal audio library online with a modern browser-based interface, giving you a private alternative to mainstream streaming services. Designed to be both elegant and powerful, the project scans your local music files (like MP3s or FLACs), organizes metadata, and streams them on-demand to any device with a browser or its Android client. It includes features like folder browsing, playlist...

Downloads: 2 This Week

Last Update: 2026-02-04
See Project
11

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...

1 Review

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
12

OpenAI-Compatible Edge-TTS API

Free, high-quality text-to-speech API endpoint to replace OpenAI

OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
13

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...

Downloads: 0 This Week

Last Update: 2025-05-15
See Project
14

WhatsApp MCP Server

WhatsApp MCP server enabling AI access to chats and messaging

whatsapp-mcp is an open source Model Context Protocol (MCP) server that enables AI agents to interact directly with a user’s WhatsApp account through a structured interface. It acts as a bridge between WhatsApp and large language models, allowing controlled access to messages, chats, and contacts. whatsapp-mcp is composed of two main components: a Go-based bridge that connects to the WhatsApp Web API and stores data locally, and a Python-based MCP server that exposes tools for AI...

Downloads: 3 This Week

Last Update: 2026-03-17
See Project
15

LatentSync

Taming Stable Diffusion for Lip Sync

LatentSync is an open-source framework from ByteDance that produces high-quality lip-synchronization for video by using an audio-conditioned latent diffusion model, bypassing traditional intermediate motion representations. In effect, given a source video (with masked or reference frames) and an audio track, LatentSync directly generates frames whose lip motions and expressions align with the audio, producing convincing talking-head or animated lip-sync output. The system leverages a U-Net...

Downloads: 1 This Week

Last Update: 2025-12-02
See Project
16

Sopro TTS

A lightweight text-to-speech model with zero-shot voice cloning

...The model is designed to work with a small set of dependencies and to be accessible for developers who want offline TTS with customizable voice style, including options for streaming or non-streaming generation modes. Users can install it with standard Python tools, run a demo server locally, and experiment with CLI or Python API usage for producing synthetic speech.

Downloads: 0 This Week

Last Update: 2026-02-06
See Project
17

Open Vision Agents by Stream

Build Vision Agents quickly with any model or video provider

Open Vision Agents by Stream is an open source framework from Stream for building real time, multimodal AI agents that watch, listen, and respond to live video streams. It focuses on combining video understanding models, such as YOLO and Roboflow based detectors, with real time large language models like OpenAI Realtime and Gemini Live to create interactive experiences. The framework uses Stream’s ultra low latency edge network so agents can join sessions quickly and maintain very low audio...

Downloads: 1 This Week

Last Update: 2 days ago
See Project
18

Vidi2

Large Multimodal Models for Video Understanding and Editing

...Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.

Downloads: 0 This Week

Last Update: 2026-03-04
See Project
19

SoulSync

Automated Music Discovery and Collection Manager

SoulSync is an intelligent music discovery and automation platform designed to bridge streaming services with self-hosted media libraries, enabling users to automatically grow and maintain curated music collections. The system continuously monitors selected artists and detects new releases, then generates personalized playlists such as Release Radar and Discovery Weekly using its built-in recommendation logic. It can automatically download missing tracks from multiple sources including...

Downloads: 7 This Week

Last Update: 4 days ago
See Project
20

Story Flicks

Generate high-definition story short videos with one click using AI

...For creators who want to produce narrative short-form content — whether for social media, storytelling, or prototyping video ideas — story-flicks offers a lightweight, code-backed alternative to complex video editing suites. Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). ...

Downloads: 11 This Week

Last Update: 2025-12-14
See Project
21

MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server

MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license,...

Downloads: 0 This Week

Last Update: 2026-01-07
See Project
22

NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT

NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture. The system connects a large language model with multimodal encoders and diffusion-based decoders so it can interpret information from different...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
23

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries,...

Downloads: 57 This Week

Last Update: 2026-03-11
See Project
24

Dia

A TTS model capable of generating ultra-realistic dialogue

Dia is a neural text-to-speech model designed specifically for generating ultra-realistic dialogue in a single pass. Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the...

Downloads: 3 This Week

Last Update: 2025-11-28
See Project
25

Internet DJ Console

A feature packed DJ console and internet radio client for Linux users

Conceived as an internet radio Shoutcast/Icecast client and DJ console IDJC has two main media players, a background track player, effects buttons, crossfader, webm, aac, ogg, and mp3 streaming, stream automation timers, aux input, voice and VoIP integration. Media file formats include: mp3, ogg, flac, wma, wav, m4a, m3u, xspf, pls, and cue sheet support, IRC track and station announcements, uses jack audio connection kit to provide a flexible audio chain. This list of features is by no...

32 Reviews

Downloads: 8 This Week

Last Update: 2026-01-10
See Project