audio streaming server free download

MOSS-TTS Family

MOSS-TTS Family: open-source speech and sound generation models

...The broader family also includes dialogue generation, prompt-based voice creation, streaming voice-agent support, and a unified audio tokenizer. It is especially useful for developers building dubbing, podcasts, audiobooks, voice assistants, character voices, and creative audio tools.

Downloads: 1 This Week

Last Update: 2 days ago

Qwen2.5-Omni

Capable of understanding text, audio, images, and video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud. It processes multiple modalities (text, images, audio, video) and generates responses as both text and natural speech, streamed in real time. It uses a “Thinker-Talker” architecture and introduces techniques for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions that make the model more accessible. It reports state-of-the-art performance on many multimodal benchmarks, particularly in spoken language understanding, audio reasoning, and image/video understanding. ...

Downloads: 0 This Week

Last Update: 2025-09-23

Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM

...It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it reaches open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, the Talker predicts discrete speech codecs via a multi-codebook scheme, replacing heavier diffusion-based approaches.

Downloads: 1 This Week

Last Update: 2026-04-23

MiniMind-O

A 0.1B Omni model trained from scratch

MiniMind-O is an educational open-source project for building a small end-to-end Omni model from scratch. It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware. It includes both mini and full training data paths, allowing learners to run a complete workflow quickly or reproduce the released model setup more closely. ...

Downloads: 3 This Week

Last Update: 3 days ago

Anthropic SDK Python

Provides convenient access to the Anthropic REST API from any Python 3

...The library includes definitions for all request and response parameters as typed Python objects, handles serialization and deserialization automatically, and wraps the HTTP logic (timeouts, retries, error mapping) so that developers can call the API in a clean, high-level way. The SDK supports both synchronous and asynchronous (async/await) usage. It also supports streaming responses via Server-Sent Events (SSE), so that large outputs can be consumed incrementally instead of waiting for the full response. The client offers helper abstractions for function-style tools and streaming utilities for building interactive agents.
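
To illustrate what consuming a Server-Sent Events stream incrementally means, here is a minimal standard-library sketch of SSE parsing. It is not the SDK's implementation and does not call the Anthropic API; the function names `iter_sse_events` and `collect_text`, the `delta` and `done` event names, and the `{"text": ...}` payload shape are all hypothetical choices made for this example.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event_type = "message"          # default event type in the SSE format
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue                # comment line, often used as keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]   # a single leading space is stripped
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)


def collect_text(lines: Iterable[str]) -> str:
    """Join the text fragments carried by the hypothetical 'delta' events."""
    parts = []
    for event in iter_sse_events(lines):
        if event["event"] == "delta":
            parts.append(json.loads(event["data"])["text"])
    return "".join(parts)


# Hypothetical stream: two text fragments, a keep-alive comment, an end marker.
SAMPLE_STREAM = [
    "event: delta\n",
    'data: {"text": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    "event: delta\n",
    'data: {"text": "lo"}\n',
    "\n",
    "event: done\n",
    "data: {}\n",
    "\n",
]

if __name__ == "__main__":
    print(collect_text(SAMPLE_STREAM))
```

Because `iter_sse_events` is a generator, each fragment becomes available as soon as its terminating blank line arrives, which is what lets a caller display partial output before the response is complete.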

Downloads: 4 This Week

Last Update: 2 days ago

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

...Capable of running on end-side devices such as smartphones and tablets, it provides real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency. It accepts both text and audio inputs, and its speech output supports voice cloning, emotion control, and interactive role-playing.

Downloads: 0 This Week

Last Update: 2025-05-15

GLM-TTS

Controllable & emotion-expressive zero-shot TTS

GLM-TTS is a text-to-speech synthesis system built on large language model technology. It focuses on producing high-quality, expressive, and controllable speech, with features such as emotion modulation and zero-shot voice cloning. It uses a two-stage architecture: a generative LLM first converts text into intermediate speech token sequences, and a flow-based neural model then converts those tokens into natural audio waveforms, enabling rich prosody and voice...

Downloads: 0 This Week

Last Update: 2026-04-10

Demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

...Demucs supports GPU-accelerated inference and can process multi-channel audio with chunked streaming for real-time or batch operation. It also provides training scripts and utilities to fine-tune on custom datasets, along with remixing and enhancement tools.
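
The general idea of chunked processing can be sketched as follows: split a long signal into overlapping chunks, process each chunk independently, and blend the results with weights so that chunk boundaries do not produce seams. This is a simplified, mono, pure-Python illustration under stated assumptions, not Demucs's actual code; the names `triangular_weights` and `process_in_chunks` and the default 25% overlap are hypothetical.

```python
from typing import Callable, List, Sequence


def triangular_weights(n: int) -> List[float]:
    """Weights that rise toward the chunk centre and fall again (never zero)."""
    half = n // 2
    rising = [float(i) for i in range(1, half + 1)]
    falling = [float(i) for i in range(n - half, 0, -1)]
    return rising + falling


def process_in_chunks(
    signal: Sequence[float],
    process: Callable[[Sequence[float]], Sequence[float]],
    chunk_len: int,
    overlap: float = 0.25,
) -> List[float]:
    """Apply `process` to overlapping chunks and blend them by weighted average."""
    stride = max(1, int(chunk_len * (1 - overlap)))
    out = [0.0] * len(signal)
    weight_sum = [0.0] * len(signal)
    for start in range(0, len(signal), stride):
        chunk = signal[start:start + chunk_len]
        processed = process(chunk)          # must return len(chunk) samples
        weights = triangular_weights(len(chunk))
        for i, (value, w) in enumerate(zip(processed, weights)):
            out[start + i] += value * w
            weight_sum[start + i] += w
    # Every sample is covered by at least one chunk, so weight_sum is > 0.
    return [o / w for o, w in zip(out, weight_sum)]


if __name__ == "__main__":
    samples = [float(i) for i in range(20)]
    halved = process_in_chunks(samples, lambda c: [0.5 * x for x in c], chunk_len=8)
    print(halved[:4])
```

Only one chunk needs to be in memory at a time, which is what makes this pattern suitable for long files or for streaming input; the weighted average gives samples near a chunk edge less influence than samples near its centre.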

Downloads: 87 This Week

Last Update: 2025-10-12

Search Results for "audio streaming server"

Showing 8 open source projects for "audio streaming server"
