stack free download - SourceForge

Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models

...Because it’s part of the broader Qwen ecosystem, it benefits from the model’s understanding of linguistic nuances, enabling more accurate pronunciation, prosody, and contextual delivery than many traditional TTS systems. Developers can customize voice output parameters like speed, pitch, and volume, and combine the TTS stack with other AI components.

Downloads: 27 This Week

Last Update: 7 days ago

See Project

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation model

Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text and LLM-based control flows, behind a consistent interface. The system is designed to be deployed in several ways: you can try it online via hosted demos, spin it up in a one-click Colab environment, run it in Docker containers, or set it up locally with its environment preparation scripts. ...

Downloads: 1 This Week

Last Update: 8 hours ago

See Project

EasyVoice

Open source text-to-speech tool, supports extra-long text

...It offers streaming playback so audio starts almost immediately, even for very long inputs, and automatically generates subtitle files suitable for video production or translation workflows. Under the hood, easyVoice uses a modern stack with Vue 3 and Element Plus on the front end, Node.js and Express on the back end, and TTS engines such as Microsoft Azure TTS and OpenAI-compatible APIs, orchestrated through ffmpeg.

Downloads: 7 This Week

Last Update: 2026-01-26

See Project

CosyVoice

Multi-lingual large voice generation model, providing inference

CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech across languages and in code-switching contexts. ...

Downloads: 4 This Week

Last Update: 2025-11-30

See Project

AI Runner

Offline inference engine for art, real-time voice conversations

...It is implemented as a desktop-oriented Python application and emphasizes privacy and self-hosting, allowing users to work with text-to-speech, speech-to-text, text-to-image and multimodal models without sending data to external services. At the core of its LLM stack is a mode-based architecture with specialized “modes” such as Author, Code, Research, QA and General, and a workflow manager that automatically routes user requests to the right agent based on the task. The project has a strong focus on developer ergonomics, with thorough development guidelines, environment configuration using .env variables, and a clear structure for tests, tools and agents.

Downloads: 3 This Week

Last Update: 2025-12-11

See Project

Amphion

Toolkit for audio, music, and speech generation

...Built on the broader OpenMMLab ecosystem, Amphion follows modular design patterns and configuration systems similar to other OpenMMLab projects, easing adoption for users who are already familiar with that stack.

Downloads: 1 This Week

Last Update: 2025-11-28

See Project

Matcha-TTS

A fast TTS architecture with conditional flow matching

...The model is fully probabilistic, so it can generate diverse realizations of the same text while still sounding stable and intelligible. The repository provides an end-to-end TTS pipeline: a PyTorch/Lightning training stack, configuration files, pre-trained checkpoints, a command-line interface, and a Gradio app for interactive testing. Users can train on standard datasets like LJSpeech or plug in their own corpora, with helper tools for computing dataset statistics, extracting phoneme durations, and running multi-GPU training.

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

Bailing

Bailing is a voice dialogue robot similar to GPT-4o

Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a...

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

NVIDIA NeMo Framework

Scalable generative AI framework built for researchers and developers

NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system,...

Downloads: 0 This Week

Last Update: 2026-01-09

See Project

VideoChat

Real-time voice interactive digital human

...The system is customizable: you can define your own avatar appearance and voice, and it supports voice cloning so you can generate a new voice from a short 3–10 second reference sample. The tech stack integrates FunASR for speech recognition, Qwen for language understanding, multiple TTS engines like GPT-SoVITS, CosyVoice, or edge-tts, and MuseTalk for talking-head generation.

Downloads: 0 This Week

Last Update: 2025-12-18

See Project

WaveRNN

WaveRNN Vocoder + TTS

WaveRNN is a PyTorch implementation of DeepMind’s WaveRNN vocoder, bundled with a Tacotron-style TTS front end to form a complete text-to-speech stack. As a vocoder, WaveRNN models raw audio with a compact recurrent neural network that can generate high-quality waveforms more efficiently than many traditional autoregressive models. The repository includes scripts and code for preprocessing datasets such as LJSpeech, training Tacotron to produce mel spectrograms, training WaveRNN on those spectrograms (with optional GTA data), and finally generating audio. ...

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

Search Results for "stack"

Showing 11 open source projects for "stack"

Qwen3-TTS

Speech-AI-Forge

EasyVoice

CosyVoice

AI Runner

Amphion

Matcha-TTS

Bailing

NVIDIA NeMo Framework

VideoChat

WaveRNN

Search Results for "stack"

Showing 11 open source projects for "stack"

Qwen3-TTS

Speech-AI-Forge

EasyVoice

CosyVoice

AI Runner

Amphion

Matcha-TTS

Bailing

NVIDIA NeMo Framework

VideoChat

WaveRNN

Related Searches

Related Categories