Search Results for "multi seat"

Showing 31 open source projects for "multi seat"


Filters applied: Text to Speech, Python

1

Bert-VITS2

VITS2 backbone with multilingual-bert

...The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training entrypoints for multi-GPU and multi-node setups. It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. Releases include optimizations for Japanese and English alignment, expanded training data, spec caching and pre-generation tools, as well as ONNX export for more lightweight inference deployments.

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
2

VibeVoice ComfyUI

ComfyUI integration for Microsoft's VibeVoice text-to-speech model

VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each speaker. It includes advanced control over generation parameters like attention backend, diffusion steps, sampling temperature, guidance scale, and quantization settings, allowing users to tune the trade-offs between quality, VRAM usage, and speed. ...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
3

GLM-TTS

Controllable & emotion-expressive zero-shot TTS

...It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. GLM-TTS also supports phoneme-level control and hybrid text + phoneme input, giving developers precise control over pronunciation, which is critical for multilingual text and polyphone-rich languages.

Downloads: 5 This Week

Last Update: 2026-01-20
See Project
4

StyleTTS 2

Towards Human-Level Text-to-Speech through Style Diffusion

...The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide generation toward more natural and coherent utterances. StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. The repository includes training scripts, configuration files, and pre-trained auxiliary modules such as a text aligner, pitch extractor, and PL-BERT-based linguistic encoder.

Downloads: 4 This Week

Last Update: 2025-11-28
See Project
5

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai

MeloTTS is an open-source text-to-speech (TTS) system that generates natural-sounding speech from text input. It utilizes advanced machine-learning models to produce high-quality audio outputs.

Downloads: 4 This Week

Last Update: 2025-01-06
See Project
6

FastKoko

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple...

Downloads: 1 This Week

Last Update: 2025-12-13
See Project
7

NVIDIA NeMo

Toolkit for conversational AI

...Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Supported models: Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), LSTM-CTC. NGC collection of pre-trained speech processing models.

Downloads: 1 This Week

Last Update: 2026-01-09
See Project
8

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning (it can mimic a target speaker’s voice from a short reference sample), making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), all of which matter for production use.

Downloads: 11 This Week

Last Update: 2025-11-27
See Project
9

MLX-Audio

A text-to-speech, speech-to-text and speech-to-speech library

MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI...

Downloads: 10 This Week

Last Update: 3 days ago
See Project
10

Matcha-TTS

A fast TTS architecture with conditional flow matching

...Users can train on standard datasets like LJSpeech or plug in their own corpora, with helper tools for computing dataset statistics, extracting phoneme durations, and running multi-GPU training.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
11

ChatTTS_colab

One-click deployment (including offline integration package)

...A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
12

Style-Bert-VITS2

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

Style-Bert-VITS2 is a text-to-speech system based on Bert-VITS2 that focuses on highly controllable voice styles and emotional expression. It takes the original Bert-VITS2 v2.1 and its Japanese-Extra variant and extends them so you can control emotion and speaking style with fine-grained intensity, not just choose a generic tone. The project targets both power users and beginners: Windows users without Git or Python can install and run it using bundled .bat scripts, while advanced users can...

Downloads: 9 This Week

Last Update: 2025-11-28
See Project
13

NVIDIA NeMo Framework

Scalable generative AI framework built for researchers and developers

NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system, replacing YAML with more flexible, programmable configs that can be versioned and composed for different experiments. The framework builds on PyTorch Lightning–style modular abstractions, so training scripts are composed from reusable components for data loading, models, optimizers, and schedulers, which simplifies experimentation and adaptation. ...

Downloads: 0 This Week

Last Update: 2026-01-09
See Project
14

ElevenLabs Python

The official Python SDK for the ElevenLabs API

elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs...

Downloads: 5 This Week

Last Update: 2 days ago
See Project
15

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak...

Downloads: 6 This Week

Last Update: 2025-11-28
See Project
16

CosyVoice

Multi-lingual large voice generation model, providing inference

CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...

Downloads: 4 This Week

Last Update: 2025-11-30
See Project
17

Luna AI

Virtual AI anchor that combines state-of-the-art technology

Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...

Downloads: 3 This Week

Last Update: 2025-11-28
See Project
18

RealtimeTTS

Converts text to speech in realtime

RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
19

Lingvo

Framework for building neural networks

...The framework provides a structured way to define models, input pipelines, and training configurations using a common interface for layers, which encourages reuse across different tasks. It has been used to implement state-of-the-art architectures such as recurrent neural networks, Transformer models, variational autoencoder hybrids, and multi-task systems. Lingvo includes reference models and configurations for domains like machine translation, automatic speech recognition, language modeling, image understanding, and 3D object detection. Centralized hyperparameter configuration files allow researchers to share exact experiment setups so others can retrain and compare results reliably.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
20

WavTokenizer

SOTA discrete acoustic codec models with 40/75 tokens per second

...The model uses a single-quantizer design together with temporal compression to achieve extreme compression without sacrificing reconstruction fidelity. Its architecture incorporates a broader vector-quantization space, extended contextual windows, and improved attention networks, combined with multi-scale discriminators and inverse Fourier transform blocks to enhance waveform reconstruction. Extensive experiments show that WavTokenizer matches or surpasses previous neural codecs across speech, music, and general audio on both objective metrics and subjective listening tests.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
21

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper

...The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. The repository includes notebooks and scripts for inference, long-form synthesis, and finetuning, as well as pre-trained models and converted datasets hosted on Hugging Face. ...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
22

OuteTTS

Interface for OuteTTS models

OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
23

EmotiVoice

Multi-Voice and Prompt-Controlled TTS Engine

EmotiVoice is a multi-voice, prompt-controlled text-to-speech engine designed to generate highly expressive speech across thousands of voices. It supports both English and Chinese and ships with over 2,000 preset voices, making it suitable for everything from characters and virtual anchors to narration and dialogue. The core idea is prompt-based emotional and style control: you can ask the engine to speak “happy,” “sad,” “excited,” or with other high-level style prompts that shape prosody, pitch, speed, and energy. ...

Downloads: 8 This Week

Last Update: 2025-11-30
See Project
24

Parallel WaveGAN

Unofficial Parallel WaveGAN

Parallel WaveGAN is an unofficial PyTorch implementation of several state-of-the-art non-autoregressive neural vocoders, centered on Parallel WaveGAN but also including MelGAN, Multiband-MelGAN, HiFi-GAN, and StyleMelGAN. Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
25

vits_chinese

Best practice TTS based on BERT and VITS

vits_chinese is an implementation of the VITS end-to-end text-to-speech (TTS) architecture tailored for Chinese (and possibly multilingual) speech synthesis. VITS is a model combining variational autoencoders (VAEs), normalizing flows, adversarial learning, and a stochastic duration predictor, a design that enables generation of natural, expressive speech that captures variations in rhythm and prosody. By customizing or porting VITS for Chinese, this project aims to produce high-quality TTS...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project

© 2026 Slashdot Media. All Rights Reserved.
