text-speaker free download

Showing 13992 open source projects for "text-speaker"

View related business solutions

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
1

sherpa-onnx

Speech-to-text, text-to-speech, and speaker recognition

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.

Downloads: 204 This Week

Last Update: 6 days ago
See Project
2

VibeVoice ComfyUI

ComfyUI integration for Microsoft's VibeVoice text-to-speech model

VibeVoice ComfyUI is a comprehensive wrapper that integrates Microsoft’s VibeVoice text-to-speech models directly into ComfyUI workflows. It exposes VibeVoice as a set of custom nodes so you can build single-speaker and multi-speaker voice generation pipelines visually, combining TTS with other audio or generative components. The integration supports high-quality single-speaker synthesis as well as scripted multi-speaker conversations, with optional voice cloning from audio samples for each speaker.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
3

FireRedTTS-2

Long-form streaming TTS system for multi-speaker dialogue generation

FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. ...

Downloads: 2 This Week

Last Update: 2026-02-16
See Project
4

VoxCPM

TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.

Downloads: 46 This Week

Last Update: 2026-04-08
See Project
Build Securely on Azure with Proven Frameworks
Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.

Download Now
5

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that captures voice characteristics; this embedding is then used by a Tacotron-style synthesizer to generate spectrograms from text, which a WaveRNN-based vocoder finally turns into audio. ...

Downloads: 13 This Week

Last Update: 2026-03-09
See Project
6

OuteTTS

Interface for OuteTTS models

OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
7

Text Editors

Sempare Template (scripting) Engine for Delphi

Sempare Template (scripting) Engine for Delphi allows for flexible dynamic text generation. It can be used for generating email, HTML, reports, source code, xml, configuration, etc.

Downloads: 16 This Week

Last Update: 2025-05-08
See Project
8

WhisperX

Automatic Speech Recognition with Word-level Timestamps

...The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for long recordings, where traditional approaches may suffer from drift, repetition, or misalignment. whisperx also supports speaker diarization, allowing identification of different speakers within a conversation. Its architecture combines multiple components to enhance both performance and usability in real-world transcription tasks. Overall, whisperx provides a more robust and scalable solution for high-quality speech-to-text applications.

Downloads: 19 This Week

Last Update: 2026-04-06
See Project
9

VibeVoice

Open-source multi-speaker long-form text-to-speech model

VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. ...

Downloads: 20 This Week

Last Update: 11 hours ago
See Project
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
10

Nextcloud Text

Collaborative document editing using Markdown

Nextcloud Text is a collaborative document editor that lets you work, share and collaborate with friends and colleagues on documents. Though it is available in Nextcloud 16 and 17, anybody can access Text whether they’re using Nextcloud or not. Nextcloud Text files are saved as Markdown, so they can be edited from any other text app. Nextcloud Text is lightweight and distraction-free, giving you only the formatting that you need so you can focus on writing.

Downloads: 4 This Week

Last Update: 13 hours ago
See Project
11

ChatTTS

A generative speech model for daily dialogue

ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.

Downloads: 4 This Week

Last Update: 2026-04-10
See Project
12

Recognizers-Text

Recognition and resolution of numbers, units, date/time, etc.

Recognizers-Text is a multilingual text recognition library that extracts structured information such as dates, numbers, and currency values from unstructured text.

Downloads: 0 This Week

Last Update: 2025-02-12
See Project
13

Whisper-WebUI

A Web UI for easy subtitle using whisper model

...Whisper WebUI also includes advanced preprocessing and postprocessing features such as voice activity detection, background music separation, and speaker diarization, enabling more accurate and structured outputs.

Downloads: 3 This Week

Last Update: 2026-03-18
See Project
14

Open Notebook

An Open Source implementation of Notebook LM with more flexibility

...The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze multi-modal content such as PDFs, videos, audio files, web pages, and Office documents. It combines full-text and vector search with context-aware AI chat to deliver insights grounded in your own research materials. With advanced features like multi-speaker podcast generation, customizable content transformations, and a comprehensive REST API, Open Notebook provides a powerful and extensible research environment.

Downloads: 37 This Week

Last Update: 4 days ago
See Project
15

Scriberr

Self-hosted AI audio transcription

Scriberr is a self-hosted AI-powered transcription platform designed to convert audio and video into highly accurate text while prioritizing privacy and local processing. Unlike cloud-based transcription services, Scriberr runs entirely on the user’s machine, ensuring that sensitive recordings are never sent to third-party servers and remain fully under user control. It leverages modern speech recognition models such as Whisper and other advanced architectures to deliver precise transcripts with word-level timing and speaker identification. ...

Downloads: 5 This Week

Last Update: 2026-03-19
See Project
16

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.

Downloads: 2 This Week

Last Update: 2026-03-05
See Project
17

MegaTTS 3

Official PyTorch Implementation

...Given a reference audio sample (and corresponding latent representation), MegaTTS3 can generate speech in the style and voice timbre of that speaker — useful for personalized TTS, voice-overs, dubbing, or multi-speaker applications. The system supports both Chinese and English (with code-switching), making it versatile across languages, and offers controls for accent strength, voice similarity, intelligibility vs. similarity tradeoffs, and other speech parameters to fine-tune output.

Downloads: 0 This Week

Last Update: 2025-12-02
See Project
18

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. The tool supports both command-line and GUI modes, making it accessible to developers and creatives needing batch or automated processing.

Downloads: 19 This Week

Last Update: 2026-04-14
See Project
19

Text Generation Web UI

Oobabooga - The definitive Web UI for local AI, with powerful features

...Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very efficient text streaming. Parameter presets, 8-bit mode. Layers splitting across GPU(s), CPU, and disk. CPU mode, FlexGen, DeepSpeed ZeRO-3, API with streaming and without streaming. LLaMA model, including 4-bit GPTQ. RWKV model, LoRA (loading and training), Softprompts, and extensions.

Downloads: 27 This Week

Last Update: 14 hours ago
See Project
20

Kilo

A text editor in less than 1000 LOC with syntax highlight and search

Kilo is a minimalistic terminal text editor written in C, famous for fitting its full implementation into fewer than 1,000 lines of code in a single source file. It was created by Salvatore Sanfilippo (antirez, also known for Redis) as an exercise in writing a small, self-contained editor that others can study and extend. Despite its tiny size, Kilo supports core editor features like opening and saving files, incremental search, and basic syntax highlighting.

Downloads: 2 This Week

Last Update: 2025-11-25
See Project
21

Slate Text Editor framework

Completely customizable framework for building rich text editors

you can do things like turn a selection of text bold, or add a semantically rendered block quote in the middle of the page. The most important part of Slate is that plugins are first-class entities. That means you can completely customize the editing experience, to build complex editors like Medium's or Dropbox's, without having to fight against the library's assumptions. Slate's core logic assumes very little about the schema of the data you'll be editing, which means that there are no assumptions baked into the library that'll trip you up when you need to go beyond the most basic use cases. ...

Downloads: 2 This Week

Last Update: 2026-04-12
See Project
22

OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. ...

Downloads: 5 This Week

Last Update: 2026-04-15
See Project
23

ChatTTS_colab

One-click deployment (including offline integration package)

...It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
24

Spark TTS

Spark-TTS Inference Code

Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster.

Downloads: 2 This Week

Last Update: 2026-02-04
See Project
25

Text Embeddings Inference

High-performance inference server for text embeddings models API layer

Text Embeddings Inference is a high-performance server designed to serve text embedding models efficiently in production environments. It focuses on delivering fast and scalable embedding generation by leveraging optimized inference techniques and modern hardware acceleration. It is built to support transformer-based embedding models, making it suitable for tasks such as semantic search, clustering, and retrieval-augmented systems.

Downloads: 0 This Week

Last Update: 2026-03-23
See Project