Speech-to-text, text-to-speech, and speaker recognition
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Play ChatGPT and other LLMs with Xiaomi AI Speaker
Long-form streaming TTS system for multi-speaker dialogue generation
Automatic Speech Recognition with Word-level Timestamps
Interface for OuteTTS models
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Open-source multi-speaker long-form text-to-speech model
Self-hosted AI audio transcription
Official PyTorch Implementation
An Open Source implementation of Notebook LM with more flexibility
Super expressive prompting model based on ltx2.3
A Web UI for easy subtitle generation using the Whisper model
MOSS‑TTS Family: open‑source speech and sound generation models
A private, local meeting notes assistant
A generative speech model for daily dialogue
A PyTorch-based Speech Toolkit
Instantly generate AI-powered subtitles on your device
One-click deployment (including offline integration package)
Multi-modal large language model designed for audio understanding
Instant voice cloning by MIT and MyShell. Audio foundation model
High-Quality Voice Cloning TTS for 600+ Languages
Robust Speech Recognition Across Languages, Dialects
Web presentation editor replicating many PowerPoint features online