Speech-to-text, text-to-speech, and speaker recognition
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Interface for OuteTTS models
Open-source multi-speaker long-form text-to-speech model
Clone a voice in 5 seconds to generate arbitrary speech in real time
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Self-hosted AI audio transcription
A private, local meeting notes assistant
A Web UI for easy subtitle generation using the Whisper model
One-click deployment (including offline integration package)
An open-source implementation of NotebookLM with more flexibility
Synchronized Translation for Videos
Instant voice cloning audio foundation model by MIT and MyShell
MARS5 speech model (TTS) from CAMB.AI
Web presentation editor replicating many PowerPoint features online
High-Quality Voice Cloning TTS for 600+ Languages
Instantly generate AI-powered subtitles on your device
Foundational model for human-like, expressive TTS
Towards Human-Level Text-to-Speech through Style Diffusion
A deep learning toolkit for Text-to-Speech, battle-tested in research
Java app for chatting with a generative AI (using TTS and STT)
Best practice TTS based on BERT and VITS
Open-source implementation of Microsoft's VALL-E X zero-shot TTS model