A web UI for easy subtitle generation using the Whisper model
AI tool converting video/audio into structured documents instantly
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Robust Speech Recognition via Large-Scale Weak Supervision
An open-source implementation of NotebookLM with more flexibility
Label Studio is a multi-type data labeling and annotation tool
Offline text-to-speech synthesis for Python
Use Microsoft Edge's online text-to-speech service from Python
Generate blog articles from video or audio
One-click deployment (including offline integration package)
EPUB to audiobook converter, optimized for Audiobookshelf
Converts text to speech in real time
Unified web UI for training and running open models locally
MARS5 speech model (TTS) from CAMB.AI
Multimodal-Driven Architecture for Customized Video Generation
Open source AI model for generating full songs from lyrics prompts
Controllable & emotion-expressive zero-shot TTS
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
The most powerful and modular diffusion model GUI, API, and backend
A general fine-tuning kit geared toward image/video/audio diffusion
A fast TTS architecture with conditional flow matching
A Systematic Framework for Interactive World Modeling
Open source AI wearable platform for recording and summarizing speech
A TTS model capable of generating ultra-realistic dialogue