Speech-to-text, text-to-speech, and speaker recognition
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Long-form streaming TTS system for multi-speaker dialogue generation
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Interface for OuteTTS models
Open-source multi-speaker long-form text-to-speech model
Clone a voice in 5 seconds to generate arbitrary speech in real-time
The HTML Presentation Framework
Automatic Speech Recognition with Word-level Timestamps
A LaTeX class for producing presentations and slides
A private, local meeting notes assistant
Official PyTorch Implementation
An open-source implementation of Notebook LM with more flexibility
Self-hosted AI audio transcription
A Web UI for easy subtitle generation using the Whisper model
The official KotlinConf application
The ioquake3 community effort to continue supporting/developing id's
macOS System-wide audio equalizer & volume mixer
A generative speech model for daily dialogue
One-click deployment (including offline integration package)
A PyTorch-based Speech Toolkit
Translate a video from one language to another and add dubbing
Offline speech recognition API for Android, iOS, Raspberry Pi
Instantly generate AI-powered subtitles on your device
Multi-modal large language model designed for audio understanding