Robust Speech Recognition via Large-Scale Weak Supervision
Interface for OuteTTS models
Unofficial Python API and agentic skill for Google NotebookLM
Offline Text To Speech synthesis for Python
Framework for building real-time voice and multimodal AI agents
Automatic Speech Recognition with Word-level Timestamps
Converts text to speech in real time
Use Microsoft Edge's online text-to-speech service from Python
Generate blog articles from video or audio
Sample code and notebooks for Generative AI on Google Cloud
Free, high-quality text-to-speech API endpoint to replace OpenAI
Voice Recognition to Text Tool
Multimodal-Driven Architecture for Customized Video Generation
MARS5 speech model (TTS) from CAMB.AI
Label Studio is a multi-type data labeling and annotation tool
Unified web UI for training and running open models locally
A fast TTS architecture with conditional flow matching
Build multimodal language agents for fast prototyping and production
A high-quality rapid TTS voice cloning model
The most powerful and modular diffusion model GUI, API, and backend
Controllable & emotion-expressive zero-shot TTS
An open-source implementation of NotebookLM with more flexibility
A Python tool that uses GPT-4, FFmpeg, and OpenCV
Open source AI wearable platform for recording and summarizing speech
Automatically translates the text of a video based on a subtitle file