TTS model capable of streaming conversational audio in realtime
Use Microsoft Edge's online text-to-speech service from Python
Open-source framework for intelligent speech interaction
Robust Speech Recognition via Large-Scale Weak Supervision
Management of Yandex Station and other smart home devices
The official Python library for the Fish Audio API
Offline inference engine for art, real-time voice conversations
Repo of Qwen2-Audio chat & pretrained large audio language model
SOTA Open Source TTS
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
SoTA open-source TTS
Flowly is 100x faster than OpenClaw
Open speech-to-speech models and pipelines by Hugging Face
Context-aware desktop AI assistant that understands screen content
Voice Recognition to Text Tool
Framework for building AI-powered interactive digital humans and agents
An Open Source text-to-speech system built by inverting Whisper
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
A nearly-live implementation of OpenAI's Whisper
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
World's first open-source, agentic video production system
A Web UI for easy subtitle generation using the Whisper model
Converts text to speech in realtime
Long-form streaming TTS system for multi-speaker dialogue generation
Open Source Speech Language Model