Automatic Speech Recognition with Word-level Timestamps
Official MiniMax Model Context Protocol (MCP) server
Qwen3-omni is a natively end-to-end, omni-modal LLM
Full git and GitHub integration with Sublime Text
Framework for building real-time multimodal voice AI agent apps
Easily compute CLIP embeddings and build a CLIP retrieval system
State-of-the-art TTS model under 25MB
Voice Recognition to Text Tool
An open-source toolkit for monitoring Large Language Models (LLMs)
Implementation of Video Diffusion Models
Clone a voice in 5 seconds to generate arbitrary speech in real time
Python bindings for MuPDF's rendering library
Accurate × Fast × Comprehensive
Industrial-level controllable zero-shot text-to-speech system
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Reading book source
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
AI-powered tool for generating, optimizing, and translating subtitles
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Controllable & emotion-expressive zero-shot TTS
Management of Yandex Station and other smart home devices
The simplest, fastest repository for training/finetuning models
Windows GUI Automation with Python (based on text properties)
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Generating Immersive, Explorable, and Interactive 3D Worlds