Robust Speech Recognition via Large-Scale Weak Supervision
Generate audiobooks from EPUBs, PDFs and text with captions
Speech recognition module for Python
Offline inference engine for art, real-time voice conversations
Official MiniMax Model Context Protocol (MCP) server
Use Microsoft Edge's online text-to-speech service from Python
Official inference repo for FLUX.2 models
Wan2.2: Open and Advanced Large-Scale Video Generative Model
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
EPUB to audiobook converter, optimized for Audiobookshelf
Ready-to-use OCR with 80+ supported languages
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
State-of-the-art TTS model under 25MB
CLIP: predict the most relevant text snippet given an image
A generative speech model for daily dialogue
A simple native web interface that uses ChatTTS to synthesize text
Audiocraft is a library for audio processing and generation
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Multi-Voice and Prompt-Controlled TTS Engine
Web interface for generating images using Stable Diffusion models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
The Python library for real-time communication
TTS with Kokoro and ONNX Runtime
Generating Immersive, Explorable, and Interactive 3D Worlds
The most accurate natural language detection library for Python