A high-quality, rapid TTS voice-cloning model
Framework for building realtime multimodal voice AI agent apps
Nexa SDK is a comprehensive toolkit supporting ONNX and GGML models
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Open-source, no-code system for text annotation and building of text
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Agent harness to make your slop code well-engineered and beautiful
A Family of Open-Source Music Foundation Models
Faster Whisper transcription with CTranslate2
The official Python SDK for the ElevenLabs API
Industrial-level controllable zero-shot text-to-speech system
State-of-the-art TTS model under 25 MB
AI video generator optimized for low-VRAM and older GPUs
A lightweight text-to-speech model with zero-shot voice cloning
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Converts text to speech in realtime
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
AI-powered tool for generating, optimizing, and translating subtitles
Generating Immersive, Explorable, and Interactive 3D Worlds
State-of-the-art (SoTA) text-to-video pre-trained model
Unifying 3D Mesh Generation with Language Models
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
A fast TTS architecture with conditional flow matching