Free, high-quality text-to-speech API endpoint to replace OpenAI
A Powerful Native Multimodal Model for Image Generation
Industrial-level controllable zero-shot text-to-speech system
Deep Research framework, combining language models with tools
Spark-TTS Inference Code
A Model Context Protocol (MCP) server
Framework for building real-time voice and multimodal AI agents
Easily compute clip embeddings and build a clip retrieval system
Faster Whisper transcription with CTranslate2
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games
Agent harness to make your slop code well-engineered and beautiful
A Unified Framework for Text-to-3D and Image-to-3D Generation
Official Python inference and LoRA trainer package
Open source healthcare AI
A full spaCy pipeline and models for scientific/biomedical documents
A Repo For Document AI
Controllable & emotion-expressive zero-shot TTS
A fast TTS architecture with conditional flow matching
A community-supported supercharged version of paperless
Accurate × Fast × Comprehensive
Underthesea - Vietnamese NLP Toolkit
Generate blog articles from video or audio