Generating Immersive, Explorable, and Interactive 3D Worlds
Tokenizer-Free TTS for Multilingual Speech Generation
A robust, efficient, low-latency speech-to-text library
Handwritten Text Recognition (HTR) system implemented with TensorFlow
A Family of Open Sourced Music Foundation Models
Lightning-fast, on-device TTS, running natively via ONNX
JavaScript OCR and text extraction for images and PDFs
State-of-the-art (SoTA) text-to-video pre-trained model
Cross-platform AI language practice app
Official MiniMax Model Context Protocol (MCP) server
Generate audiobooks from e-books
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Unified web UI for training and running open models locally
A persistent, network-resilient, full-text search library
Super-expressive prompting model based on ltx2.3
A multimodal model for brain response prediction
TextWorld is a sandbox learning environment for the training
Faster Whisper transcription with CTranslate2
Open source healthcare AI
Chat with it via text and voice
Video-based AI memory library. Store millions of text chunks in MP4
Industrial-level controllable zero-shot text-to-speech system
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
State-of-the-art TTS model under 25MB
Modest natural-language processing