Wan2.1: Open and Advanced Large-Scale Video Generative Model
A robust, efficient, low-latency speech-to-text library
Robust Speech Recognition via Large-Scale Weak Supervision
Contexts Optical Compression
High-Quality Voice Cloning TTS for 600+ Languages
A generative speech model for daily dialogue
Official inference repo for FLUX.2 models
A simple, high-quality voice conversion tool focused on ease of use
Python tool for converting files and office documents to Markdown
Offline text-to-speech synthesis for Python
Use Microsoft Edge's online text-to-speech service from Python
Automatic Speech Recognition with Word-level Timestamps
Generate audiobooks from e-books
Generate audiobooks from e-books, voice cloning & 1107+ languages
Official MiniMax Model Context Protocol (MCP) server
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
A Powerful Native Multimodal Model for Image Generation
Text and image to video generation: CogVideoX and CogVideo
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Video-based AI memory library. Store millions of text chunks in MP4
Label Studio is a multi-type data labeling and annotation tool
CLIP: predict the most relevant text snippet given an image
An open source implementation of CLIP
MTEB: Massive Text Embedding Benchmark
Offline inference engine for art, real-time voice conversations