Awesome multilingual OCR toolkits based on PaddlePaddle
An easy 1-click way to create beautiful artwork on your PC using AI
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Official inference repo for FLUX.1 models
Multimodal-Driven Architecture for Customized Video Generation
Generating Immersive, Explorable, and Interactive 3D Worlds
State-of-the-art (SoTA) text-to-video pre-trained model
A Family of Open Sourced Music Foundation Models
Super expressive prompting model based on ltx2.3
A multimodal model for brain response prediction
State-of-the-art TTS model under 25MB
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Industrial-level controllable zero-shot text-to-speech system
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Controllable & emotion-expressive zero-shot TTS
Collection of Gemma 3 variants that are trained for performance
Official Python inference and LoRA trainer package
Large-language-model & vision-language-model based on Linear Attention
Moonshot's most powerful AI model
Open-source multi-speaker long-form text-to-speech model
The most powerful local music generation model
A Multi-Modal World Model for Reconstructing, Generating, Simulation
tiktoken is a fast BPE tokeniser for use with OpenAI's models
General-purpose image editing model that delivers high-fidelity