Foundation model for image generation
Diffusion Bee is the easiest way to run Stable Diffusion locally
Designed for text embedding and ranking tasks
Multimodal embedding and reranking models built on Qwen3-VL
The most powerful local music generation model
Controllable & emotion-expressive zero-shot TTS
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Open-source multi-speaker long-form text-to-speech model
Qwen2.5-VL is the multimodal large language model series
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Fast stable diffusion on CPU and AI PC
High-Resolution Image Synthesis with Latent Diffusion Models
Accurate × Fast × Comprehensive
Capable of understanding text, audio, vision, and video
A Systematic Framework for Interactive World Modeling
Long-form streaming TTS system for multi-speaker dialogue generation
Visual Causal Flow
Bidirectional token-classification model for identifiable info
HY-Motion model for 3D character animation generation
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Large-language-model & vision-language-model based on Linear Attention
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
General-purpose image editing model that delivers high-fidelity
Diffusion Transformer with Fine-Grained Chinese Understanding
Generate Any 3D Scene in Seconds