Qwen-Image is a powerful image generation foundation model
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Image generation model with single-stream diffusion transformer
Foundation model for image generation
General-purpose image editing model that delivers high-fidelity
Official inference repo for FLUX.2 models
Multimodal-Driven Architecture for Customized Video Generation
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Diffusion Bee is the easiest way to run Stable Diffusion locally
A Powerful Native Multimodal Model for Image Generation
An easy 1-click way to create beautiful artwork on your PC using AI
Official inference repo for FLUX.1 models
CLIP: predict the most relevant text snippet given an image
Capable of understanding text, audio, vision, video
A Multi-Modal World Model for Reconstructing, Generating, Simulating
A Unified Framework for Text-to-3D and Image-to-3D Generation
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
Generating Immersive, Explorable, and Interactive 3D Worlds
Text and image to video generation: CogVideoX and CogVideo
Collection of Gemma 3 variants that are trained for performance