Qwen-Image is a powerful image generation foundation model
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Foundation model for image generation
Image generation model with single-stream diffusion transformer
General-purpose image editing model that delivers high-fidelity
Official inference repo for FLUX.2 models
Multimodal-Driven Architecture for Customized Video Generation
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Powerful Native Multimodal Model for Image Generation
An easy 1-click way to create beautiful artwork on your PC using AI
Official inference repo for FLUX.1 models
CLIP: Predict the most relevant text snippet given an image
Capable of understanding text, audio, vision, video
A Unified Framework for Text-to-3D and Image-to-3D Generation
A Multi-Modal World Model for Reconstructing, Generating, Simulation
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
Generating Immersive, Explorable, and Interactive 3D Worlds
Text and image to video generation: CogVideoX and CogVideo
Collection of Gemma 3 variants that are trained for performance
Contexts Optical Compression