Fast Stable Diffusion on CPU and AI PC
Pushing the Limits of Mathematical Reasoning in Open Language Models
Open-source multi-speaker long-form text-to-speech model
AlphaFold 3 inference pipeline
A Multi-Modal World Model for Reconstructing, Generating, Simulation
General-purpose image editing model that delivers high-fidelity
ICLR 2024 Spotlight: curation/training code, metadata, distribution
From Images to High-Fidelity 3D Assets
Qwen3-Coder is the code version of Qwen3
Reference PyTorch implementation and models for DINOv3
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Advancing Open-source World Models
A SOTA open-source image editing model
A Family of Open Sourced Music Foundation Models
High-Resolution Image Synthesis with Latent Diffusion Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
OCR expert VLM powered by Hunyuan's native multimodal architecture
Foundation model for image generation
Let's make video diffusion practical
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
A Powerful Native Multimodal Model for Image Generation
Qwen2.5-VL is the multimodal large language model series
A Systematic Framework for Interactive World Modeling
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Renderer for the harmony response format to be used with gpt-oss