High-resolution models for human tasks
State-of-the-art image and video CLIP models and multimodal large language models
Inference script for Oasis 500M
Memory-efficient and performant finetuning of Mistral's models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Unified Multimodal Understanding and Generation Models
Qwen-Image is a powerful image generation foundation model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A trainable PyTorch reproduction of AlphaFold 3
Renderer for the harmony response format to be used with gpt-oss
Pretrained time-series foundation model developed by Google Research
Fast and Universal 3D reconstruction model for versatile tasks
From Images to High-Fidelity 3D Assets
HY-Motion model for 3D character animation generation
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
An AI-powered security review GitHub Action using Claude
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Qwen3-Omni is a natively end-to-end, omni-modal LLM
DeepMind model for tracking arbitrary points across videos & robotics
Official repository for LTX-Video
CLIP: predict the most relevant text snippet given an image
A series of math-specific large language models built on Qwen2
Foundation model for image generation
Qwen3-ASR is an open-source series of ASR models
Qwen2.5-VL is the multimodal large language model series