Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
High-Resolution Image Synthesis with Latent Diffusion Models (usage sketch after this list)
State-of-the-art (SoTA) text-to-video pre-trained model
AlphaFold 3 inference pipeline
GLM-4 series: Open Multilingual Multimodal Chat LMs
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Controllable & emotion-expressive zero-shot TTS
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Inference framework for 1-bit LLMs
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Repo for SeedVR2 & SeedVR
A Unified Framework for Text-to-3D and Image-to-3D Generation
Hackable and optimized Transformers building blocks
PyTorch code and models for the DINOv2 self-supervised learning method (loading sketch after this list)
Official implementation of DreamCraft3D
Open-source repo for the Pokee Deep Research model
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Provides convenient access to the Anthropic REST API from any Python 3 application (call sketch after this list)
Capable of understanding text, audio, vision, and video
Tiny vision language model
Tool for exploring and debugging transformer model behaviors
Multimodal Diffusion with Representation Alignment
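
For the latent diffusion entry above, a minimal text-to-image sketch. It uses the Hugging Face diffusers library rather than the original repo's own sampling scripts, which is an assumption; the checkpoint ID and prompt are illustrative, and a CUDA GPU is assumed.

```python
# Minimal latent-diffusion sketch via the diffusers library
# (an assumption; the original repo ships its own scripts).
import torch
from diffusers import StableDiffusionPipeline

# Checkpoint ID is illustrative; any Stable Diffusion checkpoint on the Hub works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Denoising runs in the autoencoder's latent space; the decoded image comes back as PIL.
image = pipe("a high-resolution photo of a mountain lake at dawn").images[0]
image.save("sample.png")
```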
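For the DINOv2 entry, a minimal sketch of loading a pretrained backbone through torch.hub, which the repo supports. The random input stands in for a preprocessed image and is only for shape-checking; the 384-dim output applies to the ViT-S/14 variant.

```python
# Minimal DINOv2 feature-extraction sketch via torch.hub.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Random tensor in place of a real image; ViT-S/14 needs spatial
# sides divisible by the patch size (14).
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = model(x)  # global image embedding: (1, 384) for ViT-S/14
print(feats.shape)
```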
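For the Anthropic SDK entry, a minimal sketch of a Messages API call. It assumes `pip install anthropic`, an ANTHROPIC_API_KEY set in the environment, and an illustrative model name.

```python
# Minimal Messages API call through the official Anthropic Python SDK.
from anthropic import Anthropic

client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative; substitute any available model
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize latent diffusion in one line."}],
)
print(message.content[0].text)
```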