HY-Motion model for 3D character animation generation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Open-source large language model family from Tencent Hunyuan
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Math-specific large language models from the Qwen2 series
Chinese and English multimodal conversational language model
Large language model and vision-language model based on Linear Attention
Qwen-Image is a powerful image generation foundation model
An Efficient Agentic Model for Computer Use
Phi-3.5 for Mac: Locally-run Vision and Language Models
Robust Speech Recognition Across Languages, Dialects
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Foundation model for image generation
Tool for exploring and debugging transformer model behaviors
Project Lyra: Open Generative 3D World Models
Pretrained time-series foundation model developed by Google Research
Open-source deep-learning framework
Hackable and optimized Transformers building blocks
PyTorch code and models for the DINOv2 self-supervised learning method
A Customizable Image-to-Video Model based on HunyuanVideo