High-resolution models for human tasks
Sharp Monocular Metric Depth in Less Than a Second
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Netease Youdao's open-source embedding and reranker models
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
Large language model and vision-language model based on Linear Attention
Large Multimodal Models for Video Understanding and Editing
OCR expert VLM powered by Hunyuan's native multimodal architecture
Video Object and Interaction Deletion
Recovering the Visual Space from Any Views
Foundation model for image generation
Z80-μLM is a 2-bit quantized language model
Implementation of "MobileCLIP" (CVPR 2024)
Tool for exploring and debugging transformer model behaviors
Capable of understanding text, audio, vision, video
Uncommon Objects in 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Generating Immersive, Explorable, and Interactive 3D Worlds
A trainable PyTorch reproduction of AlphaFold 3
Achieving 3x+ generation speedup on reasoning tasks
Ultra-Efficient LLMs on End Devices
HY-Motion model for 3D character animation generation
Generate Any 3D Scene in Seconds
PyTorch code and models for the DINOv2 self-supervised learning method