PyTorch code and models for the DINOv2 self-supervised learning method
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Official implementation of DreamCraft3D
A Customizable Image-to-Video Model based on HunyuanVideo
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Qwen2.5-VL is a multimodal large language model series
OCR expert VLM powered by Hunyuan's native multimodal architecture
An Efficient Agentic Model for Computer Use
Audio foundation model excelling in audio understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
DeepMind model for tracking arbitrary points across videos & robotics
Sharp Monocular Metric Depth in Less Than a Second
Tooling for the Common Objects In 3D dataset
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Inference framework for 1-bit LLMs
Large language model & vision-language model based on linear attention
Tiny vision language model
Inference code for scalable emulation of protein equilibrium ensembles
Chat & pretrained large vision language model
Fast-stable-diffusion + DreamBooth
A Pragmatic VLA Foundation Model
Multimodal embedding and reranking models built on Qwen3-VL
Collection of Gemma 3 variants trained for performance
Implementation of "MobileCLIP" CVPR 2024