Training data (data labeling, annotation, workflow) for all data types
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Chat & pretrained large vision language model
Reference PyTorch implementation and models for DINOv3
Towards Real-World Vision-Language Understanding
The largest collection of PyTorch image encoders / backbones
Large-language-model & vision-language-model based on Linear Attention
This repository contains the official implementation of FastVLM
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
The open-source tool for building high-quality datasets
A neural network that transforms a design mock-up into a static website
Automate browser-based workflows with LLMs and Computer Vision
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
We write your reusable computer vision tools
Deep learning library
A state-of-the-art open visual language model
Sample code and notebooks for Generative AI on Google Cloud
[CVPR 2025 Best Paper Award] VGGT
PyTorch code and models for the DINOv2 self-supervised learning method
Qwen2.5-VL is the multimodal large language model series
High-resolution models for human tasks
Chinese and English multimodal conversational language model
Unified Multimodal Understanding and Generation Models
UI-TARS-desktop version that can operate on your local personal device