Code for Mesh R-CNN, ICCV 2019
Renderer for the harmony response format to be used with gpt-oss
AlphaFold 3 inference pipeline
Programmatic access to the AlphaGenome model
Fast and Universal 3D reconstruction model for versatile tasks
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
The Clay Foundation Model - An open source AI model and interface
Pokee Deep Research Model Open Source Repo
GPT-4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
GLM-4 series: Open Multilingual Multimodal Chat LMs
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Diffusion Transformer with Fine-Grained Chinese Understanding
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Qwen2.5-VL is the multimodal large language model series
DeepMind model for tracking arbitrary points across videos & robotics
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Inference framework for 1-bit LLMs
The official PyTorch implementation of Google's Gemma models