Official implementation of Watermark Anything with Localized Messages
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Global weather forecasting model using graph neural networks and JAX
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019
Renderer for the harmony response format to be used with gpt-oss
AlphaFold 3 inference pipeline
Programmatic access to the AlphaGenome model
Fast and Universal 3D reconstruction model for versatile tasks
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
The Clay Foundation Model - An open source AI model and interface
Pokee Deep Research Model Open Source Repo
GPT4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
GLM-4 series: Open Multilingual Multimodal Chat LMs
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen2.5-VL is the multimodal large language model series
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
DeepMind model for tracking arbitrary points across videos & robotics
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models