Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
The repository provides code for running inference with SAM 2
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
An open-source end-to-end VLM-based GUI Agent
Inference framework for 1-bit LLMs
A dev-first open-source autonomous AI agent framework
Full-stack AI software engineer
The standard data-centric AI package for data quality and ML
Leveraging BERT and c-TF-IDF to create easily interpretable topics
Implementation of Video Diffusion Models
An MLOps framework to package, deploy, monitor and manage models
A simple forecasting package
AI Toolkit for Healthcare Imaging
Create HTML profiling reports from pandas DataFrame objects
Train machine learning models within Docker containers
Build AI-powered semantic search applications
The official PyTorch implementation of Google's Gemma models
Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Machine Learning Systems: Design and Implementation
Capable of understanding text, audio, vision, video
MemU is an open-source memory framework for AI companions
Inference code for scalable emulation of protein equilibrium ensembles