Motion-controllable Video Generation via Latent Trajectory Guidance
Persistent context and multi-instance coordination
Multimodal embedding and reranking models built on Qwen3-VL
DeepMind model for tracking arbitrary points across videos & robotics
Implementation of "MobileCLIP" CVPR 2024
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Video understanding codebase from FAIR for reproducing video models
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Renderer for the harmony response format to be used with gpt-oss
The ChatGPT Retrieval Plugin lets you easily find personal documents
CLIP: predict the most relevant text snippet given an image
User toolkit for analyzing and interfacing with Large Language Models
Leading open-source visualization and observability platform
Extract schema, statistics and entities from datasets
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A collaboration-friendly studio for NeRFs
An MCP server for VikingDB store and search
Machine Learning automation and tracking
Lemonade helps users run local LLMs with the highest performance
Fast image augmentation library and an easy-to-use wrapper
A library for deep learning end-to-end dialog systems and chatbots
Kubernetes observability and automation