This repository contains the official implementation of FastVLM
A Production-ready Reinforcement Learning AI Agent Library
Research code artifacts for Code World Model (CWM)
Capable of understanding text, audio, vision, and video
The Clay Foundation Model - An open source AI model and interface for Earth
Sharp Monocular Metric Depth in Less Than a Second
GPT-4V-level open-source multi-modal model based on Llama3-8B
Implementation of "MobileCLIP" (CVPR 2024)
Video understanding codebase from FAIR for reproducing video models
Chinese and English multimodal conversational language model
Chat & pretrained large vision language model
GLM-4 series: Open Multilingual Multimodal Chat LMs
Fast and Universal 3D reconstruction model for versatile tasks
A PyTorch library for implementing flow matching algorithms (a minimal training sketch follows this list)
PyTorch code and models for the DINOv2 self-supervised learning method
Memory-efficient and performant finetuning of Mistral's models
Official implementation of DreamCraft3D
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Large Multimodal Models for Video Understanding and Editing
Revolutionizing Database Interactions with Private LLM Technology
Controllable & emotion-expressive zero-shot TTS
Tooling for the Common Objects In 3D dataset
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
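For the flow matching entry above, here is a minimal, self-contained sketch of conditional flow matching with linear interpolation paths, written in plain PyTorch. The toy 2-D "data", network size, and optimizer settings are illustrative assumptions, and the code does not use or mirror the flow_matching library's actual API.

```python
# Sketch: conditional flow matching on toy 2-D data (assumed setup, not the library API).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Small MLP predicting the velocity field v_theta(x_t, t)."""
    def __init__(self, dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_t, t], dim=-1))

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x1 = torch.randn(256, 2) * 0.5 + 2.0   # stand-in "data" samples (hypothetical)
    x0 = torch.randn_like(x1)               # noise samples
    t = torch.rand(x1.size(0), 1)           # t ~ Uniform[0, 1]
    x_t = (1 - t) * x0 + t * x1             # linear interpolation path between noise and data
    target = x1 - x0                        # conditional velocity along that path
    loss = ((model(x_t, t) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, samples can be drawn by integrating dx/dt = v_theta(x, t) from t = 0 (noise) to t = 1 with any ODE solver, e.g. a simple Euler loop.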