Powerful Mixture-of-Experts (MoE) AI language model optimized for efficiency and performance
Code for running inference with the SAM 3D Body model (3DB)
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Models for object and human mesh reconstruction
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Capable of understanding text, audio, vision, and video
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
LLM-based audio editing model trained with reinforcement learning
Code release for "Masked-attention Mask Transformer