Large Multimodal Models for Video Understanding and Editing
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Programmatic access to the AlphaGenome model
Implementation of "MobileCLIP" (CVPR 2024)
Video understanding codebase from FAIR for reproducing state-of-the-art video models
Multimodal Diffusion with Representation Alignment
Fast and Universal 3D reconstruction model for versatile tasks
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Memory-efficient and performant finetuning of Mistral's models
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Unified Multimodal Understanding and Generation Models
Tooling for the Common Objects In 3D dataset
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
A series of math-specific large language models based on the Qwen2 series
A state-of-the-art open visual language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT Training Pipeline
Inference code for scalable emulation of protein equilibrium ensembles
The Clay Foundation Model - An open source AI model and interface for Earth
The official PyTorch implementation of Google's Gemma models
VMZ: Model Zoo for Video Modeling