Inference code for scalable emulation of protein equilibrium ensembles
DeepMind model for tracking arbitrary points across videos & robotics
Sharp Monocular Metric Depth in Less Than a Second
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Inference framework for 1-bit LLMs
Ling is a MoE LLM provided and open-sourced by InclusionAI
High-Fidelity and Controllable Generation of Textured 3D Assets
Large Multimodal Models for Video Understanding and Editing
Fast-stable-diffusion + DreamBooth
Multimodal embedding and reranking models built on Qwen3-VL
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
Towards Real-World Vision-Language Understanding
Large language model & vision-language model based on linear attention
Chat & pretrained large vision language model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Pokee Deep Research Model Open Source Repo
Implementation of the Surya Foundation Model for Heliophysics
A SOTA open-source image editing model
Audio foundation model excelling in audio understanding
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM