Powerful Mixture-of-Experts (MoE) language model optimized for efficiency and performance
Code for running inference with the SAM 3D Body model (3DB)
Image generation model with a single-stream diffusion transformer
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Models for object and human mesh reconstruction
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Capable of understanding text, audio, vision, and video
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
LLM-based audio editing model trained with reinforcement learning
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
Robust BERT-based model for English with improved MLM training
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Vision-language-action model for robot control via images and text
High-precision 14B multimodal model built for advanced reasoning tasks
Efficient 14B multimodal instruct model suited to edge deployment, with FP8 support