Qwen3-ASR is an open-source series of ASR models
VMZ: Model Zoo for Video Modeling
Video understanding codebase from FAIR for reproducing video models
CLIP: Predict the most relevant text snippet given an image
Official codebase for LeWorldModel: Stable End-to-End Joint-Embedding
Tiny vision language model
The official PyTorch implementation of Google's Gemma models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Code for running inference with the SAM 3D Body model (3DB)
Sharp Monocular Metric Depth in Less Than a Second
Provides convenient access to the Anthropic REST API from any Python 3 application
DeepSeek Coder: Let the Code Write Itself
Diversity-driven optimization and large-model reasoning ability
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
Repo of Qwen2-Audio chat & pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Capable of understanding text, audio, vision, and video
Inference script for Oasis 500M
HY-Motion model for 3D character animation generation
Foundation Models for Time Series
A Production-ready Reinforcement Learning AI Agent Library
Hackable and optimized Transformers building blocks