VMZ: Model Zoo for Video Modeling
CLIP: Predict the most relevant text snippet given an image
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Renderer for the harmony response format to be used with gpt-oss
Inference framework for 1-bit LLMs
AlphaFold 3 inference pipeline
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
Open-source large language model family from Tencent Hunyuan
The Clay Foundation Model - An open-source AI model and interface
Sharp Monocular Metric Depth in Less Than a Second
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Designed for text embedding and ranking tasks
A series of math-specific large language models of our Qwen2 series
Let's make video diffusion practical
Tool for exploring and debugging transformer model behaviors
A state-of-the-art open visual language model
Repo of Qwen2-Audio chat & pretrained large audio language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
This repository contains the official implementation of FastVLM
PyTorch code and models for the DINOv2 self-supervised learning
GLM-4-Voice | End-to-End Chinese-English Conversational Model