An experimental version of the DeepSeek model
Math-specific large language models in our Qwen2 series
Generating Immersive, Explorable, and Interactive 3D Worlds
General-purpose image editing model that delivers high-fidelity
Accurate × Fast × Comprehensive
Fast and Universal 3D reconstruction model for versatile tasks
GLM-4 series: Open Multilingual Multimodal Chat LMs
Sharp Monocular Metric Depth in Less Than a Second
Code for Mesh R-CNN, ICCV 2019
Language modeling in a sentence representation space
Renderer for the harmony response format to be used with gpt-oss
LLM-based reinforcement learning audio editing model
Audio Language Models are Few-Shot Learners
Open-source industrial-grade ASR models
Foundation model for image generation
Fast-stable-diffusion + DreamBooth
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP", CVPR 2024
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Ling is an MoE LLM provided and open-sourced by InclusionAI