Accurate × Fast × Comprehensive
Fast and Universal 3D reconstruction model for versatile tasks
LLM-based Reinforcement Learning audio editing model
Sharp Monocular Metric Depth in Less Than a Second
Code for Mesh R-CNN, ICCV 2019
Language modeling in a sentence representation space
Renderer for the harmony response format to be used with gpt-oss
GLM-4 series: Open Multilingual Multimodal Chat LMs
Audio Language Models are Few-Shot Learners
Open-source industrial-grade ASR models
Foundation model for image generation
Fast-stable-diffusion + DreamBooth
Hunyuan Translation Model Version 1.5
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP" (CVPR 2024)
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
Video understanding codebase from FAIR for reproducing video models
CLIP: Predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
High-Fidelity and Controllable Generation of Textured 3D Assets
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion