Open-source industrial-grade ASR models
High-resolution models for human tasks
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
The Clay Foundation Model - An open source AI model and interface
Large language model & vision-language model based on Linear Attention
Code for running inference with the SAM 3D Body Model 3DB
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Sharp Monocular Metric Depth in Less Than a Second
Tooling for the Common Objects In 3D dataset
Generating Immersive, Explorable, and Interactive 3D Worlds
HY-Motion model for 3D character animation generation
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Tool for exploring and debugging transformer model behaviors
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Code for Mesh R-CNN, ICCV 2019
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Renderer for the harmony response format to be used with gpt-oss
High-Resolution Image Synthesis with Latent Diffusion Models
Netease Youdao's open-source embedding and reranker models
Audio foundation model excelling in audio understanding
FAIR Sequence Modeling Toolkit 2