Robust Speech Recognition Across Languages, Dialects
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
Tiny vision language model
Large language model and vision-language model based on linear attention
Code for running inference with the SAM 3D Body model (3DB)
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Sharp Monocular Metric Depth in Less Than a Second
Tooling for the Common Objects In 3D dataset
Generating Immersive, Explorable, and Interactive 3D Worlds
Open Source Speech Language Model
Open-source industrial-grade ASR models
High-resolution models for human tasks
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
HY-Motion model for 3D character animation generation
The Clay Foundation Model - An open source AI model and interface
Access to Anthropic's safety-first language model APIs
Tool for exploring and debugging transformer model behaviors
A Multi-Modal World Model for Reconstructing, Generating, Simulation
New set of lightweight, state-of-the-art open foundation models
A Family of Open Foundation Models for Code Intelligence
Code for Mesh R-CNN, ICCV 2019
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models