The most powerful local music generation model
Code for running inference and fine-tuning with the SAM 3 model
Advanced language and coding AI model
Ring is a reasoning MoE LLM open-sourced by InclusionAI
Towards Real-World Vision-Language Understanding
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Official inference repo for FLUX.2 models
Unified Multimodal Understanding and Generation Models
Repo for SeedVR2 & SeedVR
Qwen3-Omni is a natively end-to-end, omni-modal LLM
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Video understanding codebase from FAIR for reproducing video models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Renderer for the harmony response format to be used with gpt-oss
Large language model & vision-language model based on linear attention
Foundation model for image generation
Real-time behaviour synthesis with MuJoCo, using Predictive Control
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Language modeling in a sentence representation space
A trainable PyTorch reproduction of AlphaFold 3
Open-source framework for intelligent speech interaction