Audio foundation model excelling in audio understanding
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Production-tested AI infrastructure tools
Open Source Speech Language Model
Multimodal embedding and reranking models built on Qwen3-VL
A Family of Open Foundation Models for Code Intelligence
A Production-ready Reinforcement Learning AI Agent Library
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
A state-of-the-art open-source image editing model
A repository of trained models
Versatile 8B-base multimodal LLM, flexible foundation for custom AI