Open-Source Speech Language Model
CLIP: predicts the most relevant text snippet given an image
Audio foundation model excelling in audio understanding
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Multimodal embedding and reranking models built on Qwen3-VL
Video understanding codebase from FAIR for reproducing video models
A SOTA open-source image editing model
A production-ready reinforcement learning AI agent library