Audio foundation model excelling in audio understanding
Tiny vision language model
Open Source Speech Language Model
Foundation model for image generation
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Z80-μLM is a 2-bit quantized language model
Implementation of "MobileCLIP" (CVPR 2024)
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Project Lyra: Open Generative 3D World Models
Achieving a 3×+ generation speedup on reasoning tasks
Ultra-Efficient LLMs on End Devices
General-purpose image editing model that delivers high-fidelity edits
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Inference script for Oasis 500M
Generate Any 3D Scene in Seconds
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
FAIR Sequence Modeling Toolkit 2