Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Z80-μLM is a 2-bit quantized language model
Collection of Gemma 3 variants that are trained for performance
Implementation of "MobileCLIP" CVPR 2024
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
CLIP: predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment
Personalize Any Characters with a Scalable Diffusion Transformer
MOSS‑TTS Family: open‑source speech and sound generation models
Bidirectional token-classification model for identifiable info
Genome modeling and design across all domains of life
Achieving 3+ generation speedup on reasoning tasks
Ultra-Efficient LLMs on End Devices
Pretrained time-series foundation model developed by Google Research
Long-form streaming TTS system for multi-speaker dialogue generation
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Open-source deep-learning framework
Generate Any 3D Scene in Seconds