Open Source Speech Language Model
Open-source industrial-grade ASR models
Qwen3-ASR is an open-source series of ASR models
Foundation model for image generation
A Pragmatic VLA Foundation Model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Block Diffusion for Ultra-Fast Speculative Decoding
Multimodal embedding and reranking models built on Qwen3-VL
Collection of Gemma 3 variants that are trained for performance
VMZ: Model Zoo for Video Modeling
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
CLIP: Predict the most relevant text snippet given an image
Ling is a MoE LLM provided and open-sourced by InclusionAI
Personalize Any Characters with a Scalable Diffusion Transformer
Genome modeling and design across all domains of life
Project Lyra: Open Generative 3D World Models
Pretrained time-series foundation model developed by Google Research
Long-form streaming TTS system for multi-speaker dialogue generation
General-purpose image editing model that delivers high-fidelity
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Open-Source Financial Large Language Models
Fast and Universal 3D reconstruction model for versatile tasks
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM