CLIP: predict the most relevant text snippet given an image
MiniMax-M2, a model built for Max coding & agentic workflows
4M: Massively Multimodal Masked Modeling
Code for Mesh R-CNN (ICCV 2019)
Claude Code mirror: a one-stop, open-source relay service
OCR expert VLM powered by Hunyuan's native multimodal architecture
Recovering the Visual Space from Any Views
The official PyTorch implementation of Google's Gemma models
A new set of lightweight, state-of-the-art open foundation models
Repo of the Qwen2-Audio chat and pretrained large audio language models
Inference script for Oasis 500M
Official implementation of DreamCraft3D
Repo for SeedVR2 & SeedVR
A Powerful Native Multimodal Model for Image Generation
Global weather forecasting model using graph neural networks and JAX
Collection of Gemma 3 variants that are trained for performance
A 0.1B Omni model trained from scratch
Block Diffusion for Ultra-Fast Speculative Decoding
Instructions on how to use the Realtime API on microcontrollers
Long-form streaming TTS system for multi-speaker dialogue generation
A SOTA open-source image editing model
Implementation of the Surya Foundation Model for Heliophysics
Pretrained time-series foundation model developed by Google Research
Production-tested AI infrastructure tools
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark