4M: Massively Multimodal Masked Modeling
Global weather forecasting model using graph neural networks and JAX
Proxy that exposes Antigravity-provided Claude / Gemini models
Inference script for Oasis 500M
Claude Code image, a one-stop open-source transit service
Designed for text embedding and ranking tasks
High-Fidelity and Controllable Generation of Textured 3D Assets
Large Multimodal Models for Video Understanding and Editing
Repo of Qwen2-Audio chat & pretrained large audio language model
A Powerful Native Multimodal Model for Image Generation
State-of-the-art LLM and coding model
Implementation of the Surya Foundation Model for Heliophysics
OCR expert VLM powered by Hunyuan's native multimodal architecture
Repo for SeedVR2 & SeedVR
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Collection of Gemma 3 variants that are trained for performance
The official PyTorch implementation of Google's Gemma models
A 0.1B Omni model trained from scratch
Block Diffusion for Ultra-Fast Speculative Decoding
Instructions on how to use the Realtime API on Microcontrollers
Long-form streaming TTS system for multi-speaker dialogue generation
A SOTA open-source image editing model
Pretrained time-series foundation model developed by Google Research
New set of lightweight, state-of-the-art open foundation models
Official implementation of DreamCraft3D