MOSS‑TTS Family: open‑source speech and sound generation models
Inference script for Oasis 500M
High-Fidelity and Controllable Generation of Textured 3D Assets
Large Multimodal Models for Video Understanding and Editing
Open-source image generative foundation model
Global weather forecasting model using graph neural networks and JAX
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Repo for the Qwen2-Audio chat and pretrained large audio language models
State-of-the-art LLM and coding model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Repo for SeedVR2 & SeedVR
A Powerful Native Multimodal Model for Image Generation
Collection of Gemma 3 variants that are trained for performance
The official PyTorch implementation of Google's Gemma models
A 0.1B Omni model trained from scratch
Block Diffusion for Ultra-Fast Speculative Decoding
Instructions on how to use the Realtime API on Microcontrollers
Long-form streaming TTS system for multi-speaker dialogue generation
A SOTA open-source image editing model
Implementation of the Surya Foundation Model for Heliophysics
Pretrained time-series foundation model developed by Google Research
New set of lightweight, state-of-the-art open foundation models
Official implementation of DreamCraft3D
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Code for Mesh R-CNN, ICCV 2019