Official inference repo for FLUX.1 models
CLIP: predict the most relevant text snippet given an image
Code for running inference and finetuning with the SAM 3 model
A Powerful Native Multimodal Model for Image Generation
Official inference repo for FLUX.2 models
Collection of Gemma 3 variants that are trained for performance
Accurate × Fast × Comprehensive
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Image generation model with single-stream diffusion transformer
Long-form streaming TTS system for multi-speaker dialogue generation
Diffusion Transformer with Fine-Grained Chinese Understanding
Visual Causal Flow
Large Multimodal Models for Video Understanding and Editing
Implementation of "MobileCLIP" CVPR 2024
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Block Diffusion for Ultra-Fast Speculative Decoding
The official PyTorch implementation of Google's Gemma models
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official implementation of DreamCraft3D
Open source large language model by Alibaba
Official code for Style Aligned Image Generation via Shared Attention
Official PyTorch Implementation of "Scalable Diffusion Models"
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
VaultGemma: 1B DP-trained Gemma variant for private NLP tasks
Custom BLEURT model for evaluating text similarity using PyTorch