GLIDE: a diffusion-based text-conditional image synthesis model
An implementation of model-parallel GPT-2 and GPT-3-style models
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 2018
Dia-1.6B generates lifelike English dialogue and vocal expressions
CTC-based forced aligner for audio-text in 158 languages
Vision-language-action model for robot control via images and text