Let's make video diffusion practical
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Language modeling in a sentence representation space
Python example app from the OpenAI API quickstart tutorial
A Conversational Speech Generation Model
Official DeiT repository
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Learning to Act by Watching Unlabeled Online Videos
PyTorch implementation of MAE
Learning Continuous Signed Distance Functions for Shape Representation
Code for reproducing key results in the paper