Industrial-level controllable zero-shot text-to-speech system
CLIP: predict the most relevant text snippet given an image
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Let us control diffusion models
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Reference implementation of the Transformer architecture optimized
Reproduces results of "Fixing the train-test resolution discrepancy"
Learning Continuous Signed Distance Functions for Shape Representation