Implementation of RQ Transformer, which proposes a more efficient way of training multi-dimensional sequences autoregressively. This repository will only contain the transformer for now. You can use this vector quantization library for the residual VQ.

This type of axial autoregressive transformer should be compatible with memcodes, proposed in NWT. It would likely also work well with multi-headed VQ.

I also think there is something deeper going on, and have generalized this to any number of dimensions. You can use it by importing the HierarchicalCausalTransformer, as in the sketch below.

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce the computational cost of considering long-range interactions of codes. However, the paper postulates that previous VQ methods cannot both shorten the code sequence and generate high-fidelity images, in terms of the rate-distortion trade-off.
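As a rough usage sketch for the generalized module named above: the constructor arguments shown here (per-axis `depth` and `max_seq_len`, `num_tokens`, `dim`, attention settings) and the `return_loss` keyword are assumptions about the API rather than a documented signature, so check the source for the exact names.

```python
import torch
from rq_transformer import HierarchicalCausalTransformer

# Assumed constructor arguments -- verify against the actual source.
model = HierarchicalCausalTransformer(
    num_tokens = 16000,        # codebook / vocabulary size
    dim = 512,                 # model dimension
    heads = 8,                 # attention heads
    dim_head = 64,             # dimension per attention head
    depth = (4, 4, 2),         # transformer layers per axis, outermost to innermost
    max_seq_len = (16, 8, 4)   # maximum sequence length per axis
)

# a batch of discrete codes, one token per position of the 3-dimensional grid
x = torch.randint(0, 16000, (1, 16, 8, 4))

loss = model(x, return_loss = True)   # assumed training interface
loss.backward()
```

A two-axis configuration (spatial positions by residual quantization depth) would recover the RQ-Transformer setting described in the paper.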
## Features
- RQ-Transformer can efficiently reduce the computational cost of computing long-range interactions of codes
- Outperforms the existing AR models on various benchmarks of unconditional and conditional image generation
- RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes
- Effectively generates high-resolution images
- RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes (see the sketch after this list)
- Implements the approach of the paper Autoregressive Image Generation using Residual Quantization
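To make the "stacked map of discrete codes" concrete, here is a minimal, self-contained sketch of residual quantization with a single shared codebook. It is only an illustration of the idea, not the repository's RQ-VAE; the function `residual_quantize`, the greedy nearest-neighbor assignment, and the toy shapes are assumptions for the example.

```python
import torch

def residual_quantize(feature, codebook, depth = 4):
    """
    Approximate each feature vector by a sum of `depth` codebook vectors,
    chosen greedily on the remaining residual at each step.
    feature:  (n, d) feature vectors
    codebook: (k, d) shared codebook
    returns:  (n, depth) code indices and the (n, d) reconstruction
    """
    residual = feature
    recon = torch.zeros_like(feature)
    codes = []

    for _ in range(depth):
        # nearest codebook entry for the current residual
        dists = torch.cdist(residual, codebook)   # (n, k) pairwise distances
        idx = dists.argmin(dim = -1)              # (n,) chosen code per vector
        quantized = codebook[idx]                 # (n, d) selected code vectors

        codes.append(idx)
        recon = recon + quantized
        residual = residual - quantized

    return torch.stack(codes, dim = -1), recon

# toy usage: 1024 spatial positions, 512-dim features, codebook of 16000 entries
feats = torch.randn(1024, 512)
codebook = torch.randn(16000, 512)
codes, recon = residual_quantize(feats, codebook)
print(codes.shape)   # torch.Size([1024, 4]) -- a stacked map of discrete codes
```

Each spatial position thus carries `depth` codes, and the RQ-Transformer consumes this (positions x depth) grid, predicting the next position's stack of codes one depth level at a time.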