The 1D Visual Tokenization and Generation project from ByteDance introduces a one-dimensional tokenizer for images: instead of representing an image as a large 2D grid of tokens, as many prior generative and image-modeling systems do, it compresses the image into as few as 32 discrete tokens (with larger token counts available as an option). This compact representation drastically speeds up generation and reconstruction while retaining strong fidelity, with reported speedups of up to ~410× relative to heavyweight baselines at competitive image quality. The repo also bundles full generative modeling pipelines (e.g. the MaskGen and TA-TiTok frameworks) that demonstrate how the 1D tokenizer can be used for text-to-image generation and image reconstruction.
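To make the idea concrete, here is a minimal, self-contained sketch of what a VQ-style 1D tokenizer looks like in principle: an encoder pools an image into a small fixed number of latent slots, a codebook maps each slot to a discrete ID, and a decoder reconstructs the image from those IDs. All class names, layer choices, and sizes below are illustrative assumptions and do not reflect the repo's actual architecture or API.

```python
import torch
import torch.nn as nn


class Toy1DTokenizer(nn.Module):
    """Toy VQ-style 1D tokenizer: image -> K discrete IDs -> image (illustrative only)."""

    def __init__(self, num_tokens=32, codebook_size=4096, dim=256,
                 image_size=256, patch=16):
        super().__init__()
        self.num_tokens = num_tokens
        self.grid = image_size // patch                              # 16x16 = 256 patches
        self.patchify = nn.Conv2d(3, dim, patch, patch)              # image -> patch features
        self.pool = nn.AdaptiveAvgPool1d(num_tokens)                 # 2D patch grid -> K 1D slots
        self.codebook = nn.Embedding(codebook_size, dim)             # discrete vocabulary
        self.unpool = nn.Linear(num_tokens, self.grid * self.grid)   # K slots -> patch grid
        self.unpatchify = nn.ConvTranspose2d(dim, 3, patch, patch)   # patch features -> image

    def encode(self, images):
        b = images.shape[0]
        feats = self.patchify(images).flatten(2)           # (B, dim, grid*grid)
        slots = self.pool(feats).transpose(1, 2)           # (B, K, dim) continuous latents
        flat = slots.reshape(-1, slots.shape[-1])           # (B*K, dim)
        dists = torch.cdist(flat, self.codebook.weight)     # distance to every codebook entry
        return dists.argmin(-1).view(b, self.num_tokens)    # (B, K) integer token IDs

    def decode(self, token_ids):
        slots = self.codebook(token_ids).transpose(1, 2)    # (B, dim, K)
        feats = self.unpool(slots)                          # (B, dim, grid*grid)
        b, d, _ = feats.shape
        return self.unpatchify(feats.view(b, d, self.grid, self.grid))


tokenizer = Toy1DTokenizer(num_tokens=32)
images = torch.rand(2, 3, 256, 256)
ids = tokenizer.encode(images)   # (2, 32): each image is represented by 32 token IDs
recon = tokenizer.decode(ids)    # (2, 3, 256, 256)
print(ids.shape, recon.shape)
```

The toy above only illustrates the shape of the interface: an image in, 32 integer IDs out, and an image back from those IDs. Raising `num_tokens` (e.g. to 128) is the kind of granularity/quality trade-off listed under Features below.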
Features
- Compact image representation using as few as 32 discrete tokens
- Significant speed-ups in generation and sampling compared to conventional 2D-token systems
- Compatibility with autoregressive or transformer-based generation frameworks that treat image tokens like language tokens (see the sketch after this list)
- Bundled generative pipelines (e.g. MaskGen / TA-TiTok) for text-to-image and image reconstruction tasks
- Configurable tokenization granularity (32 tokens, 128 tokens, etc.) to trade off between speed and quality
- Open-source license enabling adaptation, fine-tuning, or integration into custom generative workflows
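Because the tokenizer emits a short sequence of integer IDs, those IDs can be consumed by a standard sequence model exactly like word tokens. The sketch below shows next-token prediction over such a sequence with a tiny causal transformer; the model, sizes, and training objective are illustrative assumptions and not the repo's MaskGen implementation.

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, DIM = 4096, 32, 256   # assumed sizes: codebook, token count, model width


class TinyImageTokenLM(nn.Module):
    """Tiny causal transformer that models 1D image tokens like language tokens."""

    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, DIM)
        self.pos_emb = nn.Parameter(torch.zeros(1, SEQ_LEN, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, token_ids):
        x = self.tok_emb(token_ids) + self.pos_emb[:, : token_ids.shape[1]]
        # Causal mask so each position only attends to earlier image tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(token_ids.shape[1])
        return self.head(self.blocks(x, mask=mask))


model = TinyImageTokenLM()
ids = torch.randint(0, VOCAB, (2, SEQ_LEN))        # stand-in for 1D tokenizer output
logits = model(ids[:, :-1])                        # predict each next image token
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))
print(loss.item())
```

With only 32 tokens per image, one forward pass over the whole sequence is cheap, which is where the speed advantage of the 1D representation comes from when it is paired with generation frameworks like those bundled in this repo.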