Z-Image is an efficient, open-source image generation foundation model built to make high-quality image synthesis more accessible. At 6 billion parameters, far fewer than many large-scale models, it uses a novel “single-stream diffusion Transformer” architecture to generate photorealistic images, showing that high quality does not always require a very large model. The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those of much larger models, with strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.
Features
- 6-billion-parameter image generation foundation model using a single-stream diffusion Transformer architecture
- Multiple variants: Z-Image-Turbo (distilled and fast), Z-Image-Base (full model), and Z-Image-Edit (image editing)
- Photorealistic image generation with strong prompt fidelity and realistic lighting, composition, and textures
- Accurate bilingual text rendering (English + Chinese) and robust support for complex prompts
- Efficient inference with reasonable performance on GPUs with about 16 GB of VRAM, putting it within reach of consumer hardware
- Open-source distribution of code, model weights, and pipelines, enabling community fine-tuning, integration into other projects, and custom development
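
The ~16 GB VRAM figure can be sanity-checked against the 6-billion-parameter count with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the helper names (`weight_memory_gb`, `estimate_all`) are hypothetical, and the storage precisions are assumptions, since the description does not say which precision the weights use. It counts the diffusion model's weights alone and ignores activations and any other components a full pipeline would load.

```python
# Rough estimate of the memory needed to hold the model weights alone.
# NUM_PARAMS comes from the description; the precisions are assumptions.

NUM_PARAMS = 6e9          # 6 billion parameters, as stated in the description
VRAM_BUDGET_GB = 16       # approximate VRAM budget mentioned in the feature list

# Bytes per parameter for common storage precisions (assumed, not confirmed).
PRECISIONS = {"fp32": 4, "bf16/fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def estimate_all(num_params: float = NUM_PARAMS) -> dict:
    """Map each assumed precision to its weight footprint in GB."""
    return {name: weight_memory_gb(num_params, size)
            for name, size in PRECISIONS.items()}


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        verdict = "within" if gb < VRAM_BUDGET_GB else "over"
        print(f"{name}: {gb:.0f} GB of weights, {verdict} a {VRAM_BUDGET_GB} GB budget")
```

Under these assumptions, 16-bit weights take about 12 GB, which leaves some headroom on a 16 GB card, while 32-bit weights (24 GB) would not fit without offloading or quantization.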