stable-diffusion-xl-base-1.0 is a latent diffusion model developed by Stability AI for producing highly detailed images from text prompts. It forms the core of the SDXL pipeline and can be used on its own or paired with a refinement model for enhanced results. The base model uses two pretrained text encoders, OpenCLIP-ViT/G and CLIP-ViT/L, for richer text understanding and improved image quality.

The model supports two-stage generation: the base model creates initial latents, and the refiner further denoises them with an SDEdit-style img2img step for sharper outputs. SDXL-base shows markedly better performance than earlier versions such as Stable Diffusion 1.5 and 2.1, especially when paired with the refiner.

It is compatible with PyTorch, ONNX, and OpenVINO runtimes, offering flexibility across hardware setups. Although it delivers high visual fidelity, it still struggles with complex composition, photorealism, and rendering legible text.
## Features
- Generates images from text using dual pretrained encoders
- Can be used standalone or with SDXL Refiner for enhanced results
- Compatible with Diffusers, ONNX, and OpenVINO runtimes
- Supports CPU offloading and memory-efficient attention
- Enables two-stage generation using SDEdit (img2img)
- Built for high-resolution, detailed image synthesis
- Optimized for use with torch.compile (PyTorch ≥2.0)
- Licensed under CreativeML OpenRAIL++ for responsible use