Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model developed by Stability AI. It conditions generation on three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL) and uses QK-normalization to improve training stability. The model is aimed at typography, detailed scenes, and complex or creative prompts. It supports inference via ComfyUI, Hugging Face Diffusers, and various APIs, and it can be quantized for deployment on low-VRAM hardware. Stable Diffusion 3.5 Large was trained on filtered public and synthetic datasets, with a focus on aesthetic quality and prompt adherence. It is released under the Stability AI Community License, which permits free research, non-commercial, and commercial use by individuals and organizations with under $1M in annual revenue. Safety mitigations were applied during training, but developers are advised to conduct their own testing before deployment.
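As an illustration of the Hugging Face Diffusers route mentioned above, the following is a minimal sketch rather than a reference implementation. It assumes that `torch` and a recent `diffusers` release are installed, that a CUDA GPU with enough memory for bfloat16 weights is available, and that the gated `stabilityai/stable-diffusion-3.5-large` repository has been unlocked for the logged-in Hugging Face account. The helper names `generation_kwargs` and `generate`, the step count, and the guidance scale are illustrative choices, not tuned or official values.

```python
# Hugging Face repository id for the model (gated: accept the license first).
MODEL_ID = "stabilityai/stable-diffusion-3.5-large"


def generation_kwargs(prompt, steps=28, guidance_scale=3.5):
    """Collect the sampling arguments passed to the pipeline call.

    The default steps and guidance_scale are illustrative starting points.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, out_path="output.png"):
    """Generate one image for `prompt` and save it to `out_path`."""
    # Imported lazily so the helper above works without torch/diffusers.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")  # assumption: a CUDA device is present

    image = pipe(**generation_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("A storefront sign that reads OPEN LATE")` would download the weights on first use and write `output.png`. Prompts containing quoted text are a reasonable way to probe the typography behaviour described above.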
Features
- Multimodal Diffusion Transformer with three pretrained text encoders
- High-quality image generation with enhanced prompt adherence
- QK-normalization for stable and efficient training
- Supports quantization and low-VRAM inference with BitsAndBytes
- Improved typography and complex prompt rendering
- Fine-tuning support and structured API access
- Community license with free use for qualifying users
- Safety and integrity evaluations for responsible deployment
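The quantization and low-VRAM item above can be sketched with the BitsAndBytes integration in Diffusers. This is one possible setup under stated assumptions, not the only supported path: it assumes `torch`, `diffusers`, `bitsandbytes`, and `accelerate` are installed, a CUDA GPU is available, and access to the gated model repository has been granted. The helper names `nf4_settings` and `load_quantized_pipeline` are hypothetical. Only the transformer is quantized to 4-bit NF4 here, and the remaining components are offloaded to the CPU when idle.

```python
# Hugging Face repository id for the model (gated: accept the license first).
MODEL_ID = "stabilityai/stable-diffusion-3.5-large"


def nf4_settings():
    """Keyword arguments for diffusers.BitsAndBytesConfig (4-bit NF4)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
    }


def load_quantized_pipeline():
    """Load the pipeline with a 4-bit quantized transformer."""
    # Imported lazily so nf4_settings() works without the heavy dependencies.
    import torch
    from diffusers import (
        BitsAndBytesConfig,
        SD3Transformer2DModel,
        StableDiffusion3Pipeline,
    )

    config = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16, **nf4_settings()
    )

    # Quantize only the diffusion transformer, the largest component.
    transformer = SD3Transformer2DModel.from_pretrained(
        MODEL_ID,
        subfolder="transformer",
        quantization_config=config,
        torch_dtype=torch.bfloat16,
    )

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, transformer=transformer, torch_dtype=torch.bfloat16
    )
    # Keep idle components on the CPU to lower peak GPU memory use.
    pipe.enable_model_cpu_offload()
    return pipe
```

The returned pipeline is called like the unquantized one, for example `load_quantized_pipeline()("a lighthouse at dusk").images[0]`. Quantization trades some output fidelity for memory, so results are worth comparing against the full-precision model before relying on them.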