Stable Diffusion v1.4 is a latent diffusion model that generates images from text prompts. Initialized from the v1.2 checkpoint, it was fine-tuned at 512×512 resolution on the LAION-Aesthetics V2 5+ subset of LAION (images with a predicted aesthetics score of 5 or higher). A frozen CLIP ViT-L/14 text encoder conditions the denoising U-Net through cross-attention.

During training, 10% of text conditioning was dropped, which enables classifier-free guidance at sampling time: a guidance scale lets users trade prompt adherence against sample diversity. Because diffusion runs in the compressed latent space of a pretrained autoencoder rather than in pixel space, the model is comparatively efficient while producing visually coherent, high-quality results. Known weaknesses include compositional prompts (e.g. "a red cube on top of a blue sphere"), fine detail, and photorealistic faces. The model was trained primarily on English captions and may underperform in other languages.

Stable Diffusion v1.4 is released under the CreativeML OpenRAIL-M license and is intended for research and creative applications, not for generating factual or identity-representative content. Because it was trained on largely unfiltered web data, the developers emphasize safety, bias awareness, and responsible deployment.
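The guidance mechanism itself is a simple extrapolation at each denoising step: the model predicts noise twice, once with the text prompt and once unconditionally, and the two predictions are combined. A minimal sketch of that combination (function and variable names are illustrative, not from the Stable Diffusion codebase; a guidance scale around 7.5 is a common default):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    # Classifier-free guidance: start from the unconditional noise
    # prediction and push it toward the text-conditioned one.
    # guidance_scale = 1.0 recovers the conditional prediction;
    # larger values follow the prompt more strictly.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy predictions standing in for the U-Net's two noise outputs.
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = cfg_combine(eps_u, eps_c, 7.5)
```

Dropping 10% of captions during training is what makes the unconditional prediction meaningful: the same network learns to denoise both with and without a prompt, so both terms come from one model.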
Features
- Text-to-image generation at 512×512 resolution
- Fine-tuned on the LAION-Aesthetics V2 5+ dataset
- CLIP ViT-L/14-based text encoding
- Classifier-free guidance for greater prompt control
- Pretrained latent autoencoder with lossy compression
- Optimized for English-language prompts
- Fast sampling using PLMS (Pseudo Linear Multistep)
- CreativeML OpenRAIL-M license for responsible AI use