Stable Diffusion Inpainting is a text-to-image latent diffusion model specialized for inpainting: it modifies or regenerates parts of an image guided by a text prompt and a mask. Initialized from the Stable Diffusion v1.2 checkpoint, it was fine-tuned for a further 440k steps of inpainting-specific training on the LAION-Aesthetics v2 5+ dataset. The model takes an image, a binary mask, and a descriptive prompt, and realistically fills the masked regions while keeping the surrounding content intact. Its UNet receives 5 additional input channels: four for the encoded masked image and one for the mask itself. The model can be used through the Hugging Face diffusers library and tools such as AUTOMATIC1111. It retains the known limitations of Stable Diffusion, including weak text rendering, limited compositional reasoning, and demographic bias.
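A minimal usage sketch with the Hugging Face diffusers pipeline is shown below; the checkpoint id, file names, and prompt are assumptions for illustration, not taken from this document.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the inpainting pipeline (checkpoint id assumed for illustration).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# Resize image and mask to the model's 512x512 training resolution.
# In the mask, white pixels mark the region to be repainted.
init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="a red brick fireplace",   # illustrative prompt
    image=init_image,
    mask_image=mask_image,
    guidance_scale=7.5,               # classifier-free guidance strength
).images[0]
result.save("inpainted.png")
```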
Features
- Text-guided inpainting with high-quality image reconstruction
- Fine-tuned from Stable Diffusion v1.2 with 5 extra UNet input channels for the mask and the encoded masked image (see the sketch after this list)
- Accepts user-provided images and masks in conjunction with prompts
- Trained on LAION-Aesthetics v2 5+ dataset for improved visual coherence
- Compatible with Hugging Face diffusers and AUTOMATIC1111 tools
- Trained with 10% text-conditioning dropout to improve classifier-free guidance (illustrated after this list)
- Performs well on FID and LPIPS benchmarks for inpainting quality
- Released under CreativeML OpenRAIL-M license for responsible use
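The extra mask-handling inputs can be pictured as a simple tensor concatenation. The sketch below is conceptual rather than the actual diffusers internals; the shapes and variable names are assumptions.

```python
import torch

# For 512x512 images, Stable Diffusion works in a 64x64 latent space with 4 channels.
batch, latent_ch, h, w = 1, 4, 64, 64

noisy_latents = torch.randn(batch, latent_ch, h, w)         # current diffusion state
masked_image_latents = torch.randn(batch, latent_ch, h, w)  # encoding of the image with the masked region removed
mask = torch.ones(batch, 1, h, w)                           # 1 = region to repaint, downsampled to latent resolution

# The inpainting UNet consumes all three stacked along the channel dimension:
# 4 noisy latent channels + 4 masked-image channels + 1 mask channel = 9 channels.
unet_input = torch.cat([noisy_latents, masked_image_latents, mask], dim=1)
assert unet_input.shape[1] == 9  # 4 original + 5 additional input channels
```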
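Classifier-free guidance at sampling time blends a text-conditioned prediction with an unconditional one; dropping the text conditioning for 10% of training examples is what makes the unconditional branch usable. The snippet below is a minimal sketch with illustrative tensors, not the pipeline's actual code.

```python
import torch

guidance_scale = 7.5  # higher values follow the prompt more closely

# Stand-ins for the UNet's noise predictions with and without the text prompt.
noise_pred_uncond = torch.randn(1, 4, 64, 64)  # prediction with an empty prompt
noise_pred_text = torch.randn(1, 4, 64, 64)    # prediction with the user prompt

# Classifier-free guidance: push the prediction away from the unconditional
# result and toward the text-conditioned one.
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```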