ControlNet is a neural network architecture that enhances Stable Diffusion by enabling image generation conditioned on specific visual structures such as edges, poses, depth maps, and segmentation masks. By injecting these auxiliary inputs into the diffusion process, ControlNet gives users precise control over the layout and composition of generated images while preserving the style and flexibility of the underlying generative model.

It supports a wide range of conditioning types through pretrained modules, including Canny edges, HED soft edges, Midas depth, OpenPose skeletons, normal maps, MLSD lines, scribbles, and ADE20k-based semantic segmentation. The release includes both the ControlNet+SD1.5 model weights and compatible third-party detectors such as Midas and OpenPose for extracting input features. Each conditioning type is paired with a specific .pth model file used alongside Stable Diffusion for fine-grained control.
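As a quick illustration of how a conditioning map drives generation, the following is a minimal sketch of Canny-conditioned generation via the Hugging Face diffusers integration. The library calls and checkpoint names (diffusers, lllyasviel/sd-controlnet-canny, runwayml/stable-diffusion-v1-5) are assumptions for illustration and are not prescribed by this release.

```python
# Sketch: Canny-edge-conditioned generation with diffusers (assumed integration).
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract Canny edges from a reference image to use as the conditioning map.
image = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel map
condition = Image.fromarray(edges)

# Load the Canny ControlNet and attach it to a Stable Diffusion 1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The edge map constrains layout and composition; the prompt controls style and content.
result = pipe("a futuristic city at dusk", image=condition, num_inference_steps=30).images[0]
result.save("output.png")
```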
Features
- Extends Stable Diffusion with control based on image structure
- Supports conditioning via edges, depth, pose, scribbles, and more
- Includes pretrained weights for multiple input types (e.g., Canny, Midas, OpenPose)
- Allows precise manipulation of image composition and layout
- Compatible with AUTOMATIC1111 Web UI and Hugging Face demos
- Enables sketch-to-image, pose-to-image, and segmentation-guided generation (a pose-to-image sketch follows this list)
- Includes training resources and detection models for setup
- Released under OpenRAIL license to guide ethical use
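The pose-to-image workflow referenced above follows the same pattern: extract a skeleton with a detector, then pass it as the conditioning image. The controlnet_aux detector and the checkpoint names below are assumptions used for illustration, not part of the original setup instructions.

```python
# Sketch: pose-to-image generation with an OpenPose conditioning map (assumed APIs).
import torch
from PIL import Image
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Extract an OpenPose skeleton from a reference photo.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = openpose(Image.open("person.png").convert("RGB"))

# Attach the OpenPose ControlNet to a Stable Diffusion 1.5 pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The skeleton fixes the figure's pose; the prompt decides everything else.
result = pipe("an astronaut dancing on the moon", image=pose_map).images[0]
result.save("pose_output.png")
```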