sg2im is a research codebase that learns to synthesize images from scene graphs: structured descriptions of objects and the relationships between them. Instead of conditioning on free-form text alone, it exploits the graph structure to control layout and interactions, generating scenes that respect constraints such as “person left of dog” or “cup on table.”

The pipeline first predicts an object layout (bounding boxes and segmentation masks) from the graph, then renders a realistic image conditioned on that layout. This separation lets the model reason about geometry and composition before committing to texture and color, which improves spatial fidelity. The repository includes training code, dataset preparation scripts, and evaluation scripts, so researchers can reproduce baselines and extend individual components such as the graph encoder or the image generator. In practice, sg2im demonstrates how structured semantics can guide generative models toward controllable, compositional imagery.
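To make the input concrete, here is a minimal sketch of a scene graph and how it might be written to disk for the demo script. The `objects`/`relationships` field names and the list-of-graphs wrapping follow the JSON files shipped with the upstream repo's demo, but the exact schema may differ by version, so treat it as illustrative:

```python
import json

# A scene graph: object labels plus (subject, predicate, object) triples
# that reference objects by index in the "objects" list.
scene_graph = {
    "objects": ["sky", "grass", "sheep"],
    "relationships": [
        [0, "above", 1],        # sky above grass
        [2, "standing on", 1],  # sheep standing on grass
    ],
}

# The demo script consumes a JSON file containing a list of scene graphs
# (assumed here); one file can describe a whole batch of images.
with open("scene_graph.json", "w") as f:
    json.dump([scene_graph], f, indent=2)
```

A file like this can then be fed to the repository's demo script (`scripts/run_model.py` in the upstream repo) along with a pretrained checkpoint to generate images.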
## Features
- Graph encoder that maps objects and relations to spatial layouts (see the graph-convolution sketch after this list)
- Layout-to-image generator for realistic rendering from structure (a layout-composition sketch follows below)
- Support for spatial constraints such as left-of, on-top-of, and inside (a simple relation checker is sketched below)
- Modular components for swapping encoders, decoders, and losses
- Training and evaluation scripts for reproducible experiments
- Useful baselines for controllable, compositional image synthesis
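The graph encoder in the sg2im paper is a graph convolution network: it updates object and predicate embeddings by passing messages along (subject, predicate, object) triples, then regresses a bounding box per object. The following is a simplified, hypothetical single layer of that idea in PyTorch, not the repository's actual module:

```python
import torch
import torch.nn as nn

class TinyGraphConv(nn.Module):
    """One message-passing step over (subject, predicate, object) triples.

    A simplified sketch of the idea behind sg2im's graph convolution,
    not its actual code: each triple is processed by an MLP, and the
    outputs are pooled back into updated object embeddings.
    """
    def __init__(self, dim=64):
        super().__init__()
        self.triple_mlp = nn.Sequential(
            nn.Linear(3 * dim, 3 * dim), nn.ReLU(),
            nn.Linear(3 * dim, 3 * dim),
        )
        self.box_head = nn.Linear(dim, 4)  # predict (x0, y0, x1, y1)

    def forward(self, obj_vecs, pred_vecs, edges):
        # edges: LongTensor of shape (T, 2) with subject/object indices.
        s_idx, o_idx = edges[:, 0], edges[:, 1]
        triples = torch.cat([obj_vecs[s_idx], pred_vecs, obj_vecs[o_idx]], dim=1)
        new_s, new_p, new_o = self.triple_mlp(triples).chunk(3, dim=1)

        # Average the messages arriving at each object (simplified pooling).
        out = torch.zeros_like(obj_vecs)
        count = torch.zeros(obj_vecs.size(0), 1)
        out.index_add_(0, s_idx, new_s)
        out.index_add_(0, o_idx, new_o)
        count.index_add_(0, s_idx, torch.ones(len(s_idx), 1))
        count.index_add_(0, o_idx, torch.ones(len(o_idx), 1))
        obj_vecs = out / count.clamp(min=1)
        return obj_vecs, new_p, self.box_head(obj_vecs)

# Example: the sky/grass/sheep graph from above.
obj_vecs = torch.randn(3, 64)   # sky, grass, sheep
pred_vecs = torch.randn(2, 64)  # above, standing-on
edges = torch.tensor([[0, 1], [2, 1]])
new_objs, new_preds, boxes = TinyGraphConv()(obj_vecs, pred_vecs, edges)
```

Stacking several such layers lets information propagate across the whole graph before boxes are predicted.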
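The layout-to-image generator then consumes a spatial arrangement of the per-object features. A crude way to build that input, assuming normalized boxes and ignoring the soft masks the full model predicts, is to paste each object's embedding into its box region; the function below is an illustrative stand-in (the real model warps predicted masks into place):

```python
import torch

def compose_layout(obj_vecs, boxes, H=64, W=64):
    """Paste each object's feature vector into its box region.

    obj_vecs: (N, D) float tensor; boxes: (N, 4) normalized (x0, y0, x1, y1).
    A simplified stand-in for sg2im's mask-based layout composition.
    """
    N, D = obj_vecs.shape
    layout = torch.zeros(D, H, W)
    for vec, box in zip(obj_vecs, boxes):
        x0, y0, x1, y1 = [float(v) for v in box]
        xa, xb = int(x0 * W), max(int(x1 * W), int(x0 * W) + 1)
        ya, yb = int(y0 * H), max(int(y1 * H), int(y0 * H) + 1)
        layout[:, ya:yb, xa:xb] += vec[:, None, None]
    return layout  # (D, H, W); a refinement CNN renders pixels from this
```

Because the generator only sees this feature canvas, swapping a box or a relation in the graph changes the rendered scene in a predictable, localized way.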
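Relationships such as left-of constrain the predicted layout, and a simple way to check whether a generated layout honors a relation is to compare box coordinates. The checker below is hypothetical, with boxes in normalized (x0, y0, x1, y1) format; the relation names and rules are illustrative, not the repository's evaluation protocol:

```python
def satisfies(relation, box_s, box_o):
    """Check whether the subject box satisfies a spatial relation
    with respect to the object box (boxes normalized to [0, 1])."""
    sx = (box_s[0] + box_s[2]) / 2  # subject center x
    ox = (box_o[0] + box_o[2]) / 2  # object center x
    if relation == "left of":
        return sx < ox
    if relation == "above":
        return (box_s[1] + box_s[3]) / 2 < (box_o[1] + box_o[3]) / 2
    if relation == "inside":
        return (box_s[0] >= box_o[0] and box_s[1] >= box_o[1]
                and box_s[2] <= box_o[2] and box_s[3] <= box_o[3])
    raise ValueError(f"unknown relation: {relation}")

# Example: a cup centered left of a plate.
assert satisfies("left of", (0.1, 0.4, 0.3, 0.6), (0.5, 0.4, 0.7, 0.6))
```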