DreamO is a unified, open-source framework from ByteDance for image customization and generation. It consolidates multiple image-manipulation tasks into a single system rather than requiring a separate specialized model for each. Built on a diffusion-transformer (DiT) backbone, it supports identity preservation, virtual try-on (e.g. clothing, accessories), style transfer, IP adaptation (objects and characters), and layout- or condition-aware customization, all within the same architecture. DreamO's design introduces a feature routing constraint that helps disentangle different control conditions (such as identity, style, and clothing) when more than one is specified, which reduces conflicts and artifacts when controls are combined. It also uses a placeholder strategy to align conditional inputs with specific positions in the generated image (e.g. where to place clothing or objects), giving users fine-grained control over composition.
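To make the feature routing idea more concrete, here is a minimal, illustrative sketch in plain Python. It assumes the constraint can be approximated as a mean-squared error between an attention response map (how strongly each generated-image token attends to a condition's tokens) and a binary mask marking where that condition should appear. The function names, the sigmoid squashing, and the toy vectors are all hypothetical and are not taken from the DreamO codebase.

```python
import math


def attention_response(image_queries, condition_keys):
    """Mean scaled dot-product similarity of each image token to all condition tokens."""
    dim = len(condition_keys[0])
    scale = 1.0 / math.sqrt(dim)
    response = []
    for query in image_queries:
        sims = [scale * sum(q * k for q, k in zip(query, key)) for key in condition_keys]
        response.append(sum(sims) / len(sims))
    return response


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def routing_loss(image_queries, condition_keys, target_mask):
    """ASSUMPTION: MSE between the squashed response map and a 0/1 target mask.

    A low value means the condition mostly influences the image tokens
    inside its intended region and little elsewhere.
    """
    response = [sigmoid(r) for r in attention_response(image_queries, condition_keys)]
    squared_errors = [(r - m) ** 2 for r, m in zip(response, target_mask)]
    return sum(squared_errors) / len(squared_errors)


# Toy example: four image tokens, one condition token (hypothetical values).
condition_keys = [[1.0, 0.0]]
image_queries = [[4.0, 0.0], [4.0, 0.0], [-4.0, 0.0], [-4.0, 0.0]]

aligned = routing_loss(image_queries, condition_keys, [1, 1, 0, 0])
misaligned = routing_loss(image_queries, condition_keys, [0, 0, 1, 1])
print(f"aligned mask loss:    {aligned:.4f}")
print(f"misaligned mask loss: {misaligned:.4f}")
```

In this toy setup the loss is smaller when the mask covers the tokens that actually attend to the condition, which is the behavior a routing-style penalty would encourage during training.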
Features
- Unified framework combining identity preservation, virtual try-on, style transfer, IP adaptation, and conditional image generation
- Diffusion-transformer (DiT) backbone with feature-routing constraints to avoid entanglement when mixing multiple controls
- Placeholder-based conditioning enabling precise placement/control of objects, clothes, or layout elements in generated images
- Progressive training strategy (warm-up → full multitask → quality alignment) for robust output quality across tasks
- Quantization and inference optimizations (e.g. int8, Nunchaku quantization, GPU offload) to support consumer-grade GPUs
- Open source under the Apache-2.0 license, with code, pretrained models, and an inference pipeline available for customization or local use
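
The int8 option listed above reduces memory by storing weights as 8-bit integers. The sketch below shows generic symmetric per-tensor int8 quantization in plain Python to illustrate the trade-off between memory and precision. It is an assumption-level illustration only: it is not DreamO's or Nunchaku's actual quantization scheme, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric, per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weight values, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Each weight is recovered to within half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("int8 codes:", codes)
print("max round-trip error:", max_error)
```

Each weight takes one byte instead of four (for float32), at the cost of a rounding error of at most half the scale per value.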