OmniGen2 is a powerful, efficient open-source multimodal generation model designed for diverse tasks involving both images and text. It improves on its predecessor by introducing separate decoding pathways for text and image, with unshared parameters and a decoupled image tokenizer, which enhances flexibility and performance. Built on a strong Qwen2.5-VL foundation, OmniGen2 excels at visual understanding, high-quality text-to-image generation, and instruction-guided image editing. It also supports in-context generation, combining multiple inputs such as people, objects, and scenes into novel, coherent visuals. The project provides ready-to-use models, extensive Gradio demos, and resource-efficient options such as CPU offloading for devices with limited VRAM. Users can tune generation results with hyperparameters such as text and image guidance scales, maximum image resolution, and negative prompts.
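
As a rough illustration of how these hyperparameters fit together, here is a minimal sketch assuming a diffusers-style pipeline interface. The class name `OmniGen2Pipeline`, the import path, the model id, and the argument names (`text_guidance_scale`, `image_guidance_scale`, `negative_prompt`, `max_input_image_pixels`) are assumptions based on the description above, not a verified API; consult the repository's inference scripts for the exact names.

```python
# Hypothetical sketch: a diffusers-style text-to-image call with OmniGen2.
# Class, import path, model id, and argument names are assumptions, not a verified API.
import torch
from omnigen2 import OmniGen2Pipeline  # assumed import path

pipe = OmniGen2Pipeline.from_pretrained(
    "OmniGen2/OmniGen2",                 # assumed model id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="A red fox resting in a snowy birch forest at dawn",
    negative_prompt="blurry, low quality, extra limbs",
    text_guidance_scale=5.0,             # how strongly to follow the text prompt
    image_guidance_scale=2.0,            # how strongly to follow input images (editing/in-context)
    max_input_image_pixels=1024 * 1024,  # cap on input image resolution
    num_inference_steps=50,
).images[0]

image.save("fox.png")
```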
Features
- Unified multimodal model with distinct decoding paths for text and images
- Built on Qwen2.5-VL for strong visual understanding
- Generates high-fidelity images from text prompts with fine control
- Instruction-guided image editing for precise modifications
- Supports in-context generation combining diverse inputs into coherent outputs
- Resource-efficient, with CPU offload options for devices with limited VRAM (see the sketch after this list)
- Comprehensive Gradio demos and example scripts for quick experimentation
- Open-source under Apache 2.0 license with training code and data forthcoming
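
For low-VRAM devices, the sketch below shows how offloading and instruction-guided editing might be combined. `enable_model_cpu_offload()` follows the common diffusers convention, and the `input_images` parameter is likewise an assumption about how reference images are passed; both should be checked against the project's example scripts.

```python
# Hypothetical sketch: trading speed for memory on a low-VRAM GPU while editing an image.
# The offload method and the input_images parameter are assumptions, not a confirmed API.
import torch
from omnigen2 import OmniGen2Pipeline  # assumed import path

pipe = OmniGen2Pipeline.from_pretrained(
    "OmniGen2/OmniGen2",            # assumed model id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()     # keep only the active sub-module on the GPU

image = pipe(
    prompt="Replace the background with a beach at sunset",
    input_images=["photo.png"],     # assumed parameter for instruction-guided editing
    text_guidance_scale=5.0,
    image_guidance_scale=1.8,
).images[0]

image.save("edited.png")
```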