Browse free open source Python AI Image Generators and projects below. Use the toggles on the left to filter open source Python AI Image Generators by OS, license, language, programming language, and project status.

  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    The database for AI-powered applications.

    MongoDB Atlas is the developer-friendly database used to build, scale, and run gen AI and LLM-powered apps—without needing a separate vector database. Atlas offers built-in vector search, global availability across 115+ regions, and flexible document modeling. Start building AI apps faster, all in one place.
    Start Free
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 1
    ComfyUI

    ComfyUI

    The most powerful and modular diffusion model GUI, api and backend

    The most powerful and modular diffusion model is GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission is to advance and democratize AI tooling. We believe that the future of AI tooling is open-source and community-driven.
    Downloads: 103 This Week
    Last Update:
    See Project
  • 2
    Fooocus

    Fooocus

    Focus on prompting and generating

    Fooocus is an open-source image generation software that simplifies the process of creating images from text prompts. Built on Gradio and leveraging Stable Diffusion XL, Fooocus eliminates the need for manual parameter tweaking, allowing users to focus solely on crafting prompts. It offers a user-friendly interface with minimal setup, making advanced image synthesis accessible to a broader audience.
    Downloads: 101 This Week
    Last Update:
    See Project
  • 3
    AUTOMATIC1111 Stable Diffusion web UI
    AUTOMATIC1111's stable-diffusion-webui is a powerful, user-friendly web interface built on the Gradio library that allows users to easily interact with Stable Diffusion models for AI-powered image generation. Supporting both text-to-image (txt2img) and image-to-image (img2img) generation, this open-source UI offers a rich feature set including inpainting, outpainting, attention control, and multiple advanced upscaling options. With a flexible installation process across Windows, Linux, and Apple Silicon, plus support for GPUs and CPUs, it caters to a wide range of users—from hobbyists to professionals. The interface also supports prompt editing, batch processing, custom scripts, and many community extensions, making it a highly customizable and continually evolving platform for creative AI art generation.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 4
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products. This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.
    Downloads: 26 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    DALL·E Mini

    DALL·E Mini

    Generate images from a text prompt

    DALL·E Mini, generate images from a text prompt. Craiyon/DALL·E mini is an attempt at reproducing those results with an open-source model. The model is trained by looking at millions of images from the internet with their associated captions. Over time, it learns how to draw an image from a text prompt. Some concepts are learned from memory as they may have seen similar images. However, it can also learn how to create unique images that don't exist, such as "the Eiffel tower is landing on the moon," by combining multiple concepts together. Optimizer updated to Distributed Shampoo, which proved to be more efficient following comparison of different optimizers. New architecture based on NormFormer and GLU variants following comparison of transformer variants, including DeepNet, Swin v2, NormFormer, Sandwich-LN, RMSNorm with GeLU/Swish/SmeLU.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 6
    Stable Diffusion in Docker

    Stable Diffusion in Docker

    Run the Stable Diffusion releases in a Docker container

    Run the Stable Diffusion releases in a Docker container with txt2img, img2img, depth2img, pix2pix, upscale4x, and inpaint. Run the Stable Diffusion releases on Huggingface in a GPU-accelerated Docker container. By default, the pipeline uses the full model and weights which requires a CUDA capable GPU with 8GB+ of VRAM. It should take a few seconds to create one image. On less powerful GPUs you may need to modify some of the options; see the Examples section for more details. If you lack a suitable GPU you can set the options --device cpu and --onnx instead. Since it uses the model, you will need to create a user access token in your Huggingface account. Save the user access token in a file called token.txt and make sure it is available when building the container. Create an image from an existing image and a text prompt. Modify an existing image with its depth map and a text prompt.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    Qwen-Image

    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including photorealistic, impressionist, anime, and minimalist aesthetics. Qwen-Image supports sophisticated editing tasks such as style transfer, object insertion and removal, detail enhancement, and even human pose manipulation, making it suitable for both professional and casual users. It also includes advanced image understanding capabilities like object detection, semantic segmentation, depth and edge estimation, and novel view synthesis.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    KoboldCpp

    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Leader badge
    Downloads: 192 This Week
    Last Update:
    See Project
  • 9
    Stable Diffusion

    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models. Stability AI's approach to image synthesis has contributed to creating detailed, scalable images while maintaining efficiency.
    Downloads: 65 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    CogView

    CogView

    Text-to-Image generation. The repo for NeurIPS 2021 paper

    CogView is a large-scale pretrained text-to-image transformer model, introduced in the NeurIPS 2021 paper CogView: Mastering Text-to-Image Generation via Transformers. With 4 billion parameters, it was one of the earliest transformer-based models to successfully generate high-quality images from natural language descriptions in Chinese, with partial support for English via translation. The model incorporates innovations such as PB-relax and Sandwich-LN to enable stable training of very deep transformers without NaN loss issues. CogView supports multiple tasks beyond text-to-image, including image captioning, post-selection (ranking candidate images by relevance to a prompt), and super-resolution (upscaling model-generated images). The repository provides pretrained models, inference scripts, and training examples, along with a Docker environment for reproducibility.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    Dream Textures

    Dream Textures

    Stable Diffusion built-in to Blender

    Create textures, concept art, background assets, and more with a simple text prompt. Use the 'Seamless' option to create textures that tile perfectly with no visible seam. Texture entire scenes with 'Project Dream Texture' and depth to image. Re-style animations with the Cycles render pass. Run the models on your machine to iterate without slowdowns from a service. Create textures, concept art, and more with text prompts. Learn how to use the various configuration options to get exactly what you're looking for. Texture entire models and scenes with depth to image. Inpaint to fix up images and convert existing textures into seamless ones automatically. Outpaint to increase the size of an image by extending it in any direction. Perform style transfer and create novel animations with Stable Diffusion as a post processing step. Dream Textures has been tested with CUDA and Apple Silicon GPUs. Over 4GB of VRAM is recommended.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    HunyuanImage-3.0

    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter counts without linear inference cost explosion. The model is intended to be competitive with closed-source image generation systems, aiming for high fidelity, prompt adherence, fine detail, and even “world knowledge” reasoning (i.e. leveraging context, semantics, or common sense in generation). The GitHub repo includes code, scripts, model loading instructions, inference utilities, prompt handling, and integration with standard ML tooling (e.g. Hugging Face / Transformers).
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    Stable Diffusion WebUI

    Stable Diffusion WebUI

    Web interface for generating images using Stable Diffusion models

    This project provides a powerful web-based interface for running Stable Diffusion, a text-to-image generation model. Developed by AUTOMATIC1111, it supports numerous features like model customization, prompt history, image upscaling, inpainting, and batch processing. The WebUI is beginner-friendly yet powerful enough for advanced users, becoming one of the most popular community-run UIs for AI image generation.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    Diffusers

    Diffusers

    State-of-the-art diffusion models for image and audio generation

    Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions. State-of-the-art diffusion pipelines that can be run in inference with just a few lines of code. Interchangeable noise schedulers for different diffusion speeds and output quality. Pretrained models that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems. We recommend installing Diffusers in a virtual environment from PyPi or Conda. For more details about installing PyTorch and Flax, please refer to their official documentation.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 15
    Hunyuan3D-1

    Hunyuan3D-1

    A Unified Framework for Text-to-3D and Image-to-3D Generation

    Hunyuan3D-1 is an earlier version in the same 3D generation line (the unified framework for text-to-3D and image-to-3D tasks) by Tencent Hunyuan. It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements. (Note: less detailed public documentation was found for Hunyuan3D-1 compared to 2.1.). Community and ecosystem support (e.g. usage via Blender addon for geometry/texture). Integration into user-friendly tools/platforms.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    texturize

    texturize

    Generate photo-realistic textures based on source images

    Generate photo-realistic textures based on source images. Remix, remake, mashup! Useful if you want to create variations on a theme or elaborate on an existing texture. A command-line tool and Python library to automatically generate new textures similar to a source image or photograph. It's useful in the context of computer graphics if you want to make variations on a theme or expand the size of an existing texture. This software is powered by deep learning technology, using a combination of convolution networks and example-based optimization to synthesize images. We're building texturize as the highest-quality open source library available! The examples are available as notebooks, and you can run them directly in-browser thanks to Jupyter and Google Colab.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    ChatFred

    ChatFred

    Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting

    Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting, image generation and more. Access ChatGPT, DALL·E 2, and other OpenAI models. Language models often give wrong information. Verify answers if they are important. Talk with ChatGPT via the cf keyword. Answers will show as Large Type. Alternatively, use the Universal Action, Fallback Search, or Hotkey. To generate text with InstructGPT models and see results in-line, use the cft keyword. ⤓ Install on the Alfred Gallery or download it over GitHub and add your OpenAI API key. If you have used ChatGPT or DALL·E 2, you already have an OpenAI account. Otherwise, you can sign up here - You will receive $5 in free credit, no payment data is required. Afterward you can create your API key. To start a conversation with ChatGPT either use the keyword cf, setup the workflow as a fallback search in Alfred or create your custom hotkey to directly send the clipboard content to ChatGPT.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    DALL-E 2 - Pytorch

    DALL-E 2 - Pytorch

    Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis

    Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch. The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP. Specifically, this repository will only build out the diffusion prior network, as it is the best performing variant (but which incidentally involves a causal transformer as the denoising network) To train DALLE-2 is a 3 step process, with the training of CLIP being the most important. To train CLIP, you can either use x-clip package, or join the LAION discord, where a lot of replication efforts are already underway. Then, you will need to train the decoder, which learns to generate images based on the image embedding coming from the trained CLIP.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    CLIP Guided Diffusion

    CLIP Guided Diffusion

    A CLI tool/python module for generating images from text

    A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI. Text to image generation (multiple prompts with weights). Non-square Generations (experimental) Generate portrait or landscape images by specifying a number to offset the width and/or height. Uses fewer timesteps over the same diffusion schedule. Sacrifices accuracy/alignment for quicker runtime. options: - 25, 50, 150, 250, 500, 1000, ddim25,ddim50,ddim150, ddim250,ddim500,ddim1000 (default: 1000) Prepending a number with ddim will use the ddim scheduler. e.g. ddim25 will use the 25 timstep ddim scheduler. This method may be better at shorter timestep_respacing values. Multiple prompts can be specified with the | character. You may optionally specify a weight for each prompt.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Deep Daze

    Deep Daze

    Simple command line tool for text to image generation

    Simple command-line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). In true deep learning fashion, more layers will yield better results. Default is at 16, but can be increased to 32 depending on your resources. Technique first devised and shared by Mario Klingemann, it allows you to prime the generator network with a starting image, before being steered towards the text. Simply specify the path to the image you wish to use, and optionally the number of initial training steps. We can also feed in an image as an optimization goal, instead of only priming the generator network. Deepdaze will then render its own interpretation of that image. The regular mode for texts only allows 77 tokens. If you want to visualize a full story/paragraph/song/poem, set create_story to True.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    GLIDE (Text2Im)

    GLIDE (Text2Im)

    GLIDE: a diffusion-based text-conditional image synthesis model

    glide-text2im is an open source implementation of OpenAI’s GLIDE model, which generates photorealistic images from natural language text prompts. It demonstrates how diffusion-based generative models can be conditioned on text to produce highly detailed and coherent visual outputs. The repository provides both model code and pretrained checkpoints, making it possible for researchers and developers to experiment with text-to-image synthesis. GLIDE includes advanced techniques such as classifier-free guidance, which improves the quality and alignment of generated images with the input text. The project also offers sampling scripts and utilities for exploring how diffusion models can be applied to multimodal tasks. As one of the early diffusion-based text-to-image systems, glide-text2im laid important groundwork for later advances in generative AI research.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    ImageReward

    ImageReward

    [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences

    ImageReward is the first general-purpose human preference reward model (RM) designed for evaluating text-to-image generation, introduced alongside the NeurIPS 2023 paper ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Trained on 137k expert-annotated image pairs, ImageReward significantly outperforms existing scoring methods like CLIP, Aesthetic, and BLIP in capturing human visual preferences. It is provided as a Python package (image-reward) that enables quick scoring of generated images against textual prompts, with APIs for ranking, scoring, and filtering outputs. Beyond evaluation, ImageReward supports Reward Feedback Learning (ReFL), a method for directly fine-tuning diffusion models such as Stable Diffusion using human-preference feedback, leading to demonstrable improvements in image quality.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Stable-Dreamfusion

    Stable-Dreamfusion

    Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion

    A pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. This project is a work-in-progress and contains lots of differences from the paper. The current generation quality cannot match the results from the original paper, and many prompts still fail badly! Since the Imagen model is not publicly available, we use Stable Diffusion to replace it (implementation from diffusers). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time costs in training. We use the multi-resolution grid encoder to implement the NeRF backbone (implementation from torch-ngp), which enables much faster rendering.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    x-unet

    x-unet

    Implementation of a U-net complete with efficient attention

    Implementation of a U-net complete with efficient attention as well as the latest research findings. For 3d (video or CT / MRI scans).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Disco Diffusion

    Disco Diffusion

    Notebooks, models and techniques for the generation of AI Art

    A frankensteinian amalgamation of notebooks, models, and techniques for the generation of AI art and animations. This project uses a special conversion tool to convert the Python files into notebooks for easier development. What this means is you do not have to touch the notebook directly to make changes to it. The tool being used is called Colab-Convert. Initial QoL improvements added, including user-friendly UI, settings+prompt saving, and improved google drive folder organization. Now includes sizing options, intermediate saves and fixed image prompts and Perlin inits. the unexposed batch option since it doesn't work.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.