45 projects for "generate image" with 2 filters applied:

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    FLUX.1

    FLUX.1

    Official inference repo for FLUX.1 models

    FLUX.1 repository contains inference code and tooling for the FLUX.1 text-to-image diffusion models, enabling developers and researchers to generate and edit images from natural-language prompts using open-weight versions of the model on their own hardware or within custom applications. The project is part of a larger family of FLUX models developed by Black Forest Labs, designed to produce high-quality, detailed visuals from text descriptions with competitive prompt adherence and artistic fidelity. ...
    Downloads: 45 This Week
    Last Update:
    See Project
  • 2
    Text-to-image Playground

    Text-to-image Playground

    A playground to generate images from any text prompt using SD

    dalle-playground is an open-source web application that allows users to generate images from natural language text prompts using modern text-to-image generative models. Originally built around DALL-E Mini, the project later transitioned to using Stable Diffusion, enabling more detailed and higher-quality image synthesis. The system combines a backend machine learning service with a browser-based frontend interface that lets users experiment interactively with prompt engineering and generative AI. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    SwarmUI

    SwarmUI

    Modular AI image and video generation web UI with extensible tools

    SwarmUI is a modular web-based user interface designed for AI-driven image generation, with a strong focus on usability, performance, and extensibility. It serves as a unified environment for working with multiple AI models, including Stable Diffusion and newer image and video generation systems, allowing users to create and manage outputs through a browser interface. SwarmUI is built to accommodate both beginners and advanced users by offering a simple “Generate” interface alongside more advanced workflow tools that expose deeper configuration options. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 4
    LTX-2.3

    LTX-2.3

    Official Python inference and LoRA trainer package

    LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while...
    Downloads: 108 This Week
    Last Update:
    See Project
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 5
    Watermark-Removal

    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    ...Through these techniques, the model learns to identify regions of the image affected by the watermark and generate realistic replacements for the missing visual information. The repository contains code for preprocessing images, training the model, and running inference on images to automatically remove watermark artifacts.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    HunyuanImage-3.0

    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 7
    SAM 3D Objects

    SAM 3D Objects

    Models for object and human mesh reconstruction

    SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image systems struggle. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    DeepSeek VL2

    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Guizang Social Card Skill

    Guizang Social Card Skill

    Claude Code / Codex skill — generate Xiaohongshu carousels

    Guizang Social Card Skill is an AI-agent skill for generating polished social image packages in a Guizang-inspired visual style. It is designed for formats such as Xiaohongshu or Rednote carousels, WeChat Official Account covers, article covers, product update graphics, thumbnails, and screenshot-heavy posts. The skill turns articles, scripts, screenshots, product notes, subtitles, or photos into structured social card outputs. It supports editorial magazine layouts and Swiss-style visual...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection. Text-image fusion module based on LLaVA for improved multimodal understanding. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    InfiniteYou

    InfiniteYou

    Flexible Photo Recrafting While Preserving Your Identity

    InfiniteYou is an open-source image-generation and “identity-preserving image editing / generation” framework from ByteDance, designed to generate high-fidelity images that preserve a subject’s identity while allowing flexible editing or re-creation according to textual prompts. Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process — via residual connections — so that the output matches the person’s identity closely, without sacrificing visual quality or text-image alignment. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Story Flicks

    Story Flicks

    Generate high-definition story short videos with one click using AI

    ...Because the project is open and modifiable, developers can customize the generation pipeline: adjust story structure, alter rendering parameters, tweak video quality or resolution, or integrate with other AI models (e.g. for audio, voice-over, or image-to-video). It’s especially useful as a starting template or experimentation ground for developers building automated content-creation tools.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 13
    InternLM-XComposer-2.5

    InternLM-XComposer-2.5

    InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System

    InternLM-XComposer is an open-source multimodal AI system designed to generate long-form content that combines text with visual elements such as images and diagrams. The model is built on top of the InternLM language model architecture and extends its capabilities to handle multimodal inputs and outputs. Instead of producing only textual responses, the system can generate visually enriched documents such as illustrated articles, presentations, and educational materials. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    AI Logo Generator

    AI Logo Generator

    A free + OSS logo generator powered by Flux on Together AI

    AI Logo Generator is an open-source AI logo generator that lets you create professional-looking logos in seconds from a simple text prompt. It uses the Flux Pro 1.1 model hosted on Together AI to generate logos, so the heavy lifting is done by a state-of-the-art image model while the app focuses on UX and workflow. The project is built with Next.js and TypeScript, and it uses shadcn/ui plus Tailwind CSS for a modern, responsive interface that feels like a polished SaaS product rather than a demo. It integrates Clerk for authentication so users can sign in, save their logo history (planned via a dashboard), and potentially manage usage tied to their own API key. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    HivisionIDPhoto

    HivisionIDPhoto

    HivisionIDPhotos: a lightweight and efficient AI ID photos tools

    ...It also allows the generation of layout sheets such as six-inch photo arrangements for printing multiple ID photos on a single page. The project focuses on building a practical pipeline for automated ID photo production using AI-based segmentation and image processing techniques.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    vim-ai

    vim-ai

    AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim

    vim-ai is an AI-powered assistant plugin for Vim and Neovim that brings language-model features directly into the editor. It allows users to generate code or text, edit selections in place, and carry on interactive chat-style conversations without leaving the terminal editing environment. The plugin is built around OpenAI-compatible APIs, which means it can work not only with OpenAI itself but also with compatible proxies and alternative providers. Its command set covers text completion, editing, chat continuation, image generation, and debugging utilities, making it more versatile than a narrow autocomplete add-on. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    ALLWEONE

    ALLWEONE

    AI tool that generates custom presentations with real-time editing

    Presentation AI by ALLWEONE is an open source tool that uses artificial intelligence to generate complete slide decks from a simple prompt. It helps users create professional presentations quickly, with support for customizable themes, layouts, and styles. You can define slide count, language, and tone, then review or edit the AI-generated outline before finalising. Slides are built in real time, allowing you to watch content develop as the system works.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    LandPPT

    LandPPT

    An LLM-based presentation generation platform

    LandPPT is an open-source AI platform that automatically generates professional presentation slides using large language models. The system allows users to create complete PowerPoint presentations simply by entering a topic or uploading source documents such as PDFs, Word files, or Markdown notes. Using natural language processing and structured content generation, the platform produces presentation outlines and converts them into fully formatted slide decks. The application integrates...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Framelink MCP for Figma

    Framelink MCP for Figma

    MCP server enabling AI coding tools to access Figma design data

    Figma-Context-MCP is an open source server that connects Figma design data with AI-powered coding tools through the Model Context Protocol (MCP). It allows coding assistants to retrieve structured information from Figma files so they can better translate visual designs into working code. Instead of relying on screenshots or manual descriptions, Figma-Context-MCP accesses layout, styling, and component metadata directly from the Figma API and presents it in a simplified format optimized for...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    WorldGen

    WorldGen

    Generate Any 3D Scene in Seconds

    WorldGen is an AI model and library that can generate full 3D scenes in a matter of seconds from either text prompts or reference images. It is designed to create interactive environments suitable for games, simulations, robotics research, and virtual reality, rather than just static 3D assets. The core idea is that you describe a world in natural language and WorldGen produces a navigable 3D scene that you can freely explore in 360 degrees, with loop closure so that the space remains...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    draw-a-ui

    draw-a-ui

    Draw wireframe sketches and generate HTML with AI vision models

    draw-a-ui is an experimental open source application that converts hand-drawn interface wireframes into working HTML code using artificial intelligence. draw-a-ui combines the tldraw canvas drawing tool with a vision-capable language model to interpret user-created mockups and translate them into a single HTML layout styled with Tailwind CSS. When a user sketches a UI on the canvas, the application captures the current drawing as SVG, converts it into a PNG image, and sends that image to a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or post-processing modules (e.g. mesh smoothing, texturing) to make the outputs more output-ready. Because 3D generation is hardware‐intensive, the repository likely also includes optimizations like quantization, pruning, or inference accelerations (e.g. using FlashMLA or DeepEP) to make the generation pipeline faster or more efficient. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    HY-World 1.5

    HY-World 1.5

    A Systematic Framework for Interactive World Modeling

    HY-WorldPlay is a Hunyuan AI project focusing on immersive multimodal content generation and interaction within virtual worlds or simulated environments. It aims to empower AI agents with the capability to both understand and generate multimedia content — including text, audio, image, and potentially 3D or game-world elements — enabling lifelike dialogue, environmental interpretations, and responsive world behavior. The platform targets use cases in digital entertainment, game worlds, training simulators, and interactive storytelling, where AI agents need to adapt to real-time user inputs and changes in environment state. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    HunyuanWorld-Voyager

    HunyuanWorld-Voyager

    RGBD video generation model conditioned on camera input

    HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks. At its core, Voyager integrates a world-consistent video...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo