Showing 523 open source projects for "images"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 1
    Mochi Diffusion

    Mochi Diffusion

    Run Stable Diffusion on Mac natively

    ...This app uses Apple's Core ML Stable Diffusion implementation to achieve maximum performance and speed on Apple Silicon based Macs while reducing memory requirements. Extremely fast and memory efficient (~150MB with Neural Engine) Runs well on all Apple Silicon Macs by fully utilizing Neural Engine. Generate images locally and completely offline. Generate images based on an existing image (commonly known as Image2Image) Generated images are saved with prompt info inside EXIF metadata (view in Finder's Get Info window) Convert generated images to high resolution (using RealESRGAN) Autosave & restore images. Use custom Stable Diffusion Core ML models. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 2
    Upscayl

    Upscayl

    Free and Open Source AI Image Upscaler for Linux, MacOS and Windows

    ...This means that we prioritize Linux builds over others but that doesn't mean we'll break things for other OSes. Upscayl does not work without a GPU, sorry. You'll need a Vulkan-compatible GPU to upscale images. CPU or iGPU won't work. You can also download the flatpak version and double-click the flatpak file to install via Store but wait for the full release, we'll be pushing it to Flathub for easy access. Upscayl uses AI models to enhance your images by guessing what the details could be. It uses Real-ESRGAN (and more in the future) model to achieve this. ...
    Downloads: 176 This Week
    Last Update:
    See Project
  • 3
    NSFWJS

    NSFWJS

    Client-side indecent content checking powered by TensorFlow.js

    NSFWJS is a simple JavaScript library that can quickly and quite accurately identify NSFW images, all in the client's browser. It is powered by TensorFlow.js and the NSFW detection model, and delivers around 90% accuracy that is improving each time. NSFWJS classifies images with percentages under five categories, namely: drawing and neutral, which are both safe for work; sexy, which includes sexually explicit images; and hentai and porn, which are pornographic drawings and images. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 4
    FLUX.2

    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved. It supports high-resolution output (up to ~4 megapixels), which allows for photography-quality images, detailed product shots, infographics or UI mockups rather than just low-resolution drafts. FLUX.2 is built with a modern architecture (a flow-matching transformer + a revamped VAE + a strong vision-language encoder), enabling strong prompt adherence, correct rendering of text/typography in images, reliable lighting, layout, and physical realism, and consistent style/character/product identity across multiple generations or edits.
    Downloads: 32 This Week
    Last Update:
    See Project
  • 99.99% Uptime for MySQL and PostgreSQL Databases Icon
    99.99% Uptime for MySQL and PostgreSQL Databases

    Sub-second maintenance. 2x read/write performance. Built-in vector search for AI apps.

    Cloud SQL Enterprise Plus delivers near-zero downtime with 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server.
    Try Free
  • 5
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 100 This Week
    Last Update:
    See Project
  • 6
    StarVector

    StarVector

    StarVector is a foundation model for SVG generation

    StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...
    Downloads: 80 This Week
    Last Update:
    See Project
  • 8
    Diffusion Bee

    Diffusion Bee

    Diffusion Bee is the easiest way to run Stable Diffusion locally

    ...It wraps Stable Diffusion and its dependencies into a one-click installer so users don’t need to manually install Python, drivers, or machine-learning frameworks to generate images. The app runs entirely on the local machine so images are created offline and no user data is sent to external servers unless explicitly chosen, preserving privacy. Users can generate images from text prompts, perform image-to-image transformations, and apply additional features like inpainting, outpainting, and model-based upscaling directly within a clean graphical interface. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 9
    SAM 3D Body

    SAM 3D Body

    Code for running inference with the SAM 3D Body Model 3DB

    ...The model is trained to be robust in diverse, in-the-wild conditions, so it handles varied clothing, viewpoints, and backgrounds while maintaining strong accuracy across multiple human-pose benchmarks. The repository provides Python code to run inference, utilities to download checkpoints from Hugging Face, and demo scripts that turn images into 3D meshes and visualizations. There are Jupyter notebooks that walk you through setting up the model, running it on example images, and visualizing outputs in 3D, making it approachable even if you are not a 3D expert.
    Downloads: 9 This Week
    Last Update:
    See Project
  • Host LLMs in Production With On-Demand GPUs Icon
    Host LLMs in Production With On-Demand GPUs

    NVIDIA L4 GPUs. 5-second cold starts. Scale to zero when idle.

    Deploy your model, get an endpoint, pay only for compute time. No GPU provisioning or infrastructure management required.
    Try Free
  • 10
    FLUX.1

    FLUX.1

    Official inference repo for FLUX.1 models

    ...This repo focuses on running the open-source model variants efficiently, providing scripts, model loading logic, and examples for local installations, and supports integration with Python toolchains like PyTorch and popular generative pipelines. Users can launch CLI tools to generate images, experiment with different FLUX variants, and extend the base code for research-oriented applications.
    Downloads: 96 This Week
    Last Update:
    See Project
  • 11
    FaceFusion

    FaceFusion

    Industry leading face manipulation platform

    FaceFusion is an open-source face swapping and facial enhancement toolkit designed for high-quality video and image manipulation workflows. The project enables users to replace faces in images or videos while maintaining temporal consistency and visual realism. It integrates modern deep learning models for face detection, alignment, and blending to produce smoother results than traditional approaches. FaceFusion is built with a modular pipeline that allows users to customize processing steps and optimize performance for different hardware environments. ...
    Downloads: 439 This Week
    Last Update:
    See Project
  • 12
    Watermark-Removal

    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    ...Through these techniques, the model learns to identify regions of the image affected by the watermark and generate realistic replacements for the missing visual information. The repository contains code for preprocessing images, training the model, and running inference on images to automatically remove watermark artifacts.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    pixelmatch

    pixelmatch

    The smallest, simplest JavaScript pixel-level image comparison library

    ...Unlike these libraries, pixelmatch is around 150 lines of code, has no dependencies, and works on raw typed arrays of image data, so it's blazing fast and can be used in any environment (Node or browsers). Compares two images, writes the output diff and returns the number of mismatched pixels.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Ideogram 4

    Ideogram 4

    Open image model at the forefront of design

    ...Ideogram 4 is especially useful for design-heavy outputs such as posters, ads, mockups, branded graphics, and images that include readable text. Its main value is combining open model access with professional-level control over image structure and visual direction.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 15
    Fooocus

    Fooocus

    Focus on prompting and generating

    Fooocus is an open-source image generation software that simplifies the process of creating images from text prompts. Built on Gradio and leveraging Stable Diffusion XL, Fooocus eliminates the need for manual parameter tweaking, allowing users to focus solely on crafting prompts. It offers a user-friendly interface with minimal setup, making advanced image synthesis accessible to a broader audience.
    Downloads: 339 This Week
    Last Update:
    See Project
  • 16
    Pot Desktop

    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports external plugin extensions, which means its functionality can be expanded far beyond the built-in options: you can add translation engines, OCR backends, TTS engines, vocabulary export (e.g. for language learning), and more. ...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 17
    HY-World 2.0

    HY-World 2.0

    A Multi-Modal World Model for Reconstructing, Generating, Simulation

    HY-World 2.0 is a multi-modal world model framework for reconstructing, generating, and simulating navigable 3D worlds from diverse inputs. It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world expansion, and world composition. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 19
    HivisionIDPhoto

    HivisionIDPhoto

    HivisionIDPhotos: a lightweight and efficient AI ID photos tools

    ...The software analyzes portrait images, performs background removal, aligns the face according to ID photo standards, and produces images in various official size formats. It also allows the generation of layout sheets such as six-inch photo arrangements for printing multiple ID photos on a single page. The project focuses on building a practical pipeline for automated ID photo production using AI-based segmentation and image processing techniques.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 59 This Week
    Last Update:
    See Project
  • 21
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 22
    SimpleHTR

    SimpleHTR

    Handwritten Text Recognition (HTR) system implemented with TensorFlow

    SimpleHTR is an open-source implementation of a handwriting text recognition system based on deep learning techniques. The project focuses on converting images of handwritten text into machine-readable digital text using neural networks. The system uses a combination of convolutional neural networks and recurrent neural networks to extract visual features and model sequential character patterns in handwriting. It also employs connectionist temporal classification (CTC) to align predicted character sequences with input images without requiring character-level segmentation. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    PixelRAG

    PixelRAG

    The beginning of scalable pixel-native search

    PixelRAG is a visual retrieval-augmented generation system that searches documents by how they look, not only by the text they contain. It renders web pages, PDFs, and images into screenshot tiles, then performs retrieval over those visual representations. This approach preserves layout, tables, charts, diagrams, infographics, and other visual structure that traditional HTML or text parsing can miss. The project includes tools for rendering, chunking, embedding, indexing, and serving visual search indexes. ...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 24
    Hunyuan3D 2.0

    Hunyuan3D 2.0

    High-Resolution 3D Assets Generation with Large Scale Diffusion Models

    ...Includes a user-friendly production/studio tool (Hunyuan3D-Studio) to manipulate/animate meshes. Condition-aligned shape generation via the DiT model, so generated mesh is influenced by input images or prompts.
    Downloads: 38 This Week
    Last Update:
    See Project
  • 25
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. ...
    Downloads: 37 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Auth0 Logo