Showing 321 open source projects for "python image editor"

View related business solutions
  • Cloud-based help desk software with ServoDesk Icon
    Cloud-based help desk software with ServoDesk

    Full access to Enterprise features. No credit card required.

    What if You Could Automate 90% of Your Repetitive Tasks in Under 30 Days? At ServoDesk, we help businesses like yours automate operations with AI, allowing you to cut service times in half and increase productivity by 25% - without hiring more staff.
    Try ServoDesk for free
  • Run applications fast and securely in a fully managed environment Icon
    Run applications fast and securely in a fully managed environment

    Cloud Run is a fully-managed compute platform that lets you run your code in a container directly on top of scalable infrastructure.

    Run frontend and backend services, batch jobs, deploy websites and applications, and queue processing workloads without the need to manage infrastructure.
    Try for free
  • 1
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports...
    Downloads: 57 This Week
    Last Update:
    See Project
  • 2
    HunyuanWorld 1.0

    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    HunyuanWorld-1.0 is an open-source, simulation-capable 3D world generation model developed by Tencent Hunyuan that creates immersive, explorable, and interactive 3D environments from text or image inputs. It combines the strengths of video-based diversity and 3D-based geometric consistency through a novel framework using panoramic world proxies and semantically layered 3D mesh representations. This approach enables 360° immersive experiences, seamless mesh export for graphics pipelines, and...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 3
    Screenshot to Code

    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. Mapping visual design to code constructs. Code/UI layout (HTML, CSS, or markup). Examples/demo scripts showing “image UI code”.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Create and run cloud-based virtual machines. Icon
    Create and run cloud-based virtual machines.

    Secure and customizable compute service that lets you create and run virtual machines.

    Computing infrastructure in predefined or custom machine sizes to accelerate your cloud transformation. General purpose (E2, N1, N2, N2D) machines provide a good balance of price and performance. Compute optimized (C2) machines offer high-end vCPU performance for compute-intensive workloads. Memory optimized (M2) machines offer the highest memory and are great for in-memory databases. Accelerator optimized (A2) machines are based on the A100 GPU, for very demanding applications.
    Try for free
  • 5
    MGIE

    MGIE

    Guiding Instruction-based Image Editing via Multimodal Large Language

    MGIE—Guiding Instruction-based Image Editing—demonstrates how a multimodal LLM can parse natural-language editing instructions and then drive image transformations accordingly. The project focuses on making edits explainable and controllable: the model interprets text guidance, reasons over image content, and outputs edits aligned with user intent. It’s positioned as an ICLR 2024 Spotlight work, with code and references that show how to connect language planning to concrete image operations....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    CLIP

    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Kornia

    Kornia

    Open Source Differentiable Computer Vision Library

    Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by existing packages, this library is composed by a subset of packages containing operators that can be inserted within...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    CogVideo

    CogVideo

    text and image to video generation: CogVideoX (2024) and CogVideo

    CogVideo is an open source text-/image-/video-to-video generation project that hosts the CogVideoX family of diffusion-transformer models and end-to-end tooling. The repo includes SAT and Diffusers implementations, turnkey demos, and fine-tuning pipelines (including LoRA) designed to run across a wide range of NVIDIA GPUs, from desktop cards (e.g., RTX 3060) to data-center hardware (A100/H100). Current releases cover CogVideoX-2B, CogVideoX-5B, and the upgraded CogVideoX1.5-5B variants, plus...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 9
    CogView4

    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Leverage AI to Automate Medical Coding Icon
    Leverage AI to Automate Medical Coding

    Medical Coding Solution

    As a healthcare provider, you should be paid promptly for the services you provide to patients. Slow, inefficient, and error-prone manual coding keeps you from the financial peace you deserve. XpertDox’s autonomous coding solution accelerates the revenue cycle so you can focus on providing great healthcare.
    Learn More
  • 10
    HunyuanVideo-I2V

    HunyuanVideo-I2V

    A Customizable Image-to-Video Model based on HunyuanVideo

    HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Generative AI for Beginners (Version 3)

    Generative AI for Beginners (Version 3)

    21 Lessons, Get Started Building with Generative AI

    Generative AI for Beginners is a 21-lesson course by Microsoft Cloud Advocates that teaches the fundamentals of building generative AI applications in a practical, project-oriented way. Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development practices and UX for AI. It also walks through modern application techniques such as function calling, RAG with vector databases, working with open source models, agents, fine-tuning, and using SLMs. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 12
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Hunyuan3D-1

    Hunyuan3D-1

    A Unified Framework for Text-to-3D and Image-to-3D Generation

    Hunyuan3D-1 is an earlier version in the same 3D generation line (the unified framework for text-to-3D and image-to-3D tasks) by Tencent Hunyuan. It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements. (Note: less detailed public documentation was found for Hunyuan3D-1 compared to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    HunyuanWorld-Voyager

    HunyuanWorld-Voyager

    RGBD video generation model conditioned on camera input

    HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks. At its core, Voyager integrates a world-consistent video...
    Downloads: 107 This Week
    Last Update:
    See Project
  • 16
    fastdup

    fastdup

    An unsupervised and free tool for image and video dataset analysis

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. Assisting you to increase your dataset images & labels quality and reduce your data operations costs at an unparalleled scale.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. Meanwhile,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Albumentations

    Albumentations

    Fast image augmentation library and an easy-to-use wrapper

    Albumentations is a computer vision tool that boosts the performance of deep convolutional neural networks. Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    x-unet

    x-unet

    Implementation of a U-net complete with efficient attention

    Implementation of a U-net complete with efficient attention as well as the latest research findings. For 3d (video or CT / MRI scans).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    MMDetection

    MMDetection

    An open source object detection toolbox based on PyTorch

    MMDetection is an open source object detection toolbox that's part of the OpenMMLab project developed by Multimedia Laboratory, CUHK. It stems from the codebase developed by the MMDet team, who won the COCO Detection Challenge in 2018. Since that win this toolbox has continuously been developed and improved. MMDetection detects various objects within a given image with high efficiency. Its training speed is comparable or even faster than those of other codebases like Detectron2 and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    CogVLM2

    CogVLM2

    GPT4V-level open-source multi-modal model based on Llama3-8B

    CogVLM2 is the second generation of the CogVLM vision-language model series, developed by ZhipuAI and released in 2024. Built on Meta-Llama-3-8B-Instruct, CogVLM2 significantly improves over its predecessor by providing stronger performance across multimodal benchmarks such as TextVQA, DocVQA, and ChartQA, while introducing extended context length support of up to 8K tokens and high-resolution image input up to 1344×1344. The series includes models for both image understanding and video...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Diffusers

    Diffusers

    State-of-the-art diffusion models for image and audio generation

    Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions. State-of-the-art diffusion pipelines that can be run in inference with just a...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    Watermark Anything

    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    Watermark Anything (WAM) is an advanced deep learning framework for embedding and detecting localized watermarks in digital images. Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    MMDeploy

    MMDeploy

    OpenMMLab Model Deployment Framework

    MMDeploy is an open-source deep learning model deployment toolset. It is a part of the OpenMMLab project. Models can be exported and run in several backends, and more will be compatible. All kinds of modules in the SDK can be extended, such as Transform for image processing, Net for Neural Network inference, Module for postprocessing and so on. Install and build your target backend. ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN...
    Downloads: 2 This Week
    Last Update:
    See Project