Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
AI Models
Search Results

Search Results for "python image editor"

x

Sort By:

Relevance

Clear All Filters

OS

BSD 34
ChromeOS 34
Linux 34
More...
Mac 34
Windows 34

Category

Artificial Intelligence 34

License

OSI-Approved Open Source 21

Programming Language

Python 33

34 projects for "python image editor" with 2 filters applied:

AI Models BSD Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Cloud-based help desk software with ServoDesk
Full access to Enterprise features. No credit card required.

What if You Could Automate 90% of Your Repetitive Tasks in Under 30 Days? At ServoDesk, we help businesses like yours automate operations with AI, allowing you to cut service times in half and increase productivity by 25% - without hiring more staff.

Try ServoDesk for free
1

DeiT (Data-efficient Image Transformers)

Official DeiT repository

DeiT (Data-efficient Image Transformers) shows that Vision Transformers can be trained competitively on ImageNet-1k without external data by using strong training recipes and knowledge distillation. Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets. The project provides compact ViT variants (Tiny/Small/Base) that achieve excellent...

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
2

HunyuanImage-3.0

A Powerful Native Multimodal Model for Image Generation

HunyuanImage-3.0 is a powerful, native multimodal text-to-image generation model released by Tencent’s Hunyuan team. It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter...

1 Review

Downloads: 8 This Week

Last Update: 2025-10-31
See Project
3

SAM 3D Body

Code for running inference with the SAM 3D Body Model 3DB

SAM 3D Body is a promptable model for single-image full-body 3D human mesh recovery, designed to estimate detailed human pose and shape from just one RGB image. It reconstructs the full body, including feet and hands, using the Momentum Human Rig (MHR), a parametric mesh representation that decouples skeletal structure from surface shape for more accurate and interpretable results.

Downloads: 14 This Week

Last Update: 2 days ago
See Project
4

DeepSeek-OCR

Contexts Optical Compression

...It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. It supports local deployment, enabling organizations concerned about privacy or latency to run the pipeline on-premises rather than send sensitive documents to third-party cloud services. The codebase is written in Python with a focus on modularity: you can swap preprocessing, recognition, and post-processing components as needed for custom workflows.

Downloads: 43 This Week

Last Update: 2025-10-25
See Project
Skillfully - The future of skills based hiring
Realistic Workplace Simulations that Show Applicant Skills in Action

Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.

Learn More
5

SAM 3D Objects

Models for object and human mesh reconstruction

SAM 3D Objects is a foundation model that reconstructs full 3D geometry, texture, and spatial layout of objects and scenes from a single image. Given one RGB image and object masks (for example, from the Segment Anything family), it can generate a textured 3D mesh for each object, including pose and approximate scene layout. The model is specifically designed to be robust in real-world images with clutter, occlusions, small objects, and unusual viewpoints, where many earlier 3D-from-image...

Downloads: 3 This Week

Last Update: 2 days ago
See Project
6

DeepSeek VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal

DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to...

Downloads: 5 This Week

Last Update: 2025-10-03
See Project
7

CLIP

CLIP, Predict the most relevant text snippet given an image

CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...

Downloads: 0 This Week

Last Update: 2025-10-02
See Project
8

HunyuanVideo-I2V

A Customizable Image-to-Video Model based on HunyuanVideo

HunyuanVideo-I2V is a customizable image-to-video generation framework from Tencent Hunyuan, built on their HunyuanVideo foundation. It extends video generation so that given a static reference image plus an optional prompt, it generates a video sequence that preserves the reference image’s identity (especially in the first frame) and allows stylized effects via LoRA adapters. The repository includes pretrained weights, inference and sampling scripts, training code for LoRA effects, and...

Downloads: 1 This Week

Last Update: 2025-09-23
See Project
9

DeepSeek VL

Towards Real-World Vision-Language Understanding

DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository...

Downloads: 1 This Week

Last Update: 2025-10-03
See Project
WinMan ERP Software
For companies of all sizes and enterprises in need of a solution to improve their operations

WinMan ERP is an all-encompassing solution designed to manage the operational, quality, commercial, and financial processes of manufacturers and distributors. It is particularly well-suited for companies embracing Lean strategies.

Learn More
10

Hunyuan3D-1

A Unified Framework for Text-to-3D and Image-to-3D Generation

Hunyuan3D-1 is an earlier version in the same 3D generation line (the unified framework for text-to-3D and image-to-3D tasks) by Tencent Hunyuan. It provides a framework combining shape generation and texture synthesis, enabling users to create 3D assets from images or text conditions. While less advanced than version 2.1, it laid the foundations for the later PBR, higher resolution, and open-source enhancements. (Note: less detailed public documentation was found for Hunyuan3D-1 compared to...

Downloads: 0 This Week

Last Update: 2025-10-17
See Project
11

HunyuanCustom

Multimodal-Driven Architecture for Customized Video Generation

HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for...

Downloads: 0 This Week

Last Update: 2025-10-15
See Project
12

HunyuanWorld-Voyager

RGBD video generation model conditioned on camera input

HunyuanWorld-Voyager is a next-generation video diffusion framework developed by Tencent-Hunyuan for generating world-consistent 3D scene videos from a single input image. By leveraging user-defined camera paths, it enables immersive scene exploration and supports controllable video synthesis with high realism. The system jointly produces aligned RGB and depth video sequences, making it directly applicable to 3D reconstruction tasks. At its core, Voyager integrates a world-consistent video...

Downloads: 107 This Week

Last Update: 2025-10-22
See Project
13

HunyuanDiT

Diffusion Transformer with Fine-Grained Chinese Understanding

HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...

Downloads: 1 This Week

Last Update: 2025-10-26
See Project
14

Style Aligned

Official code for Style Aligned Image Generation via Shared Attention

StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as...

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
15

DINOv3

Reference PyTorch implementation and models for DINOv3

DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while...

Downloads: 5 This Week

Last Update: 2025-11-03
See Project
16

HunyuanVideo-Avatar

Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces,...

Downloads: 0 This Week

Last Update: 2025-10-16
See Project
17

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 0 This Week

Last Update: 2025-03-13
See Project
18

SAM 3

Code for running inference and finetuning with SAM 3 model

SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...

Downloads: 13 This Week

Last Update: 2 days ago
See Project
19

MobileCLIP

Implementation of "MobileCLIP" CVPR 2024

MobileCLIP is a family of efficient image-text embedding models designed for real-time, on-device retrieval and zero-shot classification. The repo provides training, inference, and evaluation code for MobileCLIP models trained on DataCompDR, and for newer MobileCLIP2 models trained on DFNDR. It includes an iOS demo app and Core ML artifacts to showcase practical, offline photo search and classification on iPhone-class hardware. Project notes highlight latency/accuracy trade-offs, with...

Downloads: 0 This Week

Last Update: 2025-10-09
See Project
20

Map-Anything

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Map-Anything is a universal, feed-forward transformer for metric 3D reconstruction that predicts a scene’s geometry and camera parameters directly from visual inputs. Instead of stitching together many task-specific models, it uses a single architecture that supports a wide range of 3D tasks—multi-image structure-from-motion, multi-view stereo, monocular metric depth, registration, depth completion, and more. The model flexibly accepts different input combinations (images, intrinsics, poses,...

Downloads: 1 This Week

Last Update: 2025-10-24
See Project
21

DreamCraft3D

Official implementation of DreamCraft3D

DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
22

Janus

Unified Multimodal Understanding and Generation Models

Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing...

Downloads: 0 This Week

Last Update: 2025-10-20
See Project
23

Depth Pro

Sharp Monocular Metric Depth in Less Than a Second

Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
24

DINOv2

PyTorch code and models for the DINOv2 self-supervised learning

DINOv2 is a self-supervised vision learning framework that produces strong, general-purpose image representations without using human labels. It builds on the DINO idea of student–teacher distillation and adapts it to modern Vision Transformer backbones with a carefully tuned recipe for data augmentation, optimization, and multi-crop training. The core promise is that a single pretrained backbone can transfer well to many downstream tasks—from linear probing on classification to retrieval,...

Downloads: 0 This Week

Last Update: 2025-10-06
See Project
25

Large Concept Model

Language modeling in a sentence representation space

Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically...

Downloads: 0 This Week

Last Update: 2025-10-07
See Project

Previous
You're on page 1
2
Next

Related Searches

ocr

deepseek

video ai

phi

windows boot repair

tesseract-ocr-w64-setup-v5.3.3.20231005.exe，64

tesseract-ocr-setup-3.02.02.exe

open ocr

offline artificial intelligence\

ocr from pdf

Related Categories

Artificial Intelligence

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2025 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise

×

Thanks for helping keep SourceForge clean.

X

You seem to have CSS turned off. Please don't fill out this field.

You seem to have CSS turned off. Please don't fill out this field.

Briefly describe the problem (required):

Upload screenshot of ad (required):

Select a file, or drag & drop file here.

✔

✘

Screenshot instructions:

Click URL instructions:
Right-click on the ad, choose "Copy Link", then paste here →
(This may not be possible with some types of ads)

More information about our ad policies

Ad destination/click URL: