Search Results for "java open source"

Showing 147 open source projects for "java open source"

1

Tesseract OCR

Open Source OCR Engine

Tesseract is an open source OCR (optical character recognition) engine and command line program. OCR is a technology that allows for the recognition of text characters within a digital image. The latest version of Tesseract focuses more on line recognition; however, it still supports the legacy Tesseract OCR engine, which recognizes character patterns.

Downloads: 2,519 This Week

Last Update: 2025-12-26
2

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 113 This Week

Last Update: 2025-12-24
3

Umi-OCR

OCR software, free and offline

...Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 38 This Week

Last Update: 6 days ago
4

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle

PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that can help users train better models and apply them in practice. Inspired by PaddlePaddle, PaddleOCR is an ultra lightweight OCR system, with multilingual recognition, digit recognition, vertical text recognition, as well as long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra lightweight ppocr_mobile series models, general...

Downloads: 52 This Week

Last Update: 2 days ago
5

Video-subtitle-extractor

A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. It is a deep learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction. Text recognition runs locally, so there is no need to apply for a third-party API, set up or call any API, or access online OCR services such as Baidu...

1 Review

Downloads: 60 This Week

Last Update: 2025-05-13
6

EasyOCR

Ready-to-use OCR with 80+ supported languages

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc. EasyOCR is a Python module for extracting text from images. It is a general OCR that can read both natural scene text and dense text in documents. We currently support 80+ languages and are expanding. Second-generation models: multiple times smaller size, multiple times faster inference, additional characters and comparable accuracy to the first...

Downloads: 39 This Week

Last Update: 2024-09-24
7

Tesseract.js

A pure JavaScript Multilingual OCR

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The Tesseract.js library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. Tesseract.js is a JavaScript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (e.g. recognize, detect) take an image...

Downloads: 18 This Week

Last Update: 2025-12-15
8

Papermerge

Open Source Document Management System for Digital Archives

...Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. Each user can be assigned different permissions to perform only a specific kind of action e.g. view only documents from a specific folder.

Downloads: 10 This Week

Last Update: 2025-07-24
9

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 6 This Week

Last Update: 2025-10-25
10

PaddleOCR-json

Offline OCR image text recognition command-line program for Windows

PaddleOCR-json is an OCR engine based on the PaddleOCR project that provides a command-line interface and tools for extracting text from images and exporting results in structured JSON format. It wraps the PaddleOCR models, which are capable of detecting and recognizing text in a wide variety of languages and layouts, into a self-contained executable that can be run locally without needing a deep learning environment configured manually. This makes it practical for developers or system...

Downloads: 4 This Week

Last Update: 6 days ago
11

Paper2GUI

Convert AI papers to GUI

Convert AI papers to GUI, making it easy and convenient for everyone to use cutting-edge artificial intelligence technology. Paper2GUI: An AI desktop app toolbox for ordinary people. It can be used immediately without installation. It already supports 40+ AI models, covering AI painting, speech synthesis, video frame interpolation, video super-resolution, object detection, image stylization, OCR recognition, and other fields. Supports Windows, Mac, and Linux systems. Paper2GUI:...

Downloads: 6 This Week

Last Update: 2024-09-20
12

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performance on public document...

Downloads: 2 This Week

Last Update: 2025-07-09
13

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent-Hunyuan. It is designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools.

Downloads: 3 This Week

Last Update: 2026-01-13
14

DeepDetect

Deep Learning API and Server in C++14 with support for Caffe, PyTorch

...Neural network templates for the most effective architectures for GPU, CPU, and Embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. Fully open source, with an ecosystem of tools (API clients, video, annotation, ...). Fast server written in pure C++, a single codebase for Cloud, Desktop & Embedded.

Downloads: 0 This Week

Last Update: 2025-07-19
15

VietOCR

Provides optical character recognition (OCR) solutions for Vietnamese language.

24 Reviews

Downloads: 313 This Week

Last Update: 5 days ago
16

chessPDFBrowser

Chess application which allows working with chess PDF books and PGNs.

Chess application which allows working with PDFs and PGNs. You can work with the chess games of the PDF and edit their tree of variants. Graphical environment. Standard PGN tags. PGN comments. OCR-like FEN string detection from chess board position images. Connection to UCI chess engines (like Stockfish). Position analysis, full game analysis. You can now play games against UCI engines. pdf2pgn command line command included. Detailed documentation. Multilanguage...

1 Review

Downloads: 29 This Week

Last Update: 2025-12-26
17

NAPS2 - Not Another PDF Scanner

Scan documents to PDF and other file types, as simply as possible.

Visit NAPS2's home page at www.naps2.com. NAPS2 is a document scanning application with a focus on simplicity and ease of use. Scan your documents from WIA- and TWAIN-compatible scanners, organize the pages as you like, and save them as PDF, TIFF, JPEG, PNG, and other file formats. Available on Windows, Mac, and Linux. NAPS2 is currently available in over 40 different languages. Want to see NAPS2 in your preferred language? Help translate! See the wiki for more details.

149 Reviews

Downloads: 819 This Week

Last Update: 2026-01-10
18

MyBox

Easy Tools for PDF, Image, File, Network, Data, and Media

javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.

Downloads: 9 This Week

Last Update: 2025-12-27
19

gscan2pdf

A GUI to ease the process of producing a multipage PDF from a scan. gscan2pdf should work on almost any Linux/BSD machine.

22 Reviews

Downloads: 264 This Week

Last Update: 2025-11-05
20

Super PDF Editor (a Batch PDF Processor)

Create, Edit, Delete, Organize, Convert, Export, Secure & Sign PDF.

Super PDF Editor - Powerful, superfast, lightweight PDF processor. All-in-one PDF solution, PDF editing with 80+ tools and functions. The easy-to-use software is complete with editing tools for modifying PDF files your way. Most comprehensive, powerful, process-based and lightning-fast batch processor software. OCR PDF. PDF Imposition, Reverse Pages, Resize Page, Scale Page, Booklet, N-up Pages, Merge, Split by page, Extract Page, Rotate Page, Replace Page, Insert Page, Delete Page....

6 Reviews

Downloads: 32 This Week

Last Update: 2 days ago
21

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 4 This Week

Last Update: 2 days ago
22

bitfarm-Archiv Document Management - DMS

bitfarm-Archiv is a powerful Document Management (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with Workflow Components. Help us! Since we live in the internet age, the best way you can help is to write a short statement about your scenario and your use of the DMS, along with your experiences, and post it on your own website or in a blog or forum. It would help us most if you also added a hyperlink to our site http://www.bitfarm-archiv.com. By this...

10 Reviews

Downloads: 8 This Week

Last Update: 2025-11-25
23

AvantFAX

Multiuser HylaFAX PHP/MySQL Web interface for viewing faxes online, downloading & emailing in PDF format, and categorizing & archiving all sent and received faxes.

10 Reviews

Downloads: 4 This Week

Last Update: 2025-04-10
24

OpenKYC - FaceOnLive Community Project

FaceOnLive Open KYC: Streamlining Identity Verification with AI

...By seamlessly integrating these powerful tools, we empower businesses across industries to streamline their KYC processes with unparalleled accuracy and efficiency. At the heart of our initiative lies an open-source UI flow, meticulously designed to provide users with an intuitive and seamless experience throughout the identity verification journey. From effortlessly capturing ID documents to conducting robust selfie liveness checks, our platform offers a user-friendly interface that prioritizes both security and convenience.

149 Reviews

Downloads: 0 This Week

Last Update: 2024-04-02
25

Nougat

Implementation of Nougat Neural Optical Understanding

Nougat is a multi-modal generative modeling framework that bridges vision and text modalities with structured generation control (e.g. layout, scene composition) rather than treating images as flat contexts. It combines object-centric modules with transformer-based reasoning to propose, refine, and render scenes in a generative pipeline. The architecture allows you to specify or prompt a layout (which objects should be where) and then the model fills in appearance, context, lighting, and...

Downloads: 0 This Week

Last Update: 2025-10-06