Search Results for "java ocr extraction text"


OS

ChromeOS 18
BSD 18
Linux 18
Mac 18
Windows 18

Category

Artificial Intelligence 18
Scientific/Engineering 7
Education 2
Multimedia 2
Text Editors 2
Business 1
Communications 1
Games 1

License

OSI-Approved Open Source 16
Creative Commons Attribution License 1
Other License 1

Programming Language

Java 13
Python 2
C++ 1
C# 1
JavaScript 1
Perl 1
Rust 1

Status

Beta 6
Alpha 3
Production/Stable 2
Mature 1
Inactive 1

18 projects for "java ocr extraction text" with 2 filters applied: Artificial Intelligence and ChromeOS

1

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation, aimed at robust, accurate document understanding across complex real-world layouts and modalities. It is designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents with mixed content. GLM-OCR excels across major benchmarks while remaining efficient thanks to a relatively compact parameter count (~0.9B), which enables deployment in high-concurrency services and edge environments. ...

Downloads: 1 This Week

Last Update: 2026-04-08
2

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 1 This Week

Last Update: 2026-05-06
3

Extractous

Fast and efficient unstructured data extraction

Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. ...

Downloads: 0 This Week

Last Update: 2026-03-06
4

chessPDFBrowser

Chess application which allows working with chess PDF books and PGNs.

Chess application which allows working with PDFs and PGNs. You can work with the chess games in a PDF and edit their variation trees. Graphical environment. Standard PGN tags. PGN comments. OCR-like FEN string detection from chess board position images. Connection to UCI chess engines (such as Stockfish). Position analysis and full game analysis. You can now play games against UCI engines. A pdf2pgn command-line tool is included. Detailed documentation. Multilanguage...

1 Review

Downloads: 32 This Week

Last Update: 2026-04-04
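Part of the OCR-like FEN detection described above is turning a recognized board into a FEN string. The following is a minimal, hypothetical Java sketch, not chessPDFBrowser's code or API, that serializes an 8x8 board into the piece-placement field (the first of FEN's six fields). It assumes `board[0]` holds rank 8 from the a-file to the h-file, uppercase letters are White pieces, lowercase letters are Black pieces, and `'.'` marks an empty square; the class name `FenPlacement` is illustrative.

```java
/**
 * Hypothetical helper (not part of chessPDFBrowser): builds the piece-placement
 * field of a FEN string from an 8x8 board, e.g. one produced by an image
 * recognizer. Assumes board[0] is rank 8 and board[0][0] is a8; uppercase
 * letters are White pieces, lowercase are Black, and '.' is an empty square.
 */
public final class FenPlacement {

    private FenPlacement() {}

    public static String placement(char[][] board) {
        if (board.length != 8) {
            throw new IllegalArgumentException("board must have 8 ranks");
        }
        StringBuilder fen = new StringBuilder();
        for (int rank = 0; rank < 8; rank++) {
            if (board[rank].length != 8) {
                throw new IllegalArgumentException("each rank must have 8 files");
            }
            int empty = 0; // length of the current run of empty squares
            for (int file = 0; file < 8; file++) {
                char square = board[rank][file];
                if (square == '.') {
                    empty++;
                } else {
                    if (empty > 0) {
                        fen.append(empty); // a run of empty squares is written as a digit
                        empty = 0;
                    }
                    fen.append(square);
                }
            }
            if (empty > 0) {
                fen.append(empty);
            }
            if (rank < 7) {
                fen.append('/'); // ranks are separated by slashes, rank 8 first
            }
        }
        return fen.toString();
    }
}
```

For the standard starting position this yields `rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR`. The remaining FEN fields (side to move, castling rights, en passant square, move counters) cannot be read from a single board image and would have to come from elsewhere.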
5

VietOCR

Provides optical character recognition (OCR) solutions for the Vietnamese language.

24 Reviews

Downloads: 193 This Week

Last Update: 2026-01-17
6

OpenKM Document Management - DMS

Document Management System and Content Management System

OpenKM Community Edition is a free Document Management System (DMS) that helps businesses control the production, storage, management and distribution of electronic documents, boosting effectiveness and productivity. It integrates document management, collaboration and advanced search into one easy-to-use solution, including administration tools for user roles, access control, security levels, activity logs and automation setup. With OpenKM Community Edition you can: Collect information...

32 Reviews

Downloads: 333 This Week

Last Update: 2026-05-14
7

Convolutional Recurrent Neural Network

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition

Convolutional Recurrent Neural Network provides an implementation of the Convolutional Recurrent Neural Network (CRNN) architecture, a deep learning model designed for image-based sequence recognition tasks such as optical character recognition and scene text recognition. The architecture combines convolutional neural networks for extracting visual features from images with recurrent neural networks that model sequential dependencies in the extracted features. This hybrid approach allows the...

Downloads: 0 This Week

Last Update: 2026-03-12
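The hand-off from the convolutional layers to the recurrent layers described above is what the original CRNN paper calls the map-to-sequence step: the convolutional feature map is cut into columns, and each column, read left to right, becomes one time step for the RNN. The following is a minimal, hypothetical Java sketch of only that reshaping, not this project's code; it assumes a channel-first feature map stored as `float[channels][height][width]`, and the class name `MapToSequence` is illustrative.

```java
/**
 * Illustrative sketch (not this project's code): the "map-to-sequence" step
 * that turns a CNN feature map into an input sequence for a recurrent layer.
 * Assumes a channel-first feature map laid out as [channels][height][width].
 */
public final class MapToSequence {

    private MapToSequence() {}

    /**
     * Returns a [width][channels * height] array: one feature vector per
     * column of the feature map, ordered left to right, which is the order
     * in which the recurrent layer consumes them.
     */
    public static float[][] toSequence(float[][][] featureMap) {
        int channels = featureMap.length;
        int height = featureMap[0].length;
        int width = featureMap[0][0].length;
        float[][] sequence = new float[width][channels * height];
        for (int x = 0; x < width; x++) {
            int k = 0; // position inside the flattened column vector
            for (int c = 0; c < channels; c++) {
                for (int y = 0; y < height; y++) {
                    sequence[x][k++] = featureMap[c][y][x];
                }
            }
        }
        return sequence;
    }
}
```

For example, a hypothetical 512-channel feature map of height 1 and width 26 becomes 26 time steps, each a 512-element vector. Because each column covers a vertical strip of the input image, the RNN sees the text line as a left-to-right sequence.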
8

FALCON - Text Search Java Project

JSON-based text search Java project

Falcon Search is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools and document readers. It also handles jumbled word order within a query as well as spelling mistakes. Commonly used techniques in this project are Natural Language...

Downloads: 0 This Week

Last Update: 2014-04-18
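The description says the tool tolerates jumbled word order and spelling mistakes but does not say how. The following is a minimal, hypothetical Java sketch of one common way to get both properties, not Falcon's actual algorithm or API: the query and the document are compared as unordered collections of tokens, and a query token counts as found if some document token is within a small Levenshtein edit distance. The names `FuzzyQueryMatch`, `score`, and `maxEdits` are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration (not Falcon's API): scores how well a document
 * matches a query while ignoring word order and tolerating small typos.
 */
public final class FuzzyQueryMatch {

    private FuzzyQueryMatch() {}

    /** Levenshtein edit distance, computed with two rolling DP rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] swap = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Lowercases the text and splits it on runs of non-word characters. */
    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Fraction of query tokens found anywhere in the document, where a token
     * counts as found if some document token is within maxEdits edits.
     * Word order is ignored because each query token is checked independently.
     */
    public static double score(String query, String document, int maxEdits) {
        List<String> queryTokens = tokens(query);
        List<String> docTokens = tokens(document);
        if (queryTokens.isEmpty()) {
            return 0.0;
        }
        int found = 0;
        for (String q : queryTokens) {
            for (String d : docTokens) {
                if (levenshtein(q, d) <= maxEdits) {
                    found++;
                    break;
                }
            }
        }
        return (double) found / queryTokens.size();
    }
}
```

With `maxEdits = 1`, the query "text ocr extraction" fully matches "OCR and text extraktion for PDF" despite the reordering and the misspelling; with `maxEdits = 0` only two of the three tokens match. A real search tool would also weight terms and index documents instead of comparing every token pair.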
9

TML - Text Mining Library for LSA & CMM

TML is a Java library for latent semantic analysis (LSA) and for extracting concept maps from text

TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml

3 Reviews

Downloads: 0 This Week

Last Update: 2013-08-05
10

RapidMiner Information Extraction Plugin

The Information Extraction Plugin allows the use of information extraction (IE) techniques within RapidMiner. It can be seen as an interface between natural language and IE or data mining methods, extracting interesting information from documents.

Downloads: 0 This Week

Last Update: 2015-08-07
11

text-analysis

This project aims to implement the following text mining techniques in Java: text language detection, keyword and keyphrase extraction, text classification, text clustering, single- or multi-document summarization, and plagiarism detection.

Downloads: 0 This Week

Last Update: 2014-05-20
12

TextMarker

TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. Its full-featured rule-language editor and its build process for UIMA descriptors are complemented by components for visualization, explanation, testing, and rule learning.

1 Review

Downloads: 3 This Week

Last Update: 2013-04-29
13

SEMANTIXS

SEMANTIXS is a semantic information extraction system that can extract, represent, and visualize domain-specific information from free text in the form of complex (and simple) relationships. See http://www.cs.iastate.edu/~semantix/ for more information.

Downloads: 0 This Week

Last Update: 2013-05-02
14

Trainable Relation Extraction framework

T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.

Downloads: 0 This Week

Last Update: 2013-05-02
15

MutationFinder

MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. As an information extraction system, it achieves high performance: 99% precision and 81% recall on blind test data.

Downloads: 0 This Week

Last Update: 2013-03-22
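When comparing MutationFinder with systems that report a single F-measure, the two quoted figures can be combined into the balanced F1 score, the harmonic mean of precision P and recall R. This value is derived here from the quoted numbers (P = 0.99, R = 0.81), not one stated in the listing:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.99 \times 0.81}{0.99 + 0.81} = \frac{1.6038}{1.80} = 0.891$$

Because the harmonic mean is pulled toward the smaller value, the 81% recall dominates the result more than the 99% precision does.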
16

FacialDAS

This project aims to distribute a facial animation system with speech, developed for Brazilian Portuguese. The system is composed of several modules: movement extraction, facial animation, and speech through a text-to-speech system.

Downloads: 0 This Week

Last Update: 2015-09-22
17

translategemma-4b-it

Lightweight multimodal translation model for 55 languages

translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and private cloud infrastructure, making advanced translation accessible without heavy hardware requirements. ...

Downloads: 0 This Week

Last Update: 2026-01-16
18

GraphSpider/MPL

GraphSpider is a pattern matcher which searches parsed text in phrase-structure tree or dependency graph format for syntactic structures matching a set of patterns in MPL, a regexp-like pattern language. Applications: information extraction, text mining.

Downloads: 0 This Week

Last Update: 2013-04-19

Related Searches

openkm

dms

pgn-chessbook

document management

openkm-6.3.12-community.zip

offline document management

ocr

chess engines

chess

jtessboxeditor-2.7.0.zip

Related Categories

Artificial Intelligence

Scientific/Engineering

Education

Multimedia

Text Editors

© 2026 Slashdot Media. All Rights Reserved.
