open pdf free download

Showing 32 open source projects for "open pdf"

1

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 126 This Week

Last Update: 5 days ago
See Project
2

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 15 This Week

Last Update: 6 days ago
See Project
3

Tesseract OCR

Open Source OCR Engine

...Tesseract can recognize over 100 languages out of the box, and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. It also has Unicode (UTF-8) support.

5 Reviews

Downloads: 15,634 This Week

Last Update: 2025-12-26
See Project
4

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. In...

Downloads: 4 This Week

Last Update: 2026-05-27
See Project
5

Umi-OCR

OCR software, free and offline

...The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 43 This Week

Last Update: 2026-01-15
See Project
6

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats.

Downloads: 13 This Week

Last Update: 2025-07-24
See Project
7

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 10 This Week

Last Update: 2026-02-03
See Project
8

Super PDF Editor (a Batch PDF Processor)

Create, Edit, Delete, Organize, Convert, Export, Secure & Sign PDF.

Super PDF Editor - Powerful, superfast, lightweight PDF processor. All-in-one PDF solution, PDF editing with 80+ tools and functions. The easy-to-use software is complete with editing tools for modifying PDF files your way. Most comprehensive, powerful, process-based and lightning-fast batch processor software. OCR PDF. PDF Imposition, Reverse Pages, Resize Page, Scale Page, Booklet, N-up Pages, Merge, Split by page, Extract Page, Rotate Page. Replace Page, Insert Page, Delete Page....

6 Reviews

Downloads: 10 This Week

Last Update: 2026-03-08
See Project
9

NAPS2 - Not Another PDF Scanner

Scan documents to PDF and other file types, as simply as possible.

Visit NAPS2's home page at www.naps2.com. NAPS2 is a document scanning application with a focus on simplicity and ease of use. Scan your documents from WIA- and TWAIN-compatible scanners, organize the pages as you like, and save them as PDF, TIFF, JPEG, PNG, and other file formats. Available on Windows, Mac, and Linux. NAPS2 is currently available in over 40 different languages. Want to see NAPS2 in your preferred language? Help translate! See the wiki for more details.

149 Reviews

Downloads: 671 This Week

Last Update: 2026-01-10
See Project
10

Hathi Download Helper

Download books from the hathitrust website in a fast and easy manner

2025-05-08 PLEASE NOTE: Due to changes to the API of the hathitrust homepage, the HDH is no longer functional! Please check the project Wiki for alternative methods. https://sourceforge.net/p/hathidownloadhelper/alternative/ Hathi Download Helper was a tool for downloading public domain books from hathitrust.org. E-Mail contact:...

8 Reviews

Downloads: 54 This Week

Last Update: 2026-03-13
See Project
11

gscan2pdf

A GUI to ease the process of producing a multipage PDF from a scan. gscan2pdf should work on almost any Linux/BSD machine.

22 Reviews

Downloads: 106 This Week

Last Update: 2025-11-05
See Project
12

chessPDFBrowser

Chess application which allows working with chess PDF books and PGNs.

Chess application that allows working with PDFs and PGNs. You can work with the chess games in a PDF and edit their variation trees. Graphical environment. Standard PGN tags. PGN comments. OCR-like detection of FEN strings from chess board position images. Connection to UCI chess engines (like Stockfish). Position analysis and full game analysis. You can now play games against UCI engines. pdf2pgn command-line tool included. Detailed documentation. Multilanguage...

1 Review

Downloads: 23 This Week

Last Update: 2026-04-04
See Project
13

AvantFAX

Multiuser HylaFAX PHP/MySQL Web interface for viewing faxes online, downloading & emailing in PDF format, and categorizing & archiving all sent and received faxes.

10 Reviews

Downloads: 20 This Week

Last Update: 2025-04-10
See Project
14

VietOCR

Provides optical character recognition (OCR) solutions for the Vietnamese language.

24 Reviews

Downloads: 159 This Week

Last Update: 2026-05-27
See Project
15

MyBox

Easy tools for PDF, image, file, network, data, and media

Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox. Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.

Downloads: 1 This Week

Last Update: 2026-02-10
See Project
16

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout AI-driven data processing tool written in C++20, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 6 This Week

Last Update: 2026-06-06
See Project
17

OCR Manga Reader for Android

Android Manga reader with Japanese OCR and dictionary capabilities

OCR Manga Reader is a free and open source Android app that allows you to quickly OCR and look up Japanese words in real time. It does not have ads or telemetry/spyware and does not require an Internet connection. Supports both EDICT and EPWING dictionaries. Requires Android 4.0 (Ice Cream Sandwich) or higher. See http://ocrmangareaderforandroid.sourceforge.net/ for details.

3 Reviews

Downloads: 33 This Week

Last Update: 2023-10-07
See Project
18

Common Resource Grep - crgrep

Common Resource Grep

CRGREP searches for matching text in databases, various document formats, archives and other difficult-to-access resources. A command-line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image metadata, scanned documents, Maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, such as text within a document within a zip archive, and so on. Here you...

3 Reviews

Downloads: 6 This Week

Last Update: 2023-04-23
See Project
19

gImageReader

A graphical frontend to tesseract-ocr

gImageReader is a simple Gtk/Qt front-end to tesseract. Features include:
- Import PDF documents and images from disk, scanning devices, clipboard and screenshots
- Process multiple images and documents in one go
- Manual or automatic recognition area definition
- Recognize to plain text or to hOCR documents
- Recognized text displayed directly next to the image
- Post-process the recognized text, including spellchecking
- Generate PDF documents from hOCR documents
**Note**:...

27 Reviews

Downloads: 122 This Week

Last Update: 2022-01-28
See Project
20

Paperless-ng

A supercharged version of paperless, scan, index and archive docs

Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...

Downloads: 0 This Week

Last Update: 2022-03-04
See Project
21

Linux-Intelligent-Ocr-Solution

Easy-OCR solution and Tesseract trainer for GNU/Linux

Linux-Intelligent-Ocr-Solution (Lios) is free and open source software for converting print into text using either a scanner or a camera. It can also produce text from scanned images from other sources, such as a PDF, an image, a folder containing images, or a screenshot. The program is fully accessible to visually impaired users. A Tesseract Trainer GUI is also shipped with this package. Forum : https://groups.google.com/forum/#!

5 Reviews

Downloads: 3 This Week

Last Update: 2020-10-19
See Project
22

neocr

Provides OCR solutions for Nepali, based on Tesseract 4.0.

NeOCR is free software based on Tesseract (Open Source OCR Engine) for the Windows operating system. It provides an easy, user-friendly interface to recognize text contained in images as well as PDF documents and convert it to editable text formats (.txt, .doc, .docx). This product is accessible to blind and visually impaired people (tested with NVDA and Narrator).

3 Reviews

Downloads: 4 This Week

Last Update: 2020-04-17
See Project
23

pdfsandwich

pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain only images (but no editable text) will be processed by optical character recognition (OCR) and the text will be added to each page invisibly "behind" the images. pdfsandwich is a command line tool which is supposed to be useful to OCR scanned books or journals. It is able to recognize the page layout even for multicolumn text. Essentially, pdfsandwich is a wrapper script which calls the following binaries:...

8 Reviews

Downloads: 339 This Week

Last Update: 2018-08-12
See Project
24

FormRead

Free OMR - OCR web software based on JavaScript and PHP

https://formread.org FormRead is a completely free OMR (optical mark recognition) web application for scanning and grading user-filled, multiple choice forms. Create your forms with any of your office or drawing tools, scan them and parameterize their coordinates in an easy way. Once you have parameterized your form, you can print many copies, give them to your students/respondents, scan and recognize them with FormRead, and finally export the data in your preferred formats...

Downloads: 11 This Week

Last Update: 2022-03-04
See Project
25

OCR Web based

Web-based OCR for the Firefox browser & PC

Optical Character Recognition in JS for Browser is based on ocrad.js. OCR for Browser is a free extension that you can use to extract text from any image you supply. Just upload your image files. OCR for Browser accepts JPG, GIF, TIFF, BMP, or PNG files. Get OCR for Android (Beta release) - https://play.google.com/store/apps/details?id=com.ulm.ocr Add-on for Opera: http://bit.ly/1F0E0wP Release 1.0.1 For safety reasons, I disabled...

2 Reviews

Downloads: 0 This Week

Last Update: 2018-09-05
See Project