OCRmyPDF

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files so that they can be searched. PDF is well suited to storing and exchanging scanned documents, but PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Features

Generates a searchable PDF/A file from a regular PDF
Places OCR text accurately below the image to ease copy / paste
Keeps the exact resolution of the original embedded images
When possible, inserts OCR information as a "lossless" operation without disrupting any other content
Optimizes PDF images, often producing files smaller than the input file
If requested, deskews and/or cleans the image before performing OCR
Distributes work across all available CPU cores
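
As a rough illustration of how the features above map onto a command line, here is a minimal sketch in Python. It only assembles an argument list for the `ocrmypdf` tool and runs it if the tool happens to be installed. The helper name `build_ocrmypdf_command` and the file names `scan.pdf` and `searchable.pdf` are hypothetical placeholders, not part of the project; the flags used (`--output-type`, `--deskew`, `--clean`, `--jobs`) are ones the OCRmyPDF command line is understood to accept, but check `ocrmypdf --help` for your installed version.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for the ocrmypdf CLI (hypothetical helper)."""
    cmd = ["ocrmypdf", "--output-type", "pdfa"]  # request PDF/A output
    if deskew:
        cmd.append("--deskew")  # straighten pages before OCR
    if clean:
        cmd.append("--clean")  # clean the page image before OCR (needs unpaper)
    # Fall back to the CPU count reported by the OS when no job count is given.
    worker_count = jobs if jobs is not None else (os.cpu_count() or 1)
    cmd += ["--jobs", str(worker_count)]
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with real paths.
    command = build_ocrmypdf_command("scan.pdf", "searchable.pdf", deskew=True, jobs=4)
    print(" ".join(command))
    # Only invoke the tool when it is installed and the input file exists.
    if shutil.which("ocrmypdf") and os.path.exists("scan.pdf"):
        subprocess.run(command, check=True)
```

Building the argument list separately from running it keeps the sketch usable on machines where OCRmyPDF is not installed, and passing a list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.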

License

Mozilla Public License 2.0 (MPL-2.0)

Additional Project Details

Programming Language

Python

Related Categories

Python PDF Software, Python OCR Software

Registered

2023-11-17
