extraction free download

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.

Downloads: 5 This Week

Last Update: 2025-10-13

OCRBase

MD/JSON document OCR and structured data extraction API

OCRBase is a self-hostable document OCR and structured extraction system built to turn PDFs into machine-usable outputs at scale, aiming to bridge the gap between raw text extraction and production-ready pipelines. Instead of treating OCR as a one-off script, it presents an API-driven workflow where documents are submitted as jobs and processed through a queue-based architecture that can handle high throughput.

Downloads: 0 This Week

Last Update: 2026-02-27

OpenDataLoader PDF

PDF Parser for AI-ready data. Automate PDF accessibility

OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes. The tool combines deterministic parsing methods with an optional hybrid AI-powered mode that improves extraction quality for difficult layouts such as multi-column documents, scanned files, and scientific papers. ...

Downloads: 6 This Week

Last Update: 2026-04-03

py-pdf-parser

A Python tool to help extract information from structured PDFs

py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.

Downloads: 8 This Week

Last Update: 2025-04-28

Unredact

A simple tool for reading in poorly redacted documents

Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...

Downloads: 16 This Week

Last Update: 2026-02-03

Nano PDF Editor

Edit PDF files with Nano Banana

Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...

Downloads: 7 This Week

Last Update: 2026-02-05

PDFtk Bookmarks Editor

GUI for updating PDF bookmarks using PDF Toolkit (PDFtk) on Windows

Free and open source GUI application for updating bookmarks in a PDF document using the PDF Toolkit command line tool, PDFtk Server. The user selects the PDF via drag and drop, then edits the bookmark entries in a text file using a simple, one-line data format. The program handles everything else in response to a few button clicks. OS: Windows. Author: David King. License: GPLv3.

1 Review

Downloads: 26 This Week

Last Update: 2025-11-11

iText®, a JAVA PDF library

PDF Library for Developers

iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...

Downloads: 131 This Week

Last Update: 2024-06-01

Search Results for "extraction"

Showing 8 open source projects for "extraction"
