Showing 66 open source projects for "pdf tool"

View related business solutions
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Malicious PDF Generator

    Malicious PDF Generator

    Generate a bunch of malicious pdf files with phone-home functionality

    Generate ten different malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh. Used for penetration testing and/or red-teaming etc. I created this tool because I needed a third-party tool to generate a bunch of PDF files with various links.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 4
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...
    Downloads: 3,158 This Week
    Last Update:
    See Project
  • 6
    MarkPDFDown

    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language model

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Blackbird

    Blackbird

    OSINT tool for finding accounts across 600+ sites by username or email

    ...The tool operates primarily through a command line interface, allowing users to run automated searches and gather results from many platforms in a single process. Blackbird also includes an optional AI-powered profiling feature that analyzes discovered sites to generate behavioral and technical insights about a user’s online presence. Results from searches can be exported in formats such as PDF, CSV, or JSON for documentation or reporting purposes.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 9
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 10
    Toonily Downloader

    Toonily Downloader

    A python tool for downloading manga from Toonily

    ...It uses concurrent downloading techniques to significantly speed up the process and includes robust error handling to recover from interruptions or failed downloads. Additionally, the tool allows users to convert downloaded chapters into high-quality PDF files without re-encoding images, ensuring fidelity to the original content.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 11
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    MetaScreener

    MetaScreener

    AI-powered tool for efficient abstract and PDF screening

    ...The platform can analyze both abstracts and full PDF documents, enabling automated filtering based on research criteria defined by the user. By incorporating natural language processing techniques, the system can identify potentially relevant studies and reduce the workload associated with manual screening.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    PDF-utility

    PDF-utility

    PDF Utility is a tool designed to efficiently manipulate PDF files

    Digna PDF Utility is a tool designed to efficiently manipulate PDF documents. It offers a range of functionalities including adding page numbers, deleting unwanted pages, merging multiple PDFs into a single file, converting PDF to DOCX and vice versa, protect a PDF file with password and displaying PDF content.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    WeebCentral Downloader

    WeebCentral Downloader

    A powerful manga downloader for WeebCentral with both GUI and CLI

    ...The software includes a visually distinctive GUI built with PyQt6, featuring a modern design system and interactive components for managing downloads and viewing manga information. Users can select specific chapters, adjust download speed, and configure output formats such as PDF or CBZ, making it adaptable to different reading preferences. The tool also incorporates progress tracking and background worker threads to ensure a responsive experience during large downloads. Its modular structure separates scraping logic, interface components, and configuration management, making it maintainable and extensible.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 15
    PDFMathTranslate

    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 45 This Week
    Last Update:
    See Project
  • 16
    bilingual_book_maker

    bilingual_book_maker

    Make bilingual epub books Using AI translate

    bilingual_book_maker is an AI-assisted translation tool for creating bilingual and multilingual versions of books and text files. It is designed to process formats such as EPUB, TXT, SRT, and PDF, then generate translated output that helps readers compare the original text with the target language. The project supports multiple AI providers and models, including OpenAI-compatible models and other translation backends through LiteLLM-style integrations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Docling

    Docling

    Get your documents ready for gen AI

    ...The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    ...Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of these contents into Markdown format. P2T can also convert an entire PDF file (which can contain scanned images or any other format) into Markdown format.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 19
    Merge PDF files instantly with this simple, free Windows tool by Franklin Ogot.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx!
    Downloads: 15 This Week
    Last Update:
    See Project
  • 21
    ArchiveBox

    ArchiveBox

    Open source self-hosted web archiving

    ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a centralized service, but saved URLs have to be public, and they can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data....
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Tarjamento de Dados Pessoais e Sigilosos

    Tarjamento de Dados Pessoais e Sigilosos

    Ferramenta de Tarjamento de Dados Pessoais e Sigilosos

    TarjaPDF v2.0 Beta — Ferramenta de Tarjamento de Dados Pessoais e Sigilosos Proteja dados sensíveis em PDFs com segurança irreversível. Interface moderna com dark mode, marcação manual (texto, linha e área livre), detecção automática de CPF, RG, e-mail, telefone, nomes próprios e endereços. Escaneamento inteligente com análise preditiva: destaca dados pessoais para revisão antes de tarjar. Detecção de nomes via heurística e base oficial, com dicionário customizável. Relatório de...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 23
    Google Open Source Project Style Guide

    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    Each larger open source project has its own style guide, a series of conventions on how to write code for the project (sometimes more arbitrary). When all the code maintains a consistent style, it is more important when understanding large code bases. easy. The meaning of "style" covers a wide range, from "variables use camelCase" to "never use global variables" to "never use exceptions". The English version of the project maintains the programming style guidelines used in Google. If the...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Webifier

    Webifier

    A GitHub Action to deploy Notebooks, Markdowns

    Webifier is a stand-alone build tool for converting any repository into a deployable jekyll website. You can define your pages via yaml files and provide notebooks, markdown and pdf and other files for Webifier to render. It uses python markdown providing additional control over attributes and other extensive functionalities. It lets you define and direct how your web pages feel and automatically manages your assets, making it a perfect solution for fast static website development and a straightforward tool for creating Github pages as a Github action. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Auth0 Logo