Showing 331 open source projects for "pdf python"

View related business solutions
  • Find Hidden Risks in Windows Task Scheduler Icon
    Find Hidden Risks in Windows Task Scheduler

    Free diagnostic script reveals configuration issues, error patterns, and security risks. Instant HTML report.

    Windows Task Scheduler might be hiding critical failures. Download the free JAMS diagnostic tool to uncover problems before they impact production—get a color-coded risk report with clear remediation steps in minutes.
    Download Free Tool
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    PDF Arranger

    PDF Arranger

    Small python-gtk application, to merge or split PDFs

    PDF Arranger is a small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or Sourceforge). It’s a humble attempt to make the project a bit more active.
    Downloads: 450 This Week
    Last Update:
    See Project
  • 2
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Malicious PDF Generator

    Malicious PDF Generator

    Generate a bunch of malicious pdf files with phone-home functionality

    Generate ten different malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh. Used for penetration testing and/or red-teaming etc. I created this tool because I needed a third-party tool to generate a bunch of PDF files with various links.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 116 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    pikepdf

    pikepdf

    A Python library for reading and writing PDF, powered by QPDF

    pikepdf is a Python library allowing the creation, manipulation, and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF. Python + QPDF = “py” + “qpdf” = “pyqpdf”, which looks like a dyslexia test and is no fun to type. But say “pyqpdf” out loud, and it sounds like “pikepdf”. pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    PyPDF

    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    fpdf2

    fpdf2

    Simple PDF generation for Python

    fpdf2 is a library for simple & fast PDF document generation in Python. It is a fork and the successor of PyFPDF. Compared with other PDF libraries, fpdf2 is fast, versatile, easy to learn and to extend (example). It is also entirely written in Python and has very few dependencies: Pillow, defusedxml, & fontTools. It is a fork and the successor of PyFPDF.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Desktop and Mobile Device Management Software Icon
    Desktop and Mobile Device Management Software

    It's a modern take on desktop management that can be scaled as per organizational needs.

    Desktop Central is a unified endpoint management (UEM) solution that helps in managing servers, laptops, desktops, smartphones, and tablets from a central location.
    Learn More
  • 10
    PDFSticher

    PDFSticher

    Code repository for PDFStitcher, a utility to stitch together PDFs

    The open source PDF stitching software for sewists, by sewists. PDFSticher is a utility for stitching together many PDF pages from one document into a single page. This is also called "N-Up" or page imposition. This program was created in order to convert sewing patterns into a convenient format for projecting, though it could be used to stitch together any PDF. Since version 0.4, it is also possible to select layers for inclusion/exclusion in the final output. Additionally, line properties...
    Downloads: 38 This Week
    Last Update:
    See Project
  • 11
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    WeasyPrint

    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some webdesign skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as...
    Downloads: 27 This Week
    Last Update:
    See Project
  • 13
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 40 This Week
    Last Update:
    See Project
  • 14
    PyMuPDF

    PyMuPDF

    Python bindings for MuPDF's rendering library.

    MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    TikZ

    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 16
    Umi-OCR

    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 17
    PDFMathTranslate

    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 18
    zpdf

    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    word_cloud

    word_cloud

    A little word cloud generator in Python

    ...Before installing a compiler, report an issue describing the version of python and operating system being used. The wordcloud_cli tool can be used to generate word clouds directly from the command-line. If you're dealing with PDF files, then pdftotext, included by default with many Linux distribution, comes in handy. Use wordcloud_cli --help so see all available options. The wordcloud library is MIT licenced, but contains DroidSansMono.ttf, a true type font by Google, that is apache licensed.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 20
    rmarkdown

    rmarkdown

    Dynamic Documents for R

    R Markdown is an R package for creating dynamic, reproducible documents that combine code (R, Python, SQL, etc.), results (figures, tables), and narrative text. Built on Knitr and Pandoc, it supports generating HTML, PDF, Word, slideshows, dashboards, and more. It’s widely used in data science and reproducible reporting workflows.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 21
    PDF-utility

    PDF-utility

    PDF Utility is a tool designed to efficiently manipulate PDF files

    Digna PDF Utility is a tool designed to efficiently manipulate PDF documents. It offers a range of functionalities including adding page numbers, deleting unwanted pages, merging multiple PDFs into a single file, converting PDF to DOCX and vice versa, protect a PDF file with password and displaying PDF content.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 23
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 24
    RenderCV

    RenderCV

    LaTeX CV generator from a YAML/JSON input file

    RenderCV is a LaTeX CV/resume framework. It allows you to create a high-quality CV as a PDF from a YAML file with full Markdown syntax support and complete control over the LaTeX code. RenderCV offers built-in LaTeX and Markdown templates ready to produce high-quality CVs. However, the templates are entirely arbitrary and can easily be updated to leverage RenderCV's capabilities with your custom CV themes.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next