Showing 325 open source projects for "pdf python"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    PDF Arranger

    PDF Arranger

    Small python-gtk application, to merge or split PDFs

    PDF Arranger is a small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or Sourceforge). It’s a humble attempt to make the project a bit more active.
    Downloads: 345 This Week
    Last Update:
    See Project
  • 2
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Malicious PDF Generator

    Malicious PDF Generator

    Generate a bunch of malicious pdf files with phone-home functionality

    Generate ten different malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh. Used for penetration testing and/or red-teaming etc. I created this tool because I needed a third-party tool to generate a bunch of PDF files with various links.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Grafana: The open and composable observability platform Icon
    Grafana: The open and composable observability platform

    Faster answers, predictable costs, and no lock-in built by the team helping to make observability accessible to anyone.

    Grafana is the open source analytics & monitoring solution for every database.
    Learn More
  • 5
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 76 This Week
    Last Update:
    See Project
  • 6
    pikepdf

    pikepdf

    A Python library for reading and writing PDF, powered by QPDF

    pikepdf is a Python library allowing the creation, manipulation, and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF. Python + QPDF = “py” + “qpdf” = “pyqpdf”, which looks like a dyslexia test and is no fun to type. But say “pyqpdf” out loud, and it sounds like “pikepdf”. pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 7
    fpdf2

    fpdf2

    Simple PDF generation for Python

    fpdf2 is a library for simple & fast PDF document generation in Python. It is a fork and the successor of PyFPDF. Compared with other PDF libraries, fpdf2 is fast, versatile, easy to learn and to extend (example). It is also entirely written in Python and has very few dependencies: Pillow, defusedxml, & fontTools. It is a fork and the successor of PyFPDF.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 8
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    PyPDF

    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Dun and Bradstreet Risk Analytics - Supplier Intelligence Icon
    Dun and Bradstreet Risk Analytics - Supplier Intelligence

    Use an AI-powered solution for supply and compliance teams who want to mitigate costly supplier risks intelligently.

    Risk, procurement, and compliance teams across the globe are under pressure to deal with geopolitical and business risks. Third-party risk exposure is impacted by rapidly scaling complexity in domestic and cross-border businesses, along with complicated and diverse regulations. It is extremely important for companies to proactively manage their third-party relationships. An AI-powered solution to mitigate and monitor counterparty risks on a continuous basis, this cutting-edge platform is powered by D&B’s Data Cloud with 520M+ Global Business Records and 2B+ yearly updates for third-party risk insights. With high-risk procurement alerts and multibillion match points, D&B Risk Analytics leverages best-in-class risk data to help drive informed decisions. Perform quick and comprehensive screening, using intelligent workflows. Receive ongoing alerts of key business indicators and disruptions.
    Learn More
  • 10
    PDFSticher

    PDFSticher

    Code repository for PDFStitcher, a utility to stitch together PDFs

    The open source PDF stitching software for sewists, by sewists. PDFSticher is a utility for stitching together many PDF pages from one document into a single page. This is also called "N-Up" or page imposition. This program was created in order to convert sewing patterns into a convenient format for projecting, though it could be used to stitch together any PDF. Since version 0.4, it is also possible to select layers for inclusion/exclusion in the final output. Additionally, line properties...
    Downloads: 30 This Week
    Last Update:
    See Project
  • 11
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    PyMuPDF

    PyMuPDF

    Python bindings for MuPDF's rendering library.

    MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB,...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    WeasyPrint

    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some webdesign skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 14
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 15
    TikZ

    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 16
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 17
    PDFMathTranslate

    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 18
    PDF-utility

    PDF-utility

    PDF Utility is a tool designed to efficiently manipulate PDF files

    Digna PDF Utility is a tool designed to efficiently manipulate PDF documents. It offers a range of functionalities including adding page numbers, deleting unwanted pages, merging multiple PDFs into a single file, converting PDF to DOCX and vice versa, protect a PDF file with password and displaying PDF content.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically generated by reading functional.py, you can add custom functions at will, and liberate the pasteboard. Support for markdown tables output by GPT. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 20
    Sphinx

    Sphinx

    Main repository for the Sphinx documentation builder

    Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license. It was originally created for the Python documentation, and it has excellent facilities for the documentation of software projects in a range of languages. Of course, this site is also created from reStructuredText sources using Sphinx! HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages, plain text. Semantic markup and automatic links for functions, classes, citations, glossary terms and similar pieces of information. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 21
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    ...A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of these contents into Markdown format. P2T can also convert an entire PDF file (which can contain scanned images or any other format) into Markdown format.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 22
    rmarkdown

    rmarkdown

    Dynamic Documents for R

    R Markdown is an R package for creating dynamic, reproducible documents that combine code (R, Python, SQL, etc.), results (figures, tables), and narrative text. Built on Knitr and Pandoc, it supports generating HTML, PDF, Word, slideshows, dashboards, and more. It’s widely used in data science and reproducible reporting workflows.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    Pysheeet

    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    word_cloud

    word_cloud

    A little word cloud generator in Python

    ...Before installing a compiler, report an issue describing the version of python and operating system being used. The wordcloud_cli tool can be used to generate word clouds directly from the command-line. If you're dealing with PDF files, then pdftotext, included by default with many Linux distribution, comes in handy. Use wordcloud_cli --help so see all available options. The wordcloud library is MIT licenced, but contains DroidSansMono.ttf, a true type font by Google, that is apache licensed.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Mastering Bitcoin

    Mastering Bitcoin

    Mastering Bitcoin 3rd Edition - Programming the Open Blockchain

    The bitcoinbook repository contains the source code for Mastering Bitcoin, the authoritative open-source book by Andreas M. Antonopoulos on Bitcoin and cryptocurrency technologies. Written in a collaborative and continuously updated format using Markdown and AsciiDoc, the book serves as a comprehensive technical guide for developers, engineers, and system architects who want to understand how Bitcoin works. It covers the protocol, cryptography, peer-to-peer architecture, wallets, mining, and...
    Downloads: 11 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next