Showing 18 open source projects for "python pdf scaper"

  • 1
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 13 This Week
  • 2
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 0 This Week
  • 3
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 97 This Week
  • 4
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 1 This Week
  • 5
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 27 This Week
  • 6
    PdfBooklet
    PdfBooklet is a Python Gtk application for making books or booklets from existing PDF files. It can also adjust margins, rotate, scale, merge files, or extract pages.
    Downloads: 187 This Week
  • 7

    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly.
    Downloads: 0 This Week
  • 8

    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application that adds watermarks (text or image) to PDF files and converts them to images and back to PDF, with options for OCR and compression.
    Downloads: 1 This Week
  • 9
    pdf password cracker

    PDF password cracker using a password list
    Downloads: 0 This Week
  • 10
    HornPenguin Booklet

    Booklet, Signature generator, Imposition

    HornPenguin Booklet is simple software that generates booklets and signatures for bookbinding from your PDF files. You can print your own book signatures and simple pamphlets with your home printer. It supports different signature sizes from 4 to 32, changing the page size while generating signatures, and the left riffling direction used in old Asian bookbinding. It also includes imposition routines for rearranged manuscripts.
    Downloads: 29 This Week
  • 11
    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    Edit your PDFs without needing a subscription or creating accounts. Add a GUI/Turn it into a web application. Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you’ll need to do more than simply pass their filenames to open().
    Downloads: 1 This Week
  • 12
    Tile Pattern Exporter

    Tile large format PNG patterns into print-at-home PDF pages

    You can tile large format PNG patterns into print-at-home PDF pages. Created for LearnMYOG. This set of scripts automates the tiling of large format PNG files into letter (A4), tabloid (A3), and A0 sized PDF pages, adding print margins, alignment and cut guides, page numbers, and a copyright stamp to each page. For best results, input an exported PNG with a size in multiples of 7.5 inches wide and 10 inches tall at 300 dpi.
    Downloads: 0 This Week
  • 13

    JosePythonApps

    The Python scripts I have written so far

    Here are my Python scripts. They are humble but easy to use, and maybe you'll find them useful.
    Downloads: 0 This Week
  • 14
    Easy PDF Two Sided

    Print PDFs double-sided on any printer

    With "Easy PDF Two Sided" you can split your PDF file into two parts, so that you print all the even pages in one pass and then print the odd pages in reverse order. This way you can take the stack of sheets and simply flip it over to print the second part, without having to reorder the sheets. REQUIREMENTS: Windows 7 or later; .NET Framework 4.7.2 or later.
    Downloads: 0 This Week
  • 15
    LadybugPDF (4-up tool)

    A free open source tool for generating 4-up slides from a PDF document

    A free open source tool for generating 4-up slides from a PDF document. Written in Python 3.7 with the PyPDF2 package, compiled with PyInstaller. Opens a command-line window while running.
    Downloads: 0 This Week
  • 16
    PDF Merge and Edit

    Python script to merge and edit sensitive PDF files

    Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google. Merge PDFs by adding one to another. Update a single page in a PDF (good for adding a signed page to a form). Insert a page into an existing PDF. Delete a page. Click one of the buttons and a new window will pop up depending on the function.
    Downloads: 1 This Week
  • 17

    pdf_encryptor

    Encrypt PDFs with Python and PyPDF2

    Downloads: 0 This Week
  • 18
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 58 This Week
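The page-number adjustment described in the "Create Index from PDF" entry (the first 24 pages numbered i-xxiv, the body numbered from 1 afterwards) can be illustrated with a short sketch. This is not the project's code. The names `to_roman`, `page_label`, and `build_index` are hypothetical, and the sketch assumes the text of each page has already been extracted into a list of strings by some PDF library.

```python
# Hypothetical sketch: index words by printed page label, assuming the first
# 24 physical pages carry Roman numerals and the body restarts at page 1.
FRONT_MATTER_PAGES = 24  # taken from the listing's description


def to_roman(n: int) -> str:
    """Convert a small positive integer to a lowercase Roman numeral."""
    numerals = [(10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    parts = []
    for value, symbol in numerals:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def page_label(physical_page: int, front_matter: int = FRONT_MATTER_PAGES) -> str:
    """Map a 1-based physical page number to the label printed on the page."""
    if physical_page <= front_matter:
        return to_roman(physical_page)
    return str(physical_page - front_matter)


def build_index(words, page_texts, front_matter: int = FRONT_MATTER_PAGES):
    """Return {word: [page labels]} using case-insensitive substring matching."""
    index = {word: [] for word in words}
    for physical_page, text in enumerate(page_texts, start=1):
        lowered = text.lower()
        for word in words:
            if word.lower() in lowered:
                index[word].append(page_label(physical_page, front_matter))
    return index
```

Calling `build_index(words, page_texts)` with a word list read from a text file and one string per PDF page returns the page labels for each word. Substring matching is a simplification, so a real script might prefer whole-word matching.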