Showing 16 open source projects for "python pdf scaper"

  • 1
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 13 This Week
  • 2
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 0 This Week
  • 3
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 97 This Week
  • 4
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 1 This Week
  • 5
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 27 This Week
  • 6
    PdfBooklet
    PdfBooklet is a Python GTK application for making books or booklets from existing PDF files. It can also adjust margins, rotate, scale, and merge files, or extract pages.
    Downloads: 187 This Week
  • 7

    Create Index from PDF

    PDF Indexing Script: Searches PDF for words, records page numbers

    This Python script helps automate the process of creating an index for a PDF document. It reads a list of words from a text file, searches through each page of the PDF, and records the page numbers where each word appears. The script accounts for the first 24 pages of the PDF that use Roman numerals (i-xxiv) and adjusts the page numbers accordingly.
    Downloads: 0 This Week
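The indexing approach described for "Create Index from PDF" can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the page text has already been extracted into a list of strings (one per physical page), and the names `build_index`, `page_label`, and `to_roman` are hypothetical. Whole-word, case-insensitive matching is also an assumption.

```python
import re
from collections import defaultdict

# From the description: the first 24 pages are numbered i-xxiv.
FRONT_MATTER_PAGES = 24


def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def page_label(physical_page):
    """Map a 1-based physical page number to its printed label."""
    if physical_page <= FRONT_MATTER_PAGES:
        return to_roman(physical_page)
    return str(physical_page - FRONT_MATTER_PAGES)


def build_index(words, page_texts):
    """Return {word: [page labels]} for every word found on a page.

    Assumption: matching is whole-word and case-insensitive.
    """
    index = defaultdict(list)
    for physical_page, text in enumerate(page_texts, start=1):
        for word in words:
            pattern = rf"\b{re.escape(word)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[word].append(page_label(physical_page))
    return dict(index)
```

In practice the `page_texts` list would come from a PDF text-extraction library; that step is left out here so the sketch stays self-contained.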
  • 8

    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application that adds watermarks (text or image) to PDF files and converts them to images and back to PDF, with options for OCR and compression.
    Downloads: 1 This Week
  • 9
    pdf password cracker

    Pdf password cracker using password list
    Downloads: 0 This Week
  • 10
    HornPenguin Booklet

    Booklet, Signature generator, Imposition

    HornPenguin Booklet is simple software that generates booklets and signatures for bookbinding from your PDF files. You can print your own book signatures and simple pamphlets with your home printer. It supports different signature sizes from 4 to 32 and can change the page size while generating signatures. Left riffling direction is supported for old Asian bookbinding. It also includes imposition routines for rearranged manuscripts.
    Downloads: 29 This Week
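For readers unfamiliar with imposition, the page ordering of a single folded signature can be sketched as below. This shows the standard saddle-stitch layout for left-to-right books and is not taken from HornPenguin Booklet's source; the function name `booklet_order` is hypothetical.

```python
def booklet_order(num_pages):
    """Return (left, right) page pairs for one folded signature.

    Pairs alternate front side, back side for each sheet, outermost
    sheet first. Assumes a left-to-right book; a right-to-left layout
    would swap the two numbers in each pair.
    """
    if num_pages <= 0 or num_pages % 4 != 0:
        raise ValueError("page count must be a positive multiple of 4")
    sides = []
    for sheet in range(num_pages // 4):
        low = 2 * sheet + 1           # lowest unplaced page
        high = num_pages - 2 * sheet  # highest unplaced page
        sides.append((high, low))           # front of the sheet
        sides.append((low + 1, high - 1))   # back of the sheet
    return sides


# An 8-page signature uses two sheets.
print(booklet_order(8))
```

A document longer than one signature would be split into consecutive chunks of the chosen signature size, with each chunk imposed on its own.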
  • 11
    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    Edit your PDFs without needing a subscription or creating accounts. Add a GUI/Turn it into a web application. Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you’ll need to do more than simply pass their filenames to open().
    Downloads: 1 This Week
  • 12
    Tile Pattern Exporter

    Tile large format PNG patterns into print-at-home PDF pages

    You can tile large format PNG patterns into print-at-home PDF pages. Created for LearnMYOG. This set of scripts automates the tiling of large format PNG files into letter (A4), tabloid (A3), and A0 sized PDF pages, adding print margins, alignment and cut guides, page numbers, and a copyright stamp to each page. For best results, input an exported PNG with size in multiples of 7.5 inches wide and 10 inches tall @ 300dpi.
    Downloads: 0 This Week
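The recommended input size follows from the tile dimensions: 7.5 x 10 inches at 300 dpi is 2250 x 3000 pixels per tile. A minimal sketch of how crop boxes for such tiles might be computed is shown here; the function `tile_boxes` is hypothetical and does not come from the project, and margins, guides, and stamps are left out.

```python
import math

DPI = 300
TILE_W = int(7.5 * DPI)  # 2250 px printable width per page
TILE_H = 10 * DPI        # 3000 px printable height per page


def tile_boxes(width_px, height_px, tile_w=TILE_W, tile_h=TILE_H):
    """Return crop boxes (left, top, right, bottom) in row-major order.

    Tiles on the right and bottom edges are clipped to the image size,
    which is why inputs in exact multiples of the tile size work best.
    """
    cols = math.ceil(width_px / tile_w)
    rows = math.ceil(height_px / tile_h)
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * tile_w
            top = row * tile_h
            right = min(left + tile_w, width_px)
            bottom = min(top + tile_h, height_px)
            boxes.append((left, top, right, bottom))
    return boxes


# A 15 x 10 inch pattern at 300 dpi splits into two full tiles.
print(tile_boxes(4500, 3000))
```

Each box uses the (left, top, right, bottom) convention common to image libraries, so it could be passed to a crop routine before the pages are written to PDF.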
  • 13

    JosePythonApps

    My Python scripts written so far

    Here are my Python scripts. They are humble but easy to use, and maybe you'll find them useful.
    Downloads: 0 This Week
  • 14
    Easy PDF Two Sided

    Print PDFs double-sided on any printer

    With "Easy PDF Two Sided" you can split your PDF file into two parts: first print all the even pages, then print the odd pages in reverse order. This way you can take the stack of sheets and simply flip it over to print the second part, without having to reorder the sheets. REQUIREMENTS: - Windows 7 or later. - .NET Framework 4.7.2 or later.
    Downloads: 0 This Week
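The even/odd split that this tool describes can be illustrated with page numbers alone. This is only a sketch of the idea in Python (the tool itself targets Windows and .NET), and the function name `two_sided_passes` is hypothetical.

```python
def two_sided_passes(page_count):
    """Return the page numbers for the two print passes.

    First pass: all even pages in ascending order.
    Second pass: all odd pages in descending (reversed) order.
    """
    evens = list(range(2, page_count + 1, 2))
    odds_reversed = list(range(1, page_count + 1, 2))[::-1]
    return evens, odds_reversed


first_pass, second_pass = two_sided_passes(6)
print(first_pass)   # even pages for the first run
print(second_pass)  # odd pages, reversed, for the second run
```

How the stack must be flipped between passes, and how a document with an odd page count is handled, depends on the printer and is not covered by this sketch.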
  • 15
    PDF Merge and Edit

    Python script to merge and edit sensitive PDF files

    Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google. Merge PDFs by adding one to another. Update a single page in a PDF (good for adding a signed page to a form). Insert a page into an existing PDF. Delete a page. Click one of the buttons and a new window will pop up depending on the function.
    Downloads: 1 This Week
  • 16
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 58 This Week