Showing 61 open source projects for "python pdf scaper"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Go from Data Warehouse to Data and AI platform with BigQuery Icon
    Go from Data Warehouse to Data and AI platform with BigQuery

    Build, train, and run ML models with simple SQL. Automate data prep, analysis, and predictions with built-in AI assistance from Gemini.

    BigQuery is more than a data warehouse—it's an autonomous data-to-AI platform. Use familiar SQL to train ML models, run time-series forecasts, and generate AI-powered insights with native Gemini integration. Built-in agents handle data engineering and data science workflows automatically. Get $300 in free credit, query 1 TB, and store 10 GB free monthly.
    Try BigQuery Free
  • 1
    PDF Arranger

    PDF Arranger

    Small python-gtk application, to merge or split PDFs

    PDF Arranger is a small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or Sourceforge). It’s a humble attempt to make the project a bit more active.
    Downloads: 446 This Week
    Last Update:
    See Project
  • 2
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and...
    Downloads: 14 This Week
    Last Update:
    See Project
  • $300 in Free Credit for Your Google Cloud Projects Icon
    $300 in Free Credit for Your Google Cloud Projects

    Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

    Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 5
    PdfBooklet
    PdfBooklet is a Python Gtk application which allows to make books or booklets from existing pdf files. It can also adjust margins, rotate, scale, merge files or extract pages.
    Leader badge
    Downloads: 199 This Week
    Last Update:
    See Project
  • 6
    kb

    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    pdf2wordx

    pdf2wordx

    Convertir "pdf" a documentos ".docx"

    `pip install pdf2wordx` Este proyecto usa "Tkinter" V8.6 y usa "pdf2docx" V0.5.8 para realizar las conversiones de PDF a DOCX. El programa es fácil de usar, solo se dene seleccionar el archivo PDF, Bucar la carpeta donde se guardará el documento DOCX, finalmente de click en el botón "Convertir", el documento se convertirá y guardará en la ruta especificada.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    pdf combiner merger converter splitter

    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a user-friendly open source free to use, GUI-based tool for combining, pdf to excel, pdf to word, image to pdf, zip, unzip annotate and splitting PDF files. It is easy to use, supports multiple file insert and delete and process, and allows you to adjust the order of files before combining.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 99.99% Uptime for MySQL and PostgreSQL on Google Cloud Icon
    99.99% Uptime for MySQL and PostgreSQL on Google Cloud

    Enterprise Plus edition delivers sub-second maintenance downtime and 2x read/write performance. Built for critical apps.

    Cloud SQL Enterprise Plus gives you a 99.99% availability SLA with near-zero downtime maintenance—typically under 10 seconds. Get 2x better read/write performance, intelligent data caching, and 35 days of point-in-time recovery. Supports MySQL, PostgreSQL, and SQL Server with built-in vector search for gen AI apps. New customers get $300 in free credit.
    Try Cloud SQL Free
  • 10
    dxf2gcode

    dxf2gcode

    DXF2GCODE: converting 2D dxf drawings to CNC machine compatible G-Code

    DXF2GCODE is a tool for converting 2D (dxf, pdf, ps) drawings to CNC machine compatible GCode. Windows, Linux, and Mac support by using python scripting language.
    Leader badge
    Downloads: 359 This Week
    Last Update:
    See Project
  • 11
    bridgex

    bridgex

    Convert files like docx, xlsx, pptx, html, and more to MarkDown

    Bridgex is an open‑source graphical interface for converting files to Markdown, built in Python and based on Pyside6 (Qt for Python). Its objective is to simplify access to the Markitdown library through a straightforward, modular visual experience. Features ✨ - Cross‑platform graphical interface. - Efficient file‑to‑Markdown conversion. - Modularity: easy to adapt and extend. - Support for multiple input formats. - Lightweight editing prior to saving.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Scribus

    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Leader badge
    Downloads: 14,492 This Week
    Last Update:
    See Project
  • 13
    Apache OpenOffice

    Apache OpenOffice

    The free and Open Source productivity suite

    Free alternative for Office productivity tools: Apache OpenOffice - formerly known as OpenOffice.org - is an open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications. OpenOffice is available in many languages, works on all common computers, stores data in ODF - the international open standard format - and is able to read and write files in other formats, included the format used by the...
    Leader badge
    Downloads: 288,280 This Week
    Last Update:
    See Project
  • 14
    pdf-editor

    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    Edit your PDFs without needing a subscription or creating accounts. Add a GUI/Turn it into a web application. Add a parser for the command line to do multiple commands at once e.g. merge (cut pdf1) pdf2. Tested working with Python 3.8.5. Install venv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information. If you want your programs to read or write to PDFs or Word documents, you’ll need to do more than simply pass their filenames to open().
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    QuickPlot

    QuickPlot

    Simple user interface for gnuplot aimed for reflectometry data

    Graphical user interface for gnuplot to create publication quality figure very quickly. It supports templates for fast formatting of graphics, different plot styles, insets, axis and label options. One important feature is storing metadata in png and pdf files that can be used to reload any graph saved with QuickPlot.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    proreports

    proreports

    Simple Reporting System

    ProReports is simple reporting system designed to generate reports in popular office formats - PDF, XLS, RTF, HTML, TXT, XML, JSON, CSV, PNG, GIF. These reports are generated based on the definition in the internal database system. ProReports supports jrxml (JasperReport) format. This type of report templates can be prepared in external editor, such as iReport. Also user can prepare report in internal format of ProReports (simple Visual Programming Language mixed with PHP5 and JAVA or Python). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Jupyter Notebooks as PDF

    Jupyter Notebooks as PDF

    Save Jupyter Notebooks as PDF

    This Jupyter notebook extension allows you to save your notebook as a PDF. To make it easier to reproduce the contents of the PDF at a later date the original notebook is attached to the PDF. Unfortunately not all PDF viewers know how to deal with attachments. PDF viewers known to support downloading of file attachments are: Acrobat Reader, pdf.js and evince. The pdftk CLI program can also extract attached files from a PDF. Preview for OSX does not know how to display/give you access to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Fryktelig fin faktura

    Fryktelig fin faktura

    Create, review and administer norwegian invoices.

    Create, review and administer invoices for use in Norway, and print on the F60 faktura form or send by email as PDF. «Fryktelig Fin Faktura» er et fakturaprogram for norske forhold. Fakturaene lages på PDF eller skjema F60. Dersom du kan hjelpe til, er vi åpne for flere utviklere. Releases are signed with this GnuPG/PGP key fingerprint: «7F09 D1F8 1C3E 1758 4AC9 C61A 4F5A D64D FA68 7324»
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    QtiPlot
    QtiPlot is a user-friendly, platform independent data analysis and visualization application similar to the non-free Windows program Origin.
    Downloads: 53 This Week
    Last Update:
    See Project
  • 20
    PDF Merge and Edit

    PDF Merge and Edit

    Python script to merge and edit sensitive PDF files

    Python script to merge and edit sensitive PDF files you don't want to upload to random sites you find on Google. Merge PDFs by adding one to another. Update a single page in a PDF (good for adding a signed page to a form) Insert a page into an existing PDF. Delete a page. Click on one of the buttons and a new window will pop up depending on the function.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 57 This Week
    Last Update:
    See Project
  • 22
    PyX is a Python package for the creation of EPS, PS, PDF and SVG files. It combines an abstraction of the PostScript drawing model with a TeX/LaTeX interface. Complex tasks like 2d and 3d plots in publication-ready quality are built out of these primitives.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    matplotlib
    Matplotlib is a python library for making publication quality plots using a syntax familiar to MATLAB users. Matplotlib uses numpy for numerics. Output formats include PDF, Postscript, SVG, and PNG, as well as screen display. As of matplotlib version 1.5, we are no longer making file releases available on SourceForge. Please visit http://matplotlib.org/users/installing.html for help obtaining matplotlib.
    Leader badge
    Downloads: 50 This Week
    Last Update:
    See Project
  • 24
    colorview2d

    colorview2d

    Extendible 2D color plotting tool.

    Visualize and analyze 3D data files using 2D colorplots. Inspired by spyview. Written in python.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    GimpPy uses img maps & an img as the input, output is a report.py file used to generate PDFs, the out files may run solo or chained together to make more complex multi page reports. Input required is a dict with vals for flds you have mapped on your img.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.
Try Free →