Showing 214 open source projects for "pdf data mining"

  • 1
    Laravel PDF

    Create PDF files in Laravel apps

    This package provides a simple way to create PDFs in Laravel apps. Under the hood it uses Chromium to generate PDFs from Blade views. You can use modern CSS features like grid and flexbox to create beautiful PDFs.
    Downloads: 9 This Week
  • 2
    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Format (PDF) files. Open source and built with HTML5, this PDF viewer is supported by a great community and Mozilla Labs. PDF.js can be used on both modern and older browsers, and is built into Firefox 19 and later.
    Downloads: 84 This Week
  • 3
    Snappy PDF

    A ServiceProvider for Snappy

    Laravel Snappy is a Laravel wrapper around the Snappy PDF/Image library, which itself is powered by wkhtmltopdf and wkhtmltoimage, allowing you to generate PDFs and images directly from HTML. It lets you take a Blade view, raw HTML string, or file and turn it into a downloadable, savable, or in-browser PDF/image response with just a few lines of code. The package integrates cleanly with the Laravel service container and offers a simple facade/API so you can quickly configure page size,...
    Downloads: 7 This Week
  • 4
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 10 This Week
  • 5
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 17 This Week
  • 6
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 11 This Week
  • 7
    OpenPDF

    Open source Java library for creating and editing PDF files

    OpenPDF is a Java library for creating and editing PDF files, released under the LGPL and MPL open source licenses. OpenPDF is the LGPL/MPL open source successor of iText, and is based on a fork of a fork of the iText 4 svn tag.
    Downloads: 47 This Week
  • 8
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 125 This Week
  • 9
    iLovePDF Api

    iLovePDF Rest Api - PHP Library

    Develop and automate PDF processing tasks such as compressing, merging, splitting, rotating, unlocking, and repairing PDFs, converting Office documents to PDF, PDF to JPG, and images to PDF, adding page numbers, and stamping a watermark. Each task has several settings to get your desired results. Strong infrastructure offers the best dedicated processing power. You might know us from ilovepdf.com, where we process millions of PDFs daily. We offer a simple and concise API Reference and Guide as well as...
    Downloads: 16 This Week
  • 10
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 49 This Week
  • 11
    Snappy PHP

    PHP library allowing thumbnail, snapshot or PDF generation from a URL

    Snappy is a PHP library allowing thumbnail, snapshot or PDF generation from a URL or an HTML page. It uses the excellent WebKit-based wkhtmltopdf and wkhtmltoimage, available on OSX, Linux, and Windows. You will have to download wkhtmltopdf 0.12.x in order to use Snappy.
    Downloads: 11 This Week
  • 12
    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go that supports encryption. It provides both an API and a CLI. All versions up to PDF 1.7 (ISO-32000) are supported. This is an effort to build a comprehensive PDF processing library from the ground up in Go. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...
    Downloads: 27 This Week
  • 13
    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 79 This Week
  • 14
    PDF4QT

    Open source PDF editor

    PDF4QT is an open source PDF editor for Windows/Linux based on the Qt framework. It contains a C++ library, applications for viewing/editing PDF documents, and a command line tool. It is a modern solution for viewing/editing/rendering PDF documents, for users and developers alike. For developers, there is a C++ library and a command line tool for use in scripts. For users, there are four applications offering many features. The project is hosted on GitHub and...
    Downloads: 37 This Week
  • 15
    QuestPDF

    A library that can help you with generating PDF documents

    Quickly design and generate PDF documents with an open-source, modern, and battle-tested C# library. Forget about limitations, feel confident, enjoy your task and efficiently deliver professional products. QuestPDF is a progressive library that can help you with generating PDF documents in your .NET application by offering a friendly, discoverable and predictable C# fluent API. Do you believe that creating a complete invoice document can take less than 200 lines of code? We have prepared for...
    Downloads: 5 This Week
  • 16
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter able to convert files from one markup format into another. With Pandoc, you have a Swiss Army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything: lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 242 This Week
  • 17
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    Vanilla.PDF is a modern, high-performance, open-source C++17 SDK designed for creating, editing, signing, and analyzing PDF documents across multiple platforms. It requires no external runtime dependencies, making it lightweight and ideal for embedding into desktop applications, servers, or automation pipelines. The SDK offers full cross-platform support including Windows, Linux, macOS, and Android, with builds available for major compilers and architectures. Vanilla.PDF supports advanced...
    Downloads: 3 This Week
  • 18
    Laravel Invoices

    Laravel package to generate PDF invoices from customizable parameters

    Laravel Invoices is a Laravel package for generating invoice PDF files from customizable data. It gives developers a simple interface for creating invoices that can be stored, downloaded, or streamed through configured filesystems. The package supports different templates and locales, making it useful for applications that serve customers in multiple regions. It is designed for business systems, SaaS products, admin panels, and client billing workflows that need invoice output without building the full PDF logic from scratch. ...
    Downloads: 1 This Week
  • 19
    jsPDF

    HTML5 client solution for generating PDFs

    The leading HTML5 client solution for generating PDFs. Perfect for event tickets, reports, certificates, you name it! PDFs are ubiquitous across the web, with virtually every enterprise relying on them to share documents. We created jsPDF to solve a major problem with how PDF files were being generated. We decided to make it open source to allow a community of developers to expand on it.
    Downloads: 31 This Week
  • 20
    pdfmake

    Client/server side PDF printing in pure JavaScript

    Print PDFs directly in the browser or delegate it to your Node.js backend. Use the same document definition in both cases. Forget about manual x, y calculations. Declare document structure and let pdfmake do the rest. Use paragraphs, columns, lists, tables, canvas, etc. Declare your own styles, use custom fonts, build a DSL and extend the framework. Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. Pdfmake is runnable in browser...
    Downloads: 11 This Week
  • 21
    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape currently supports the following presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 8 This Week
  • 22
    pandoc-crossref filter

    Pandoc filter for cross-references

    pandoc-crossref is a pandoc filter for numbering figures, equations, tables and cross-references to them. The input file (like demo.md) can be converted into HTML, LaTeX, PDF, Markdown or other formats. Optionally, you can use cleveref for LaTeX/PDF output, e.g. cleveref PDF, cleveref LaTeX, and listings package, e.g. listings PDF, listings LaTeX. This package tries to use LaTeX labels and references if output type is LaTeX. It also tries to supplement rudimentary LaTeX configuration that...
    Downloads: 9 This Week
  • 23
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 12 This Week
  • 24
    WebViewer UI

    WebViewer UI built in React

    WebViewer UI sits on top of WebViewer, a powerful JavaScript-based PDF Library that's part of the PDFTron PDF SDK. Built in React, WebViewer UI provides a slick out-of-the-box responsive UI that interacts with the core library to view, annotate and manipulate PDFs that can be embedded into any web project. This repo is specifically designed for any users interested in advanced customizations. With the source code access, it gives developers full control to customize & style the UI, build...
    Downloads: 7 This Week
  • 25
    Gotenberg

    A Docker-powered stateless API for PDF files

    Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don't have to install each tool in your environments; drop the Docker image in your stack, and you're good to go! The webhook feature allows you to upload the output file to the destination of your choice. There are many options to fit your requirements, from the...
    Downloads: 1 This Week