Showing 257 open source projects for "pdf data mining"

  • 1
    Laravel PDF

    Create PDF files in Laravel apps

    This package provides a simple way to create PDFs in Laravel apps. Under the hood it uses Chromium to generate PDFs from Blade views. You can use modern CSS features like grid and flexbox to create beautiful PDFs.
    Downloads: 8 This Week
  • 2
    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Format (PDF) files. Open source and built with HTML5, this PDF viewer is supported by a great community and Mozilla Labs. PDF.js can be used on both modern and older browsers, and is built into version 19+ of Firefox.
    Downloads: 87 This Week
  • 3
    Snappy PDF

    A ServiceProvider for Snappy

    Laravel Snappy is a Laravel wrapper around the Snappy PDF/Image library, which itself is powered by wkhtmltopdf and wkhtmltoimage, allowing you to generate PDFs and images directly from HTML. It lets you take a Blade view, raw HTML string, or file and turn it into a downloadable, savable, or in-browser PDF/image response with just a few lines of code. The package integrates cleanly with the Laravel service container and offers a simple facade/API so you can quickly configure page size,...
    Downloads: 5 This Week
  • 4
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 9 This Week
  • 5
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 20 This Week
  • 6
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 9 This Week
  • 7
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 125 This Week
  • 8
    OpenPDF

    Open source Java library for creating and editing PDF files

    OpenPDF is a Java library for creating and editing PDF files, released under the LGPL and MPL open source licenses. It is the LGPL/MPL open source successor of iText, based on a fork of a fork of the iText 4 svn tag.
    Downloads: 39 This Week
  • 9
    iLovePDF Api

    iLovePDF Rest Api - PHP Library

    Develop and automate PDF processing tasks such as compressing, merging, and splitting PDFs, converting Office to PDF, PDF to JPG, and images to PDF, adding page numbers, rotating and unlocking PDFs, stamping a watermark, and repairing PDFs. Each task comes with several settings to get your desired results, backed by strong infrastructure with dedicated processing power. You might know us from ilovepdf.com where we process millions of PDFs daily. We offer a simple and concise API Reference and Guide as well as...
    Downloads: 15 This Week
  • 10
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 44 This Week
  • 11
    BentoPDF

    A Privacy First PDF Toolkit

    BentoPDF is a self-hosted, open-source PDF toolkit that provides a suite of local PDF manipulation features for users who want full control over their documents without relying on cloud PDF services. It offers functionality to merge, split, compress, rotate, and convert PDFs through an easy-to-deploy container or local installation, making it ideal for individuals and teams that handle large volumes of PDF files regularly.
    Downloads: 54 This Week
  • 12
    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000). This is an effort to build a comprehensive PDF processing library from the ground up written in Go. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...
    Downloads: 28 This Week
  • 13
    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 81 This Week
  • 14
    Snappy PHP

    PHP library for thumbnail, snapshot or PDF generation from a URL

    Snappy is a PHP library for generating thumbnails, snapshots or PDFs from a URL or an HTML page. It uses the WebKit-based wkhtmltopdf and wkhtmltoimage, available on OSX, Linux, and Windows. You will have to download wkhtmltopdf 0.12.x in order to use Snappy.
    Downloads: 9 This Week
  • 15
    PDF4QT

    Open source PDF editor

    PDF4QT is an open source PDF editor for Windows/Linux based on the Qt framework. It is a modern solution for viewing/editing/rendering PDF documents, for users and developers alike. For developers, there is a C++ library and a command line tool for use in scripts. For users, there are four applications offering many features. The project is hosted on GitHub and...
    Downloads: 32 This Week
  • 16
    QuestPDF

    A library that can help you with generating PDF documents

    Quickly design and generate PDF documents with an open-source, modern, and battle-tested C# library. Forget about limitations, feel confident, enjoy your task and efficiently deliver professional products. QuestPDF is a progressive library that can help you with generating PDF documents in your .NET application by offering a friendly, discoverable and predictable C# fluent API. Do you believe that creating a complete invoice document can take less than 200 lines of code? We have prepared for...
    Downloads: 3 This Week
  • 17
    Pandoc

    The universal markup converter

    Pandoc is a universal document converter able to convert files from a multitude of markup formats into another. With Pandoc, you have a swiss-army knife of a converter, able to convert practically any markup format into any other. Pandoc contains a Haskell library for conversions as well as a command-line tool that uses this library. It can convert to and from just about anything-- lightweight markup formats, HTML formats, documentation formats, ebooks, TeX formats, word processor formats...
    Downloads: 226 This Week
  • 18
    Laravel Invoices

    Laravel package to generate PDF invoices from customizable parameters

    Laravel Invoices is a Laravel package for generating invoice PDF files from customizable data. It gives developers a simple interface for creating invoices that can be stored, downloaded, or streamed through configured filesystems. The package supports different templates and locales, making it useful for applications that serve customers in multiple regions. It is designed for business systems, SaaS products, admin panels, and client billing workflows that need invoice output without building the full PDF logic from scratch. ...
    Downloads: 1 This Week
  • 19
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    Vanilla.PDF is a modern, high-performance, open-source C++17 SDK designed for creating, editing, signing, and analyzing PDF documents across multiple platforms. It requires no external runtime dependencies, making it lightweight and ideal for embedding into desktop applications, servers, or automation pipelines. The SDK offers full cross-platform support including Windows, Linux, macOS, and Android, with builds available for major compilers and architectures. Vanilla.PDF supports advanced...
    Downloads: 2 This Week
  • 20
    jsPDF

    HTML5 client solution for generating PDFs

    The leading HTML5 client solution for generating PDFs. Perfect for event tickets, reports, certificates, you name it! PDFs are ubiquitous across the web, with virtually every enterprise relying on them to share documents. We created jsPDF to solve a major problem with how pdf files were being generated. We decided to make it open-source to allow a community of developers to expand on it.
    Downloads: 27 This Week
  • 21
    pdfmake

    Client/server side PDF printing in pure JavaScript

    Print PDFs directly in the browser or delegate it to your NodeJS backend. Use the same document definition in both cases. Forget about manual x, y calculations. Declare document structure and let pdfmake do the rest. Use paragraphs, columns, lists, tables, canvas, etc. Declare your own styles, use custom fonts, build a DSL and extend the framework. Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. Pdfmake is runnable in browser...
    Downloads: 7 This Week
  • 22
    pandoc-crossref filter

    Pandoc filter for cross-references

    pandoc-crossref is a pandoc filter for numbering figures, equations, tables and cross-references to them. The input file (like demo.md) can be converted into HTML, LaTeX, PDF, Markdown or other formats. Optionally, you can use cleveref for LaTeX/PDF output, e.g. cleveref PDF, cleveref LaTeX, and listings package, e.g. listings PDF, listings LaTeX. This package tries to use LaTeX labels and references if output type is LaTeX. It also tries to supplement rudimentary LaTeX configuration that...
    Downloads: 9 This Week
  • 23
    book-to-skill

    Turn any technical book PDF into a Claude Code skill

    book-to-skill is a Claude Code skill that turns technical books and documents into reusable AI reference skills. It extracts content from PDFs and EPUBs, then organizes the material so an assistant can study, reference, and apply it while working. The project is useful for transforming dense manuals, textbooks, internal documentation, or technical guides into practical agent-accessible knowledge. It includes an extraction script and a SKILL.md workflow that guides how the resulting content...
    Downloads: 3 This Week
  • 24
    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 9 This Week
  • 25
    Gotenberg

    A Docker-powered stateless API for PDF files

    Gotenberg provides a developer-friendly API to interact with powerful tools like Chromium and LibreOffice for converting numerous document formats (HTML, Markdown, Word, Excel, etc.) into PDF files, and more! Thanks to Docker, you don't have to install each tool in your environments; drop the Docker image in your stack, and you're good to go! The webhook feature allows you to upload the output file to the destination of your choice. There are many options to fit your requirements, from the...
    Downloads: 1 This Week