Showing 50 open source projects for "pdf data mining"

  • 1
    OpenPDF

    Open source Java library for creating and editing PDF files

    OpenPDF is a Java library for creating and editing PDF files, released under the LGPL and MPL open source licenses. It is the LGPL/MPL open source successor of iText, based on a fork of a fork of the iText 4 svn tag.
    Downloads: 26 This Week
  • 2
    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
  • 3
    Snappy PHP

    PHP library allowing thumbnail, snapshot or PDF generation from a URL

    Snappy is a PHP library allowing thumbnail, snapshot or PDF generation from a URL or an HTML page. It uses the excellent WebKit-based wkhtmltopdf and wkhtmltoimage, available on OSX, Linux, and Windows. You will have to download wkhtmltopdf 0.12.x in order to use Snappy.
    Downloads: 7 This Week
  • 4
    QuestPDF

    A library that can help you with generating PDF documents

    Quickly design and generate PDF documents with an open-source, modern, and battle-tested C# library. Forget about limitations, feel confident, enjoy your task and efficiently deliver professional products. QuestPDF is a progressive library that can help you with generating PDF documents in your .NET application by offering a friendly, discoverable and predictable C# fluent API. Do you believe that creating a complete invoice document can take less than 200 lines of code? We have prepared for...
    Downloads: 3 This Week
  • 5
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    Vanilla.PDF is a modern, high-performance, open-source C++17 SDK designed for creating, editing, signing, and analyzing PDF documents across multiple platforms. It requires no external runtime dependencies, making it lightweight and ideal for embedding into desktop applications, servers, or automation pipelines. The SDK offers full cross-platform support including Windows, Linux, macOS, and Android, with builds available for major compilers and architectures. Vanilla.PDF supports advanced...
    Downloads: 3 This Week
  • 6
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting common use cases over rare ones.
    Downloads: 2 This Week
  • 7
    PyPDF

    A pure-Python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It is the actively maintained successor to PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 20 This Week
  • 8
    carbone

    Fast and simple report generator, from JSON to pdf, xlsx, docx, odt

    Turn your JSON into PDF, DOCX, XLSX, PPTX, ODS and many more. A fast, simple, and powerful report generator for any format (PDF, DOCX, XLSX, ODT, PPTX, ODS, XML, CSV), using templates with your JSON data as input.
    Downloads: 4 This Week
  • 9
    pdfme

    A TypeScript-based PDF generator library, made with React

    TypeScript-based PDF generator and React-based UI. Open source, developed by the community, and completely free to use under the MIT license. No complex operations are required. Just bring your favorite template and generate all the PDFs you need. Works in Node and the browser. Anyone can easily create and modify templates using Designer (UI template editor). Templates have a JSON document representation, which makes them easy to understand and easy to work with.
    Downloads: 1 This Week
  • 10
    tableExport.jquery.plugin

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL, Word, Excel, PNG, and PDF.
    Downloads: 0 This Week
  • 11
    OmniTools

    Self-hosted collection of powerful web-based tools for everyday tasks

    ...It’s designed to replace the random assortment of “free online tools” people use for quick tasks, while avoiding ads, tracking, and the need to upload sensitive files to unknown servers. A key design choice is that file processing happens entirely on the client side, meaning your data stays in your browser instead of being sent to the backend. The tool catalog spans both technical and non-technical needs, including image, video, audio, PDF, text, date/time, math, and data format utilities like JSON/CSV/XML helpers. It’s also packaged for straightforward self-hosting, with a lightweight Docker image and simple run commands, so it can be deployed quickly on a homelab or internal network.
    Downloads: 9 This Week
  • 12
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.
    Downloads: 1 This Week
  • 13
    OrgChart

    It's a simple and direct organization chart plugin

    It's a simple and direct organization chart plugin. Anytime you want a tree-like chart, you can turn to OrgChart.
    Downloads: 5 This Week
  • 14
    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 1 This Week
  • 15
    RStudio Cheatsheets

    Curated collection of official cheat sheets for data science tools

    The cheatsheets repository from RStudio is a curated collection of official cheat sheets for R, RStudio, the tidyverse, Shiny, and related data science tools. Each cheat sheet is a single (or double) page PDF that condenses important syntax, functions, workflows, and best practices into a visually organized format ideal for quick reference. The repository contains source files (R Markdown or LaTeX) that generate the cheat sheets, version history, and metadata (title, author, description) for each. ...
    Downloads: 0 This Week
  • 16

    toPDF

    Online service for PDF conversion (to PDF)

    A simple online service for PDF conversion. This project is a simple library and also a web application. It offers a REST service and a simple upload service for synchronous conversion. This library/application doesn't contain conversion libraries because it's a wrapper for existing tools. toPDF currently supports the open source tool PDF Creator (http://www.pdfforge.org) and the commercial solution, easy PDF, from BCL (http://www.pdfonline.com/easypdf/sdk/).
    Downloads: 1 This Week
  • 17
    FastReport Open Source

    Free Open Source Reporting tool for .NET

    Free Open Source Reporting tool for .NET Core/.NET Framework that helps your application generate document-like reports.
    Downloads: 78 This Week
  • 18
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 5 This Week
  • 19
    Ada Bar Codes

    Bar Code (1D or 2D) generator in pure Ada

    The project Ada Bar Codes provides a framework for generating various types of bar codes (1D, or 2D, like QR codes) on different output formats and devices. Alire crate: https://alire.ada.dev/crates/bar_codes Mirror: https://github.com/zertovitch/ada-bar-codes
    Downloads: 8 This Week
  • 20
    html-pdf-chrome

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium

    HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium. This library is NOT meant to accept untrusted user input. Doing so may have serious security risks such as Server-Side Request Forgery (SSRF). If you run into CORS issues, try using the --disable-web-security Chrome flag, either when you start Chrome externally, or in options.chromeFlags. This option should only be used if you fully trust the code you are executing during a print job. It is strongly recommended that you...
    Downloads: 1 This Week
  • 21
    mPDF

    PHP library generating PDF files from UTF-8 encoded HTML

    mPDF is a PHP library that generates PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files ‘on-the-fly’ from his website, handling different languages. It is slower than the original scripts (e.g. HTML2FPDF) and produces larger files when using Unicode fonts, but support for CSS styles etc. has been much enhanced. Supports almost all languages including RTL (Arabic and Hebrew), and CJK (Chinese-Japanese-Korean). Nested block-level elements (e.g....
    Downloads: 77 This Week
  • 22
    Prawn

    Fast, Nimble PDF Writer for Ruby

    Prawn is a pure Ruby PDF generation library that provides a lot of great functionality while trying to remain simple and reasonably performant. Extensive text rendering support, including flowing text and limited inline formatting options. Comprehensive internationalization features, including full support for UTF-8 based fonts, right-to-left text rendering, fallback font support, and extension points for customizable text wrapping. Support for PDF outlines for document navigation. Low level...
    Downloads: 0 This Week
  • 23
    LimeReport

    Report generator for Qt Framework

    ...The report designer included in the library lets you quickly and intuitively create print form templates, which can be saved in XML format and used to generate report pages. The resulting pages can be sent to a preview, a PDF file, or a printer. As a data source, a developer can use an SQL database or data passed from the application through the QAbstractTableModel interface. You can also initialize variables that are available as database request parameters. LimeReport's goal is to give your application a feature-rich yet simple report generation tool that even users inexperienced in IT can use.
    Downloads: 16 This Week
  • 24
    Probability Cheatsheet

    A comprehensive 10-page probability cheatsheet

    ...It likely includes definitions of random variables, PMFs and PDFs, expectations, variance, common distributions (e.g. binomial, normal, Poisson, exponential), conditional probability, Bayes’ theorem, moment generating functions, and perhaps important inequalities (Markov, Chebyshev, Chernoff). The cheat sheet is intended as a quick reference for students, data scientists, statisticians, or anyone needing to recall core probability formulas without diving into textbooks. It may include visual diagrams (e.g. distributions’ shapes), tips or mnemonic notes, and examples of application (e.g. computing probabilities or expectations). Formats could include Markdown, PDF, or images for easy inclusion in study materials or slides.
    Downloads: 0 This Week
  • 25
    PDF API HTML5 Web Apps

    Mini SDK: a JavaScript API library for PDF web apps

    A condensed library designed for modern web applications that quickly exports your HTML content to PDF, thanks to the well-known JavaScript library jsPDF. Special thanks to the canvg and html2canvas projects. Project documentation: http://ulmdevice.altervista.org/pdfapihtml5/#documentation A service for Angular 7+ is also available: http://ulmdevice.altervista.org/pdfjsapi/ Mobile applications: http://bit.ly/1MrlgKk Opera add-on: http://bit.ly/1kkMhTa
    Downloads: 0 This Week