26 projects for "pdf data mining" with 2 filters applied:

  • 1
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 20 This Week
  • 2
    tableExport.jquery.plugin

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL, Word, Excel, PNG, and PDF.
    Downloads: 0 This Week
  • 3
    OmniTools

    Self-hosted collection of powerful web-based tools for everyday tasks

    ...It’s designed to replace the random assortment of “free online tools” people use for quick tasks, while avoiding ads, tracking, and the need to upload sensitive files to unknown servers. A key design choice is that file processing happens entirely on the client side, meaning your data stays in your browser instead of being sent to the backend. The tool catalog spans both technical and non-technical needs, including image, video, audio, PDF, text, date/time, math, and data format utilities like JSON/CSV/XML helpers. It’s also packaged for straightforward self-hosting, with a lightweight Docker image and simple run commands, so it can be deployed quickly on a homelab or internal network.
    Downloads: 9 This Week
  • 4
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.
    Downloads: 1 This Week
  • 5
    OrgChart

    It's a simple and direct organization chart plugin

    It's a simple and direct organization chart plugin. Anytime you want a tree-like chart, you can turn to OrgChart.
    Downloads: 5 This Week
  • 6
    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 1 This Week
  • 7
    RStudio Cheatsheets

    Curated collection of official cheat sheets for data science tools

    The cheatsheets repository from RStudio is a curated collection of official cheat sheets for R, RStudio, the tidyverse, Shiny, and related data science tools. Each cheat sheet is a single (or double) page PDF that condenses important syntax, functions, workflows, and best practices into a visually organized format ideal for quick reference. The repository contains source files (R Markdown or LaTeX) that generate the cheat sheets, version history, and metadata (title, author, description) for each. ...
    Downloads: 0 This Week
  • 8

    toPDF

    Online service for PDF conversion (to PDF)

    A simple online service for PDF conversion. This project is a simple library and also a web application. It offers a REST service and a simple upload service for synchronous conversion. This library/application doesn't contain conversion libraries because it's a wrapper for existing tools. toPDF currently supports the open source tool PDF Creator (http://www.pdfforge.org) and the commercial solution, easy PDF, from BCL (http://www.pdfonline.com/easypdf/sdk/).
    Downloads: 1 This Week
  • 9

    jpeg2pdf

    Create PDF from JPEG scans and photos

    Cross-platform command-line tool for creating PDF documents from scans or photos of pages in JPEG (.jpg) format, and the lightest weight ANSI C library to put multiple JPEG files into one PDF file. You can add handwritten comments to PDF scans (over the original images) with xournal (http://xournal.sourceforge.net/), which supports graphics tablets and saves comments to PDFs as vector data.
    Downloads: 28 This Week
  • 10
    Probability Cheatsheet

    A comprehensive 10-page probability cheatsheet

    ...It likely includes definitions of random variables, PMFs and PDFs, expectations, variance, common distributions (e.g. binomial, normal, Poisson, exponential), conditional probability, Bayes’ theorem, moment generating functions, and perhaps important inequalities (Markov, Chebyshev, Chernoff). The cheat sheet is intended as a quick reference for students, data scientists, statisticians, or anyone needing to recall core probability formulas without diving into textbooks. It may include visual diagrams (e.g. distributions’ shapes), tips or mnemonic notes, and examples of application (e.g. computing probabilities or expectations). Formats could include Markdown, PDF, or images for easy inclusion in study materials or slides.
    Downloads: 0 This Week
  • 11
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 115 This Week
  • 12
    Incanter

    Clojure-based, R-like statistical computing and graphics environment

    Incanter is a Clojure-based, R-like statistical computing and visualization library running on the JVM. It integrates core numerical libraries like Parallel Colt and JFreeChart to deliver data manipulation, modeling, statistical tests, and charting in a REPL-friendly environment. Start by visiting the Incanter website for an overview, check out the documentation page for a listing of HOW-TOs and examples, and then download either an Incanter executable or a pre-built version of the latest...
    Downloads: 0 This Week
  • 13
    PDFReporter

    Generating documents and reports, offline enabled and reliable.

    The library is a fork of the popular open source Jasper Reports and supports its common features, but works offline and targets mobile apps. The PDFReporter library supports iOS, Java, and Android. To design documents and reports, use PDFReporter Studio, where you can visualize your data. If you want to use the library commercially, please visit our official webpage.
    Downloads: 4 This Week
  • 14
    JFreeChart
    JFreeChart is a free (LGPL) chart library for the Java(tm) platform. It supports bar charts, pie charts, line charts, time series charts, scatter plots, histograms, simple Gantt charts, Pareto charts, bubble plots, dials, thermometers and more. *** JFreeChart has moved to GitHub: https://github.com/jfree/jfreechart ***
    Downloads: 284 This Week
  • 15

    libVMR

    VMR - machine learning library

    libVMR is a Java class library that implements a code generator for the group method of data handling (GMDH). The library is intended for users with machine learning skills. libVMR provides an effective framework for the research and development of data mining and predictive analytics. It is based on the vector machine by Reshetov (VMR), described as the most popular neural network model with higher generalization ability from kernel tricks.
    Downloads: 0 This Week
  • 16
    JChart2D

    JChart2D is a real-time charting library written in Java.

    JChart2D is an easy-to-use component, written in Java, for displaying two-dimensional traces in a coordinate system. It supports real-time (animated) charting, custom trace rendering, multithreading, viewports, automatic scaling, and labels. Former UI controls (right-click context menu, file menu) have been ported to the subproject jchart2d-uimenu (https://sourceforge.net/projects/jchart2d-uimenu.jchart2d.p/) so that the core library has no dependencies on third-party libraries.
    Downloads: 0 This Week
  • 17
    CMIS Input plugin for Pentaho

    Allows querying Content Management Systems that use the CMIS.

    ...All this is possible within the Pentaho Suite, the open source business intelligence platform, which is useful for extracting and analyzing structured and semi-structured data. To that end, the CMIS Input plugin for Pentaho Data Integration (Kettle) was designed and developed; it allows querying Content Management Systems that use the CMIS interoperability standard. Once extracted, the data can be stored, analyzed, and presented in customized reports published in various formats for the end user (PDF, Excel, etc.).
    Downloads: 0 This Week
  • 18

    PdfPageCounter

    C++ code to count the number of pages in a given PDF file.

    This C++ library contains the 'PdfPageCount' class that performs the single task of finding the number of pages in a given PDF document. While the PdfPageCount class is very simple to use, the contained code is complex because the page count can be hidden in any number of places, quite often within compressed data.
    Downloads: 0 This Week
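To illustrate why counting pages is harder than it sounds, here is a minimal Python sketch of the naive approach: scanning the raw bytes for page objects. It is not PdfPageCounter's method; the function name `count_pages_naive` and the sample bytes are made up for illustration. It only works when page objects are stored uncompressed, which is exactly the case the description says often does not hold.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")


def count_pages_naive(pdf_bytes: bytes) -> int:
    """Count page objects visible in the raw bytes (hypothetical helper).

    Assumption: page dictionaries are not inside compressed object
    streams. When they are, this scan misses them and undercounts.
    """
    return len(PAGE_OBJECT.findall(pdf_bytes))


# Hand-written fragment resembling uncompressed PDF objects (not a full file).
sample = (
    b"1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
    b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
    b"3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n"
)
print(count_pages_naive(sample))
```

A robust counter would instead follow the document catalog to the page tree and read its `/Count` entry, decompressing object streams where needed.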
  • 19
    Graphane is a solution to generate and deliver enterprise documents (PDF, ODT, RTF, HTML). Template documents are designed with OpenOffice Writer. Any application able to export data in XML format can submit that data to the Graphane Server.
    Downloads: 2 This Week
  • 20

    RepEdit

    Project moved to https://sourceforge.net/projects/qsqlmon/

    Report library + visual editor for Qt based applications. Project moved to https://sourceforge.net/projects/qsqlmon/
    Downloads: 0 This Week
  • 21
    Note: I have built a much more powerful, fully featured CMS at https://github.com/MacdonaldRobinson/FlexDotnetCMS. Macs CMS is a flat-file (XML and SQLite) based AJAX content management system. It focuses mainly on the edit-in-place editing concept. It comes with a built-in blog with moderation support, user manager section, roles manager section, SEO / SEF URL
    Downloads: 1 This Week
  • 22
    Python module and command line utility that analyzes XML output from the program pdftohtml in order to extract tables from PDF files. Outputs CSV.
    Downloads: 0 This Week
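The general idea of turning pdftohtml XML into CSV can be sketched in Python as follows. This is not the listed module's algorithm. It assumes the XML layout produced by `pdftohtml -xml` (`<page>` elements containing `<text>` elements with `top` and `left` attributes), and the names `page_to_rows`, `xml_to_csv`, `tolerance`, and `SAMPLE` are hypothetical. Rows are formed by grouping text fragments with nearly equal `top` values; cells are ordered by `left`.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def page_to_rows(page, tolerance=2):
    """Group <text> elements into rows by 'top', then order cells by 'left'."""
    rows = defaultdict(list)
    for text in page.iter("text"):
        top = int(text.get("top"))
        left = int(text.get("left"))
        # itertext() also collects text nested in inline tags such as <b>.
        content = "".join(text.itertext()).strip()
        # Reuse an existing row whose 'top' is within the tolerance.
        key = next((k for k in rows if abs(k - top) <= tolerance), top)
        rows[key].append((left, content))
    return [[cell for _, cell in sorted(cells)]
            for _, cells in sorted(rows.items())]


def xml_to_csv(xml_text):
    """Convert every page in the XML document to CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for page in root.iter("page"):
        writer.writerows(page_to_rows(page))
    return out.getvalue()


# Hand-written sample in the assumed pdftohtml XML shape.
SAMPLE = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="100" left="200" width="30" height="12" font="0">Qty</text>
<text top="120" left="50" width="40" height="12" font="0"><b>Bolt</b></text>
<text top="121" left="200" width="30" height="12" font="0">12</text>
</page>
</pdf2xml>"""

print(xml_to_csv(SAMPLE))
```

Real tables usually need more than this, for example aligning cells to column positions so that empty cells do not shift later values to the left.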
  • 23
    This is a Java port of the original FPDF free PDF generation library for PHP.
    Downloads: 0 This Week
  • 24
    dompdf - the PHP 5 HTML to PDF converter. dompdf is a (mostly) CSS compliant HTML rendering engine written in PHP. It supports external stylesheets, inline style tags, and the style attributes of individual HTML elements. Requires PHP 5.
    Downloads: 4 This Week
  • 25
    The Umber project provides simple, flexible, "earthy" Java tool libraries for developers. The tools supplement common tasks like XML handling, data processing, and PDF generation, but without the complex and arcane APIs of most modern implementations.
    Downloads: 0 This Week