84 projects for "pdf data mining" with 2 filters applied:

  • 1
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 38 This Week
  • 2
    Laravel Invoices

    Laravel package to generate PDF invoices from customizable parameters

    Laravel Invoices is a Laravel package for generating invoice PDF files from customizable data. It gives developers a simple interface for creating invoices that can be stored, downloaded, or streamed through configured filesystems. The package supports different templates and locales, making it useful for applications that serve customers in multiple regions. It is designed for business systems, SaaS products, admin panels, and client billing workflows that need invoice output without building the full PDF logic from scratch. ...
    Downloads: 1 This Week
  • 3
    tinypdf

    Minimal PDF creation library

    tinypdf is a minimal, zero-dependency PDF generation library that focuses on the core “put content on a page” use case while intentionally skipping heavyweight features. It is designed to be extremely small and approachable, making it a good fit when you want to generate real PDFs in Node/TypeScript without pulling in a large toolkit. The library supports essential primitives like writing text, drawing basic shapes, and placing JPEG images, which covers common needs such as invoices,...
    Downloads: 1 This Week
  • 4
    book-to-skill

    Turn any technical book PDF into a Claude Code skill

    book-to-skill is a Claude Code skill that turns technical books and documents into reusable AI reference skills. It extracts content from PDFs and EPUBs, then organizes the material so an assistant can study, reference, and apply it while working. The project is useful for transforming dense manuals, textbooks, internal documentation, or technical guides into practical agent-accessible knowledge. It includes an extraction script and a SKILL.md workflow that guides how the resulting content...
    Downloads: 1 This Week
  • 5
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 32 This Week
  • 6
    PDF Split and Merge

    Split and merge PDF files on any platform

    Split and merge PDF files with PDFsam, an easy-to-use desktop tool with graphical, command-line, and web interfaces.
    Leader badge
    Downloads: 272 This Week
  • 7
    Ada PDF Writer

    A standalone, portable package for dynamically producing PDF documents

    PDF_Out is an Ada package for easily writing PDF files dynamically. It enables the automatic production of reports. The code is standalone and unconditionally portable; no external resource is needed. More information on... http://apdf.sf.net Alire crate: https://alire.ada.dev/crates/apdf Mirror: https://github.com/zertovitch/ada-pdf-writer
    Leader badge
    Downloads: 25 This Week
  • 8

    toPDF

    Online service for PDF conversion (to PDF)

    A simple online service for PDF conversion. This project is a simple library and also a web application. It offers a REST service and a simple upload service for synchronous conversion. This library/application doesn't contain conversion libraries because it's a wrapper for existing tools. toPDF currently supports the open source tool PDF Creator (http://www.pdfforge.org) and the commercial solution, easy PDF, from BCL (http://www.pdfonline.com/easypdf/sdk/).
    Downloads: 1 This Week
  • 9
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 4 This Week
  • 10
    Chordii

    Easy lead sheets from text input

    ChordPro creates elegant, staffless lead sheets for musicians who need only chords and lyrics. It processes plain-text input in ChordPro format and is a rewrite of the old but still popular Chord/Chordii programs.
    Leader badge
    Downloads: 91 This Week
  • 11
    jPicEdt

    Another drawing editor for LaTeX with PSTricks & TikZ

    jPicEdt is an extensible, internationalized, vector-based drawing editor for LaTeX and related packages (TikZ, PSTricks, ...), written in Java. It is also a library of reusable high-level graphic primitives.
    Downloads: 7 This Week
  • 12
    Pdf4Tcl is a library for generating PDF documents from Tcl.
    Leader badge
    Downloads: 2 This Week
  • 13

    queXML

    XML Schema for questionnaires and PDF questionnaire generator

    queXML is a simple XML schema for designing questionnaires. Included are stylesheets to administer the questionnaire in PDF (paper), CASES and LimeSurvey. queXML is compatible with the DDI standard.
    Downloads: 0 This Week
  • 14
    RTextDoc

    An editor for structured documents

    RTextDoc is an editor for structured text documents such as LaTeX, AsciiDoc, and DocBook. RTextDoc has proofreading capabilities: on-the-fly spelling, instant grammar checking, and built-in free dictionaries. It also has syntax highlighting, bracket matching, folding, a document structure browser for sections and labels, bookmarks, a manager for LaTeX symbols, an editor for mathematical equations, an integrated BibTeX database manager, and several tools to convert LaTeX to HTML and back. AsciiDoc...
    Downloads: 1 This Week
  • 15
    Laravel Report Generators

    Rapidly generate simple PDF, CSV, and Excel reports on Laravel

    Rapidly generate simple PDF, CSV, or Excel reports on Laravel. This package provides simple PDF, CSV, and Excel report generators to speed up your workflow. It also lets you stream(), download(), or store() the report seamlessly.
    Downloads: 5 This Week
  • 16

    PoDoFo

    A PDF parsing, modification and creation library.

    The PoDoFo library is a free, portable C++ library. It can parse and modify existing PDF files and create new ones from scratch. It also includes several tools to work with PDF files. It features a unique approach that provides access to PDF documents via an object tree, so PDFs can be created and/or manipulated using a simple tree structure. Development of PoDoFo has moved to GitHub: https://github.com/podofo/podofo Please raise new issues in the GitHub project.
    Leader badge
    Downloads: 115 This Week
  • 17
    LaTeXDraw

    Vector drawing program for LaTeX using PSTricks

    LaTeXDraw is a graphical drawing editor for LaTeX. LaTeXDraw can be used to 1) generate PSTricks code; 2) directly create PDF or PS pictures.
    Leader badge
    Downloads: 71 This Week
  • 18
    PDF Guru

    Merge images and PDFs to a single PDF

    PDF Guru is a simple-to-use program for merging multiple images and PDF files into a single compact PDF file. It can select specific PDF pages or page ranges, which gives you more control over the output file. It can produce compact, smaller files on any operating system. Its features make it a great, must-have tool for everyone.
    Leader badge
    Downloads: 11 This Week
  • 19
    PdfJumbler
    A simple tool to rearrange/merge/delete pages from PDF files. The modular backend system uses either JPedal or JPod to display PDFs and iText or Apache PDFBox to save them. Development of this project has moved to GitHub. Please check https://github.com/mgropp/pdfjumbler for current releases!
    Downloads: 12 This Week
  • 20
    TCPDF - PHP class for PDF

    PHP class for PDF

    TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports UTF-8, Unicode, RTL languages, XHTML, JavaScript, digital signatures, barcodes, and much more. IMPORTANT: This version will soon be marked as deprecated and replaced by a new version currently under development: https://github.com/tecnickcom/tc-lib-pdf
    Leader badge
    Downloads: 149 This Week
  • 21
    Canorus

    Music score editor

    Canorus is a free, cross-platform music score editor. It supports an unlimited number and length of staffs, polyphony, MIDI playback of notes, chord markings, lyrics, and import/export filters for formats like MIDI, MusicXML, ABC Music, MusiXTeX, and LilyPond.
    Downloads: 27 This Week
  • 22
    pdf-bot

    A Node queue API for generating PDFs using headless Chrome

    pdf-bot is a Node.js microservice designed to automate the generation of PDF documents from web pages using headless Chrome. The project provides a queue-based API that allows developers to submit URLs for PDF generation, which are then processed asynchronously by the service. Once a document is generated, the system can notify external applications through webhooks, enabling integration with other backend systems or automation pipelines. The service is particularly useful for generating...
    Downloads: 0 This Week
  • 23
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Leader badge
    Downloads: 114 This Week
  • 24
    Xena - Digital Preservation Software

    Xena transforms files into open data formats

    Xena transforms files into open data formats for long-term digital preservation, encodes content in Base64 and wraps in XML metadata. Formats supported include MBOX, PST, MSG, DOC, XLS, PPT, RTF, PNG, XML, PDF, JPG, TIFF, PCX, WAV, MP3 and more. NO LONGER MAINTAINED, NO LONGER SUPPORTED
    Downloads: 3 This Week
  • 25
    OpenEXI

    EXI implementations in Java and C#

    Open source .NET (C#) / Java implementation of the W3C Efficient XML Interchange (EXI) format specification. As a corollary to XML, EXI is an alternative, very efficient format that has all of the mechanics of XML, but is much more compact and is faster to exchange. - README (about Nagasena EXI implementation) https://www.dropbox.com/s/adh83u9z1x1czv6/README.txt?dl=0 - Nagasena EXI grammar interchange format (PDF) https://www.dropbox.com/s/etrpuchaddplq2s/EXIGram.pdf?dl=0 -...
    Downloads: 15 This Week