Showing 103 open source projects for "pdf data mining"

  • 1
    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Format (PDF) files. Open source and built with HTML5, this PDF viewer is supported by a great community and Mozilla Labs. PDF.js can be used on both modern and older browsers, and is built into version 19+ of Firefox.
    Downloads: 84 This Week
  • 2
    BentoPDF

    A Privacy First PDF Toolkit

    BentoPDF is a self-hosted, open-source PDF toolkit that provides a suite of local PDF manipulation features for users who want full control over their documents without relying on cloud PDF services. It offers functionality to merge, split, compress, rotate, and convert PDFs through an easy-to-deploy container or local installation, making it ideal for individuals and teams that handle large volumes of PDF files regularly.
    Downloads: 58 This Week
  • 3
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    PDFCraft is an extensible toolkit for creating, editing, and transforming PDF documents with both a graphical interface and a scripting API, making it useful for users ranging from casual editors to automated document processors. At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite.
    Downloads: 37 This Week
  • 4
    npm-pdfreader

    Parse text and tables from PDF files.

    npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.
    Downloads: 4 This Week
  • 5
    jsPDF

    HTML5 client solution for generating PDFs

    The leading HTML5 client solution for generating PDFs. Perfect for event tickets, reports, certificates, you name it! PDFs are ubiquitous across the web, with virtually every enterprise relying on them to share documents. We created jsPDF to solve a major problem with how PDF files were being generated. We decided to make it open-source to allow a community of developers to expand on it.
    Downloads: 28 This Week
  • 6
    pdfmake

    Client/server side PDF printing in pure JavaScript

    Print PDFs directly in the browser or delegate it to your NodeJS backend. Use the same document definition in both cases. Forget about manual x, y calculations. Declare document structure and let pdfmake do the rest. Use paragraphs, columns, lists, tables, canvas, etc. Declare your own styles, use custom fonts, build a DSL and extend the framework. Provides a set of options to disable font layout cache and to control when pages are flushed to the output file. Pdfmake is runnable in browser...
    Downloads: 4 This Week
  • 7
    DeckTape

    PDF exporter for HTML presentations

    DeckTape is a high-quality PDF exporter for HTML presentation frameworks. DeckTape is built on top of Puppeteer which relies on Google Chrome for laying out and rendering Web pages and provides a headless Chrome instance scriptable with a JavaScript API. DeckTape currently supports the following presentation frameworks out of the box. DeckTape also provides a generic command that works by emulating the end-user interaction, allowing it to be used to convert presentations from virtually any...
    Downloads: 3 This Week
  • 8
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 4 This Week
  • 9
    carbone

    Fast and simple report generator, from JSON to pdf, xlsx, docx, odt

    Turn your JSON into PDF, DOCX, XLSX, PPTX, ODS and many more. A fast, simple, and powerful report generator that produces PDF, DOCX, XLSX, ODT, PPTX, ODS, XML, and CSV output from templates, using your JSON data as input.
    Downloads: 4 This Week
  • 10
    WebViewer UI

    WebViewer UI built in React

    WebViewer UI sits on top of WebViewer, a powerful JavaScript-based PDF Library that's part of the PDFTron PDF SDK. Built in React, WebViewer UI provides a slick out-of-the-box responsive UI that interacts with the core library to view, annotate and manipulate PDFs that can be embedded into any web project. This repo is specifically designed for any users interested in advanced customizations. With the source code access, it gives developers full control to customize & style the UI, build...
    Downloads: 1 This Week
  • 11
    tableExport.jquery.plugin

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL

    jQuery plugin to export an HTML table to JSON, XML, CSV, TSV, TXT, SQL, Word, Excel, PNG, and PDF.
    Downloads: 0 This Week
  • 12
    Career-Ops

    AI-powered job search system built on Claude Code

    Career-Ops is an open-source platform designed to help individuals manage their job search process with a structured, operations-style approach that treats career development like a pipeline. It provides a system for organizing job applications, tracking progress across different stages, and maintaining visibility into opportunities, much like a lightweight CRM tailored for job seekers. The project emphasizes clarity and accountability, enabling users to monitor applications, follow-ups, and...
    Downloads: 4 This Week
  • 13
    Fidus Writer

    Fidus Writer is an online collaborative editor for academics

    Fidus Writer is an online collaborative editor made especially for academics who need to use citations and/or formulas. The editor focuses on the content rather than the layout, so that you can later publish the same text in multiple ways: on a website, as a printed book, or as an ebook. In each case, you can choose from a number of layouts suited to the medium of choice.
    Downloads: 12 This Week
  • 14
    OrgChart

    It's a simple and direct organization chart plugin

    It's a simple and direct organization chart plugin. Anytime you want a tree-like chart, you can turn to OrgChart.
    Downloads: 5 This Week
  • 15
    Percollate

    A command-line tool to turn web pages into beautiful, readable PDF

    Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files. By default, percollate processes URLs in parallel. Use the --wait option to process them sequentially instead, with a pause between items. The delay is specified in seconds, and can be zero. By default, percollate bundles all web pages in a single file. Use the --individual flag to export each source to a separate file. Additional CSS styles you can pass from the command line to...
    Downloads: 5 This Week
  • 16
    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool for creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation. It aims to make the process as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats (including PDF, Markdown, DOCX, EPUB, and plain text) and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. ...
    Downloads: 8 This Week
  • 17
    Superalgos

    Free, open-source crypto trading bot, automated bitcoin trading

    Free, open-source crypto trading bot, automated bitcoin/cryptocurrency trading software, algorithmic trading bots. Visually design your crypto trading bot, leveraging an integrated charting system, data-mining, backtesting, paper trading, and multi-server crypto bot deployments. Superalgos is not just another open-source project. We are an open and welcoming community nurtured and incentivized with the project's native Superalgos (SA) Token, building an open trading intelligence network. You will notice the difference as soon as you join the Telegram Community Group or the new Discord Server! ...
    Downloads: 8 This Week
  • 18
    Element

    A glossy Matrix collaboration client for the web

    Element, formerly known as Vector and Riot, is a glossy Matrix collaboration client built using the Matrix React SDK. It offers teams, friends and organizations a secure, all-in-one chat app that is protected from pesky ads and data mining methods. All communications are done through the open global Matrix network, secured with end-to-end encryption. Element gives you all the services you need from a chat app: group chat, video calls, file sharing and more, all done securely and in total privacy. Element has three different tiers of support for different environments, the most supported being the latest versions of Chrome, Firefox, and Safari on desktop OSes.
    Downloads: 6 This Week
  • 19
    Collabora Online

    Collabora Online is a collaborative online office suite

    Collabora Online is a powerful online office suite that you can integrate into your own infrastructure or access via one of our trusted hosting Partners. Your digital sovereignty is our priority. We provide you with all the tools to keep your data secure, without compromising on features. Collabora Online’s text document editor provides a true WYSIWYG editing experience, making visualizing your document layout incredibly easy. Open any document, add comments and track changes from anywhere,...
    Downloads: 8 This Week
  • 20
    JupyterLab

    JupyterLab computational environment

    ...Documents and activities integrate with each other, enabling new workflows for interactive computing. JupyterLab also offers a unified model for viewing and handling data formats. JupyterLab understands many file formats (images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite, etc.) and can also display rich kernel output in these formats. See File and Output Formats for more information. To navigate the user interface, JupyterLab offers customizable keyboard shortcuts and the ability to use key maps from vim, emacs, and Sublime Text in the text editor.
    Downloads: 93 This Week
  • 21
    canvas-editor

    Canvas-based WYSIWYG rich text editor with advanced layout tools

    ...It is designed to provide a WYSIWYG editing experience similar to word processors, enabling precise control over layout, rendering, and document structure. canvas-editor supports a wide range of formatting and document features, including text styling, tables, images, and embedded elements, all managed through a structured data model. Its architecture is modular, allowing developers to extend functionality through plugins, custom commands, and event hooks. It includes support for page-based layouts with headers, footers, pagination, and print-ready output, including PDF generation. It also provides interactive components such as form controls and context menus, making it suitable for building complex document editing systems.
    Downloads: 1 This Week
  • 22
    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, and Elasticsearch and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...
    Downloads: 0 This Week
  • 23
    An interface for entering monthly activity reports, with generation of a resulting PDF.
    Downloads: 0 This Week
  • 24
    Jmol

    An interactive viewer for three-dimensional chemical structures.

    Over 1,000,000 page views per month. Jmol/JSmol is a molecular viewer for 3D chemical structures that runs in four independent modes: an HTML5-only web application utilizing jQuery, a Java applet, a stand-alone Java program (Jmol.jar), and a "headless" server-side component (JmolData.jar). Jmol can read many file types, including PDB, CIF, SDF, MOL, PyMOL PSE files, and Spartan files, as well as output from Gaussian, GAMESS, MOPAC, VASP, CRYSTAL, CASTEP, QuantumEspresso, VMD, and many other...
    Downloads: 586 This Week
  • 25
    MyBox

    Easy Tools of PDF, Image, File, Network, Data, and Medias

    Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media. Source: https://github.com/Mararsh/MyBox. Self-contained packages require neither a Java environment nor installation. Jar packages require Java 16 or higher.
    Downloads: 1 This Week