Search Results for "pdf data mining" - Page 5

Showing 893 open source projects for "pdf data mining"

  • 1
    Hagenberg Thesis Document Collection

    Hagenberg LaTeX Thesis Template

    This is a collection of modern LaTeX classes, style files, and example documents for authoring Bachelor, Master, or Diploma theses and related academic manuscripts in English and German. Pre-configured English and German documents are available, easy to use even for LaTeX beginners, and compatible with LaTeX distributions for Windows, Mac OS, and Linux. The document classes are immediately usable and convenient to customize. The main document, HgbThesisTutorialEN or HgbThesisTutorialDE,...
    Downloads: 7 This Week
  • 2
    Apache Sedona

    Cluster computing framework for processing large-scale geospatial data

    ...According to our benchmark and third-party research papers, Sedona has 50% less peak memory consumption than other Spark-based geospatial data systems for large-scale in-memory query processing. Sedona offers Scala, Java, Spatial SQL, Python, and R APIs and integrates them into underlying system kernels with care. You can simply create spatial analytics and data mining applications and run them in any cloud environments.
    Downloads: 0 This Week
  • 3
    Dify

    One API for plugins and datasets, one interface for prompt engineering

    Dify is an easy-to-use LLMOps platform designed to empower more people to create sustainable, AI-native applications. With visual orchestration for various application types, Dify offers out-of-the-box, ready-to-use applications that can also serve as Backend-as-a-Service APIs. Unify your development process with one API for plugins and datasets integration, and streamline your operations using a single interface for prompt engineering, visual analytics, and continuous improvement....
    Downloads: 36 This Week
  • 4
    Logseq

    A privacy-first, open-source platform for knowledge management

    Logseq is a privacy-first, open-source knowledge base that works on top of local plain-text Markdown and Org-mode files. Use it to write, organize and share your thoughts, keep your to-do list, and build your own digital garden. Logseq is a platform for knowledge management and collaboration. It focuses on privacy, longevity, and user control. The server will never store or analyze your private notes. Your data are plain text files and we currently support both Markdown and Emacs Org-mode...
    Downloads: 10 This Week
  • 5
    Superalgos

    Free, open-source crypto trading bot, automated bitcoin trading

    Free, open-source crypto trading bot, automated bitcoin/cryptocurrency trading software, algorithmic trading bots. Visually design your crypto trading bot, leveraging an integrated charting system, data-mining, backtesting, paper trading, and multi-server crypto bot deployments. Superalgos is not just another open-source project. We are an open and welcoming community nurtured and incentivized with the project's native Superalgos (SA) Token, building an open trading intelligence network. You will notice the difference as soon as you join the Telegram Community Group or the new Discord Server! ...
    Downloads: 5 This Week
  • 6
    EZ Bookkeeping

    A lightweight, self-hosted personal finance app

    EZ Bookkeeping is an open-source personal finance and bookkeeping web application designed to help individuals and small businesses track income, expenses, accounts, and budgets with simplicity and clarity. It provides a clean, modern interface where users can enter transactions, categorize expenses, and visualize financial data through dashboards, charts, and monthly summaries so that users can better understand their cash flow and spending patterns. The system supports multiple account...
    Downloads: 9 This Week
  • 7
    LLMStack

    No-code multi-agent framework to build LLM Agents, workflows

    LLMStack is a no-code platform for building generative AI agents, workflows and chatbots, connecting them to your data and business processes. Build tailor-made generative AI agents, applications and chatbots that cater to your unique needs by chaining multiple LLMs. Seamlessly integrate your own data, internal tools and GPT-powered models without any coding experience using LLMStack's no-code builder. Trigger your AI chains from Slack or Discord. Deploy to the cloud or on-premise.
    Downloads: 6 This Week
  • 8
    JupyterLab

    JupyterLab computational environment

    ...Documents and activities integrate with each other, enabling new workflows for interactive computing. JupyterLab also offers a unified model for viewing and handling data formats. JupyterLab understands many file formats (images, CSV, JSON, Markdown, PDF, Vega, Vega-Lite, etc.) and can also display rich kernel output in these formats. See File and Output Formats for more information. To navigate the user interface, JupyterLab offers customizable keyboard shortcuts and the ability to use key maps from vim, emacs, and Sublime Text in the text editor.
    Downloads: 112 This Week
  • 9
    Huxtable

    An R package to create styled tables in multiple output formats

    Huxtable is an R package to create LaTeX and HTML tables, with a friendly, modern interface. Features include control over text styling, number format, background color, borders, padding, and alignment. Cells can span multiple rows and/or columns. Tables can be manipulated with standard R subsetting or dplyr functions.
    Downloads: 0 This Week
  • 10
    Ada PDF Writer

    A standalone, portable package for producing PDF documents dynamically

    PDF_Out is an Ada package for easily writing PDF files dynamically. It enables the automatic production of reports. The code is standalone and unconditionally portable; no external resource is needed. More information on... http://apdf.sf.net Alire crate: https://alire.ada.dev/crates/apdf Mirror: https://github.com/zertovitch/ada-pdf-writer
    Downloads: 25 This Week
  • 11
    Symfony Panther

    A browser testing and web crawling library for PHP and Symfony

    Symfony Panther is a browser testing and web scraping tool that allows developers to interact with websites programmatically. It uses headless Chrome or Firefox to automate browser tasks, making it suitable for end-to-end testing and data extraction. Panther integrates well with Symfony and PHPUnit, allowing developers to write comprehensive tests for web applications.
    Downloads: 2 This Week
  • 12
    canvas-editor

    Canvas-based WYSIWYG rich text editor with advanced layout tools

    ...It is designed to provide a WYSIWYG editing experience similar to word processors, enabling precise control over layout, rendering, and document structure. canvas-editor supports a wide range of formatting and document features, including text styling, tables, images, and embedded elements, all managed through a structured data model. Its architecture is modular, allowing developers to extend functionality through plugins, custom commands, and event hooks. It includes support for page-based layouts with headers, footers, pagination, and print-ready output, including PDF generation. It also provides interactive components such as form controls and context menus, making it suitable for building complex document editing systems.
    Downloads: 1 This Week
  • 13
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 2 This Week
  • 14
    NAPS2 - Not Another PDF Scanner

    Scan documents to PDF and other file types, as simply as possible.

    Visit NAPS2's home page at www.naps2.com. NAPS2 is a document scanning application with a focus on simplicity and ease of use. Scan your documents from WIA- and TWAIN-compatible scanners, organize the pages as you like, and save them as PDF, TIFF, JPEG, PNG, and other file formats. Available on Windows, Mac, and Linux. NAPS2 is currently available in over 40 different languages. Want to see NAPS2 in your preferred language? Help translate! See the wiki for more details.
    Downloads: 770 This Week
  • 15
    RStudio Cheatsheets

    Curated collection of official cheat sheets for data science tools

    The cheatsheets repository from RStudio is a curated collection of official cheat sheets for R, RStudio, the tidyverse, Shiny, and related data science tools. Each cheat sheet is a single (or double) page PDF that condenses important syntax, functions, workflows, and best practices into a visually organized format ideal for quick reference. The repository contains source files (R Markdown or LaTeX) that generate the cheat sheets, version history, and metadata (title, author, description) for each. ...
    Downloads: 0 This Week
  • 16

    efactuur-pdf-nl

    PDF generation for Dutch UBL and SETU invoices

    The EfactuurNL2PDF project provides the following functionality:
    - PDF generation for UBL or SETU Invoice documents
    - Schematron validation stylesheets
    - Genericode validation stylesheets
    The following HR-XML-NL and UBL-NL message versions are currently supported in this project:
    - NLCIUS (si-ubl-2.0.1)
    - UBL Invoice 1.9
    - UBL Invoice 1.8
    - UBL Invoice 1.7
    - UBL Invoice 1.6.3
    - UBL Invoice 1.6.2
    - UBL Invoice 1.1
    - SETU Invoice 2.0
    - SETU Invoice 1.8.1
    - SETU Invoice...
    Downloads: 2 This Week
  • 17
    JPdfBookmarks
    This software allows you to create and edit bookmarks in existing PDF files.
    Downloads: 151 This Week
  • 18
    QAnything

    Question and Answer based on Anything

    QAnything is a local knowledge-base question-answering system designed to let users ask questions over many kinds of files and databases. It supports offline installation, making it useful for organizations that need private document analysis without sending data to external services. Users can upload local files and receive fast, reliable answers based on the indexed content. The system supports formats such as PDF, Word, PowerPoint, Excel, Markdown, email, text, images, CSV, and web links. Its retrieval process uses a two-stage vector and reranking approach to maintain answer quality as the knowledge base grows. ...
    Downloads: 6 This Week
  • 19
    TEXminer

    Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

    TEXminer uses generic Text Mining Methods to analyze Unicode Files as plain Text or PDF. The Text Database can be saved in XML, where the original Text, the Sentence and Word Lists, and additional Parameters (e.g. Abbreviations) are stored. TEXminer allows Language Detection by Letter Frequency Analysis, finding important Words by Cooccurrence Analysis, Determination of Central Expressions, Thematic Text Classification (also Semantic Groups), Fingerprint Comparison, and Word Frequency. ...
    Downloads: 1 This Week
  • 20
    QPDF

    PDF transformation/manipulation program + library

    QPDF is a C++ library and set of programs that inspect and manipulate the structure of PDF files. It can encrypt and linearize files, expose the internals of a PDF file, and do many other operations useful to end users and PDF developers.
    Downloads: 996 This Week
  • 21
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 8 This Week
  • 22
    PdfBooklet
    PdfBooklet is a Python Gtk application that lets you make books or booklets from existing PDF files. It can also adjust margins, rotate, scale, merge files, or extract pages.
    Downloads: 212 This Week
  • 23
    Browserless

    Deploy headless browsers in Docker

    ...It lets developers connect existing Puppeteer and Playwright code to remote browser sessions over WebSocket, which helps move heavy browser work away from local machines or application servers. The project also provides REST APIs for common automation tasks such as screenshots, PDF generation, scraping, crawling, and content export. Browserless is useful for teams that need scalable browser execution for testing, data collection, rendering, or AI-agent browsing workflows. Its deployment model supports self-hosting, private infrastructure, queues, concurrency controls, and enterprise-oriented configuration. The project’s main value is turning browser automation into a managed service layer that can be reused across applications and workflows.
    Downloads: 3 This Week
  • 24
    5ire

    5ire is a cross-platform desktop AI assistant, MCP client

    5ire is a sleek, cross‑platform desktop AI assistant and MCP client that connects to major service providers, supports a local knowledge base and tool integration via MCP servers, enabling robust RAG and assistant features. These components are required as they constitute the runtime environment for the MCP Server. If you don't anticipate using the tools feature immediately, you may choose to skip this installation step and complete it later when the need arises. MCP is an open protocol that...
    Downloads: 13 This Week
  • 25
    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. ...
    Downloads: 0 This Week