Search Results for "pdf data mining" - Page 6

Showing 893 open source projects for "pdf data mining"

  • 1
    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...
    Downloads: 5 This Week
  • 2
    Khoj

    An AI personal assistant for your digital brain

    Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as...
    Downloads: 8 This Week
  • 3
    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, Elasticsearch, and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...
    Downloads: 3 This Week
  • 4
    Jina

    Build cross-modal and multimodal applications on the cloud

    ...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. ...
    Downloads: 0 This Week
  • 5
    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using well-regarded libraries for object detection, OCR and selected NLP tasks and provides an integrated framework for...
    Downloads: 0 This Week
  • 6
    Perf Book

    The book "Performance Analysis and Tuning on Modern CPUs"

    This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...
    Downloads: 2 This Week
  • 7
    psychmeta

    Psychometric meta-analysis toolkit

    ...Documentation for psychmeta’s functions is available in the package’s PDF manual. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more.
    Downloads: 0 This Week
  • 8
    unMinable

    unMinable is a command-line based cryptocurrency mining tool

    unMinable is a command-line based cryptocurrency mining tool designed for efficient and user-friendly Bitcoin mining. It provides real-time hardware detection, mining process control, balance management, and automated withdrawal functionality. The software is designed to interact with Firebase to fetch and store user balances, withdrawals, and user-related data securely. The terminal allows users to start and monitor their mining progress, view their balances, and withdraw their mined funds when they reach the minimum threshold of 0.001 BTC. ...
    Downloads: 11 This Week
  • 9
    pdfcrack is a command-line password recovery tool for PDF files.
    Downloads: 479 This Week
  • 10
    Kimai

    Kimai is a web-based multi-user time-tracking application

    Kimai is an open-source time-tracking solution. It tracks work time and prints out a summary of your activities on demand: yearly, monthly, daily, by customer, or by project. Its simplicity is its strength. Thanks to Kimai’s browser-based interface, it runs cross-platform, even on your mobile device. With Kimai, the boring process of feeding Excel spreadsheets with your working hours is not only simplified, it also offers dozens of other exciting features that you don't even know you're...
    Downloads: 5 This Week
  • 11
    DeepSeek Prover V2

    Advancing Formal Mathematical Reasoning via Reinforcement Learning

    ...It also includes a PDF of the paper or project overview and sample formalization datasets. Because theorem proving is a cutting-edge area in LLM research, Prover-V2 is positioned as an effort to advance formal reasoning for LLMs.
    Downloads: 1 This Week
  • 12
    Admidio

    Admidio is a free open source user management system for websites

    Admidio is a free open source user management system for websites of organizations and groups. The system has a flexible role model so that it’s possible to reflect the structure and permissions of your organization. You can create an individual profile for your members by adding or removing profile fields. In addition to these functions, the system contains several modules such as member lists, an event manager, messages, a photo album, and a documents & files area. Admidio is a free online membership...
    Downloads: 0 This Week
  • 13
    pstoedit

    converts PostScript or PDF files to other vector graphics formats

    pstoedit is a tool for converting PostScript and PDF files into various other formats supported by different drawing editors. As a prerequisite, it needs GhostScript to be installed (a binary installation is sufficient).
    Downloads: 115 This Week
  • 14

    litePDF

    a library to create/modify PDF documents using HDC/TCanvas

    litePDF is a Windows library (DLL) that allows creating new PDF documents and editing existing ones with a simple API. Page content is drawn with standard GDI functions through a device context (HDC, or TCanvas in the case of Delphi or C++ Builder).
    Downloads: 2 This Week
  • 15

    toPDF

    Online service for PDF conversion (to PDF)

    A simple online service for PDF conversion. This project is a simple library and also a web application. It offers a REST service and a simple upload service for synchronous conversion. This library/application doesn't contain conversion libraries because it's a wrapper for existing tools. toPDF currently supports the open source tool PDF Creator (http://www.pdfforge.org) and the commercial solution, easy PDF, from BCL (http://www.pdfonline.com/easypdf/sdk/).
    Downloads: 1 This Week
  • 16

    Candid PDF Table

    CandidPDFTable – Deterministic TCPDF Table Builder

    CandidPDFTable (Candid PDF Table Builder) is a deterministic, colspan-aware table builder designed specifically for TCPDF. It provides a clean and predictable API to construct HTML tables for TCPDF::writeHTML() using explicit, cell-owned borders and late-stage layout computation. The library is built for programmatic table generation where precise control over rows, columns, colspans, borders, and serial numbering is essential. Building complex tables directly in TCPDF becomes difficult...
    Downloads: 0 This Week
  • 17
    Decaleon

    Multilingual Esperanto Translator, Word Dictionary, Vocabulary Trainer

    ...The SourceForge project TEXminer uses the same XML database for text mining. Co-occurrences are in development.
    Downloads: 0 This Week
  • 18
    workerPdf

    WorkerPDF is a GUI for GhostScript created for PDF conversion

    WorkerPDF uses GhostScript (https://www.ghostscript.com/) and was created for PDF conversion. Program features: compress PDF documents; combine PDFs; move PDF pages; rotate PDF pages; create PDFs from images; convert PDFs to images; encrypt and decrypt PDFs.
    Downloads: 6 This Week
  • 19
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 4 This Week
  • 20

    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application that adds watermarks (text or image) to PDF files and converts them to images and back to PDF, with options for OCR and compression.
    Downloads: 0 This Week
  • 21
    dxf2gcode

    DXF2GCODE: converting 2D dxf drawings to CNC machine compatible G-Code

    DXF2GCODE is a tool for converting 2D (dxf, pdf, ps) drawings to CNC machine compatible G-Code. It supports Windows, Linux, and Mac by using the Python scripting language.
    Downloads: 332 This Week
  • 22
    GLE - Graphics Layout Engine
    GLE is a graphics scripting language designed for creating publication quality graphs, plots, diagrams, figures, and slides. Text can be formatted with LaTeX/TeX markup. Its output formats include EPS, PS, PDF, JPEG, and PNG. GLE can operate as either a command line or GUI application.
    Downloads: 76 This Week
  • 23
    PII-Blackout

    100% offline, AI-powered PDF redaction

    ...PII Blackout automatically scans for, detects, and blacks out sensitive data points across your documents in one click. Absolute, Irreversible Security (Image-Level Blackout): Unlike standard PDF editors that merely place a black shape over editable text (which can easily be copied or uncovered), PII Blackout flattens and bakes the redaction directly into the image surface of the document. The covered data is permanently destroyed and mathematically impossible to recover. 100% Offline & Local Processing
    Downloads: 0 This Week
  • 24
    GeoDMA

    Geographic feature extraction and data mining

    GeoDMA is a plugin for the TerraView software, used for geographical data mining. With a single image, the user can perform segmentation, attribute extraction, normalization, and classification.
    Downloads: 2 This Week
  • 25
    Invoiso

    Free Invoice & Billing Software for Windows, Linux & Mac

    Free offline invoice software for Windows, Linux and Mac. Create professional PDF invoices and manage clients and products, with no internet, no subscription, and no account needed. Invoiso is a free, open-source desktop invoicing app for small businesses, shops, and freelancers on Windows and Linux. All data stays on your device; no internet is required, ever. Default login: username admin, password admin. You will be prompted to change the password on first login.
    Downloads: 63 This Week