Showing 10 open source projects for "pdf data mining"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • 1
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    tidytext

    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Common Resource Grep - crgrep

    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. Here you will find binary downloads and discussion (https://sourceforge.net/p/crgrep/discussion/) . ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 5
    Parsr

    Parsr

    Transforms PDF, Documents and Images into Enriched Structured Data

    Parsr is an open-source document parsing tool that converts PDFs, scanned images, and other structured documents into structured, machine-readable data formats.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6

    VecText

    Converting text to a structured representation

    ...In the command-line interface mode, all options need to be specified using the command line parameters. This way of non-interactive communication enables incorporating the application into a more complicated data mining process integrating several software packages or performing multiple conversions in a batch.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    TEXT2DATA

    TEXT2DATA

    Text Analytics Platform

    Bring Text Analytics Platform that uses NLP (Natural Language Processing) and Machine Learning to your work environment. Extract essential information from your text documents and let Artificial Intelligence save your time. Get detailed and agile reports on your unstructured data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    ...sectionId=341. This data set is created in order to perform text mining operations on Turkish and make experimental results re-producable. The TTC-3600 data set has 4 different forms in terms of pre-processing: 1. Original: No pre-processing step is applied. 2. FPS-5: The first five characters of terms are selected as stem and stop-words elimination is performed. 3.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    Persica-A new Persian corpus for NLP

    This project presents a new corpus for NEWS text analysis in Persian

    Lack of multi-application text corpus despite of the surging text data is a serious bottleneck in the text mining and natural language processing especially in Persian language. This project presents a new corpus for NEWS articles analysis in Persian called Persica. NEWS analysis includes NEWS classification, topic discovery and classification, category classification and many more procedures. Dealing with NEWS has special requirements and first of all a valid and reliable corpus to perform the experiments on them. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    Facilitates data mining/natural language processing experiments to be executed on weblogs, such as classification, clustering and rating. As part of these experiments, it is possible to apply Latent Semantic Analysis.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo