Showing 18 open source projects for "extraction"

  • 1
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes. The tool combines deterministic parsing methods with an optional hybrid AI-powered mode that improves extraction quality for difficult layouts such as multi-column documents, scanned files, and scientific papers. ...
    Downloads: 6 This Week
  • 2
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 3
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 127 This Week
  • 4
    OpenSearchServer Extractor

    A RESTful/JSON web service for text and metadata extraction

    An open source RESTful web service for text and metadata extraction and analysis. oss-text-extractor supports various binary formats: word processor (doc, docx, odt, rtf), spreadsheet (xls, xlsx, ods), presentation (ppt, pptx, odp), publishing (pdf, pub), web (rss, html/xhtml), media (audio, images), and others (vsd, text).
    Downloads: 0 This Week
  • 5
    Simple general-purpose metadata extraction API with support for popular multimedia metadata formats such as EXIF and ID3.
    Downloads: 1 This Week
  • 6
    Row-Bean

    CSV reader writer - bean mapping - easy bean extraction from CSV file

    Row-Bean is a CSV-to-bean Java API. It provides a CSV reader and writer, as well as a mechanism to map CSV file content to Java beans and back. For each use, an XML description must define the desired mapping; alternatively, annotations can be used. Use under Maven: <!-- row bean with annotations...
    Downloads: 0 This Week
  • 7
    The National Library of New Zealand's Metadata Extraction Tool automatically extracts preservation-related metadata from digital files, then outputs that metadata in XML formats. It can be used through a graphical user interface or a command-line interface. Please take the latest code from 'https://github.com/DIA-NZ/Metadata-Extraction-Tool.git'. The code on SourceForge will no longer be updated, as the project has moved to GitHub.
    Downloads: 9 This Week
  • 8

    Detexter

    Detexter is an app designed to extract text from PDF files.

    Detexter lets you extract text from multiple PDF files. Detexter uses the PDFBox library for its text extraction.
    Downloads: 0 This Week
  • 9

    Large Text File converter

    Java-based heavy-duty utility to process large delimited text files

    ...Another strength of this tool is its configurability: its design allows it to generate as many output files as required from one input file, and validation, extraction, and conversion can be applied to every row of the input file. Example use case: a legacy system is to be replaced with a new system that has a different DB schema, and the data is provided as 100 GB of delimited text that must be inserted into 10 different tables of the new system's DB after validation, date format conversion, rearrangement, and MD5 hashing.
    Downloads: 0 This Week
  • 10
    This project provides a PDFBox-based toolkit and framework for analysing PDF documents and performing custom conversion tasks. It is published under the Apache licence. A GUI is also included and is published under the GPL licence.
    Downloads: 0 This Week
  • 11
    JBiblex
    Cross-platform explorer of ZIP archives with FB2 books.
    Downloads: 0 This Week
  • 12
    Information extraction model for Java, supporting macros and named backreferences in regular expressions.
    Downloads: 0 This Week
  • 13
    Cairo (Complex Archive Ingest for Repository Objects) is a tool for processing digital archives prior to submitting them to archival storage for long-term preservation; among other features, this includes format identification and metadata extraction.
    Downloads: 0 This Week
  • 14
    Scan, the Semantic Content ANnotator, is a semantic pipeline that helps connect information extraction tools to semantic databases. It is UIMA-based and allows easy plugin writing for information extraction, ontology control, and storage in RDF repositories.
    Downloads: 0 This Week
  • 15
    Software for extracting data from web pages.
    Downloads: 1 This Week
  • 16
    jumbles (Java Unified Metadata Basic Library for Extracting and Storing) is a library that enables the extraction and storing of multimedia metadata. It currently wraps "jaudiotagger" (MP3 ID3 tags) and "metadata extractor" (EXIF, et al.).
    Downloads: 0 This Week
  • 17
    FOXY is a filtering web proxy. Originally designed to provide device-independent access to the World Wide Web, it may also be used for HTTP filtering, for extraction and reauthoring of existing web content, or as a security device against web-based attacks.
    Downloads: 1 This Week
  • 18
    WebDeco implements the Decoration Design Pattern for J2EE/JavaEE web applications. It does so by providing a Servlet Filter that wraps a very flexible and extensible framework for content extraction and decoration.
    Downloads: 0 This Week