Showing 50 open source projects for "extraction"

  • 1
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 10 This Week
  • 2
    FModel

    Unreal Engine Archives Explorer

    FModel is a freeware application for exploring the archives of Unreal Engine games, letting users inspect the assets and structures of games developed with Unreal Engine.
    Downloads: 89 This Week
  • 3
    tsfresh

    Automatic extraction of relevant features from time series

    tsfresh is a Python package that automatically calculates a large number of time series characteristics, the so-called features. Without tsfresh, you would have to calculate all of these characteristics by hand; with tsfresh, the process is automated. Furthermore, tsfresh is compatible with Python's pandas and scikit-learn APIs, two important packages for data science work in Python....
    Downloads: 1 This Week
  • 4
    Scanopy

    Clean network diagrams. One-time setup, zero upkeep.

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines that chain together transforms, filters, and exporters, enabling automation of tedious data preparation steps and accelerating insights with minimal code. The system places a premium on extensibility, allowing contributors to add new extractors or analysis modules tailored to specific industries or datasets. ...
    Downloads: 3 This Week
  • 5
    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 53 This Week
  • 6
    BlockArrays.jl

    BlockArrays for Julia

    ...The type BlockArray stores each block contiguously, while the type PseudoBlockArray stores the full matrix contiguously. This means that BlockArray supports fast non-copying extraction and insertion of blocks, while PseudoBlockArray supports fast access to the full matrix for use in, for example, a linear solver.
    Downloads: 0 This Week
  • 7
    JS Analyzer

    Burp Suite extension for JavaScript static analysis

    JS Analyzer is a powerful static analysis tool implemented as a Burp Suite extension that helps security researchers and web developers automatically uncover important artifacts in JavaScript files during web application testing. It parses JavaScript responses intercepted by Burp Suite and intelligently extracts API endpoints, full URLs (including cloud storage links), secrets like API keys or tokens, and email addresses while filtering out noise from irrelevant code patterns. The extension...
    Downloads: 1 This Week
  • 8
    npm-pdfreader

    Parse text and tables from PDF files.

    npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.
    Downloads: 2 This Week
  • 9
    EEGLAB

    EEGLAB is an open source signal processing environment

    EEGLAB is an open source, MATLAB-based interactive environment for analyzing electrophysiological signals such as EEG and MEG. It incorporates powerful tools for data import, preprocessing, independent component analysis (ICA), time-frequency analysis, artifact rejection, and visualization—all within a GUI framework that also supports scripting and plugin extensions. EEGLAB is an open source signal processing environment for electrophysiological signals running on Matlab and Octave (command...
    Downloads: 11 This Week
  • 10
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 0 This Week
  • 11
    FinGPT

    Open-Source Financial Large Language Models

    ...The platform typically includes tools for fine-tuning, context engineering, and prompt templating, enabling users to build specialized assistants for tasks like sentiment analysis, earnings summary generation, risk profiling, trading signal interpretation, and document extraction from financial reports.
    Downloads: 8 This Week
  • 12
    OpenMed

    Open source healthcare AI

    OpenMed is an open-source healthcare AI and medical NLP toolkit designed to turn clinical text into structured insights using transformer-based models and production-oriented interfaces. Its core purpose is to provide specialized medical entity extraction, PII detection and de-identification, assertion-aware analysis, and related healthcare text processing capabilities without locking users into a proprietary platform. The project includes a curated registry of more than a dozen medical NER models focused on areas such as diseases, drugs, anatomy, genes, and protected health information, and it is built to support both research and deployment scenarios. ...
    Downloads: 2 This Week
  • 13
    ClimateTools.jl

    Climate science package for Julia

    Climate analysis tools in Julia. ClimateTools.jl is a collection of commonly used tools in climate science. It covers the basics of climate field analysis, with some forays into exploratory techniques associated with climate scenario design. The package aims to ease the typical steps of analyzing climate model outputs and gridded datasets (support for weather stations is a work in progress). Climate indices and bias correction functions are coded to take advantage of multiple threads....
    Downloads: 1 This Week
  • 14
    Obsidian Visual Skills Pack

    Generate Canvas, Excalidraw, and Mermaid diagrams from text

    LLM-TLDR is a Python-based tool designed to dramatically reduce the amount of code a large language model needs to read by extracting the essential structure and context from a codebase and presenting only the most relevant parts to the model. Traditional approaches often dump entire files into a model’s context, which quickly exceeds token limits; LLM-TLDR instead indexes project structure, traces dependencies, and summarizes code in a way that preserves semantic relevance while shrinking...
    Downloads: 0 This Week
  • 15
    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip, and Easy Install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
  • 16
    LabPlot

    Data Visualization and Analysis

    LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
    Downloads: 38 This Week
  • 17
    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM Community Edition is a free Document Management System (DMS) that helps businesses control the production, storage, management and distribution of electronic documents, boosting effectiveness and productivity. It integrates document management, collaboration and advanced search into one easy-to-use solution, including administration tools for user roles, access control, security levels, activity logs and automation setup. With OpenKM Community Edition you can: Collect information...
    Downloads: 514 This Week
  • 18
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout AI-driven data processing tool written in C++20, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. ...
    Downloads: 5 This Week
  • 19
    Datapipe

    Real-time, incremental ETL library for ML with record-level depend

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 5 This Week
  • 20

    Hyperspectral data analysis in R

    Handling and basic analysis of hyperspectral data in R

    The hsdar package contains classes and functions to manage, analyse and simulate hyperspectral data. These might be either spectrometer measurements or hyperspectral images through the interface of rgdal.
    Downloads: 1 This Week
  • 21
    M. Vezelis Draw

    Project productivity and cost estimation tool that creates Gantt charts

    A project development productivity and cost estimation tool that presents its results as multiple custom diagrams, including a Gantt chart. The tool provides succinct project-metrics information, plus on-demand information under each diagram. It lets a user calculate Function Points and estimate cost based on the COCOMO Basic and Intermediate models.
    Downloads: 2 This Week
  • 22
    Excel to Graphviz

    Free Excel tool to easily create Graphviz data visualizations.

    The Excel to Graphviz Relationship Visualizer spreadsheet transforms your Excel data into professional Graphviz diagrams. Enter simple "A is related to B" rows to instantly generate polished relationship graphs using the Graphviz DOT language. Ideal for data analysis, network visualization, and IT architecture. Free, open-source, MIT-licensed. Customize the look of nodes, edges, and clusters with the Style Designer. Build a CSS-like gallery of reusable styles that lets you apply...
    Downloads: 4 This Week
  • 23
    MBR Bulk WP Detector

    A free WP plugin that lets you check unlimited URLs

    MBR Bulk WP Detector is a free WordPress plugin that lets you check unlimited URLs right from your own dashboard. No subscriptions, no URL limits, and your data stays completely private on your server. What Can You Do With It? The basics are simple: Paste a list of URLs (or upload a CSV file), click a button, and boom—you’ve got a clear breakdown of which sites are running WordPress and which aren’t. But it gets better… Turn on Deep Scan mode, and you’ll also discover what...
    Downloads: 0 This Week
  • 24
    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    ...Text mining is an application of data analysis that extracts relevant information from related writings in different linguistic contexts. In natural language processing, sentiment analysis and classification stand out as key applications supported by text mining. By extracting information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. In this end-of-degree project, we use Python to analyze and classify the dialogue of characters in the English-language television series "Rick and Morty". The objective is to identify and categorize the feelings and emotions expressed in the text, comparing the human perception of the characters' personalities with the results obtained using natural language processing techniques.
    Downloads: 0 This Week
  • 25
    PDFLayoutTextStripper

    Converts a PDF file into a text file while keeping the layout

    Converts a PDF file into a text file while keeping the layout of the original PDF. Useful for extracting the content of a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
    Downloads: 0 This Week