Showing 1768 open source projects for "extract"

  • 1

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 0 This Week
  • 2

    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.
    Downloads: 0 This Week
  • 3

    Extract TOTP/HOTP secrets

    Extract one time password (OTP) secrets from QR codes

    The Python script extract_otp_secrets.py extracts one-time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator".
    Downloads: 2 This Week
  • 4

    lessmsi

    Tool to view and extract contents of a Windows Installer (.msi) file

    lessmsi (formerly known as Less Msiérables) is a free utility with a graphical user interface and a command line interface used for viewing and extracting the contents of a Windows Installer (.msi) file.
    Downloads: 35 This Week
  • 5

    PDFsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files

    PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, and crop PDF files. PDFsam Basic is written using JavaFX. Since version 4 it is released as a self-contained application that bundles a jlinked JDK, while version 3 requires a Java Runtime Environment 8 with JavaFX installed in order to run.
    Downloads: 141 This Week
  • 6

    EMV NFC Paycard Enrollment

    A Java library used to read and extract data from NFC EMV credit cards

    Java library used to read and extract public data from NFC EMV credit cards.
    Downloads: 26 This Week
  • 7

    ldif-extract

    Extract selected entries from LDIF files, like grep

    ldif-extract is a small grep-like tool to extract and convert data from LDIF files. It can be used standalone or in a pipe together with other tools such as ldapsearch.
    Downloads: 0 This Week
  • 8

    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 5 This Week
  • 9

    PHP Font Lib

    A library to read, parse, export and make subsets of different fonts

    This library can be used to read TrueType, OpenType (with TrueType glyphs), and WOFF font files. Extract basic info (name, style, etc). Extract advanced info (horizontal metrics, glyph names, glyph shapes, etc). Make an Adobe Font Metrics (AFM) file from a font. You can find a demo GUI. This project was initiated by the need to read font files in the DOMPDF project.
    Downloads: 4 This Week
  • 10

    Autopsy

    Autopsy® is a digital forensics platform and graphical interface

    Autopsy® is a digital forensics platform and graphical interface to The Sleuth Kit® and other digital forensics tools. It can be used by law enforcement, military, and corporate examiners to investigate what happened on a computer. You can even use it to recover photos from your camera's memory card. Autopsy was designed to be intuitive out of the box. Installation is easy and wizards guide you through every step. All results are found in a single tree. See the intuitive page for more...
    Downloads: 108 This Week
  • 11

    Toutatis

    Extract public Instagram account information from usernames

    Toutatis is an open source command-line tool designed to extract publicly available information from Instagram accounts. It helps users gather various data points from a target profile by querying Instagram using a username or account ID. The tool can retrieve details such as profile metadata, follower counts, biography information, and other publicly accessible account attributes. In addition to basic profile data, Toutatis can also reveal contact details that may be publicly exposed, including email addresses and phone numbers associated with the account. ...
    Downloads: 36 This Week
  • 12

    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 1 This Week
  • 13

    AIO-Switch-Updater

    Update your CFW, cheat codes, and firmware from your Nintendo Switch

    ...AIO-Switch-Updater uses a custom RCM payload to finalise the install, as it can't be performed while HOS is running. Download and update Hekate, as well as a selection of RCM payloads. Download and extract daily-updated cheat codes. The program will only extract cheat codes for the games you own. By default, this homebrew will overwrite the existing cheats. If you have your own cheat files that you'd like to keep as is, you can turn off cheat updates for specific titles.
    Downloads: 49 This Week
  • 14

    Allure Report

    Flexible, lightweight multi-language test reporting tool

    Allure Report is a flexible, lightweight multi-language test reporting tool. It provides clear graphical reports and lets everyone involved in the development process extract the maximum information from everyday testing. It shows a detailed representation of what has been tested and can build unified reports for dozens of testing tools across eleven programming languages on several CI/CD systems.
    Downloads: 13 This Week
  • 15

    PyPDF

    A pure-Python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 4 This Week
  • 16

    FFsubsync

    Automagically synchronize subtitles with video

    ...In this case, you can use the correctly synchronized srt file directly as a reference for synchronization, instead of using the video as the reference. ffsubsync uses the file extension to decide whether to perform voice activity detection on the audio or to directly extract speech from an srt file. ffsubsync usually finishes in 20 to 30 seconds, depending on the length of the video.
    Downloads: 42 This Week
  • 17

    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 2 This Week
  • 18

    refactoring.nvim

    The Refactoring library based off the Refactoring book

    refactoring.nvim is a Neovim plugin developed to bring powerful automated code refactoring capabilities to one of the most popular text editors among programmers, giving developers a suite of refactoring operations that streamline repetitive restructuring tasks inside the editor. Built around an intuitive set of commands and a Lua API, the plugin allows users to extract and inline variables or functions, pull blocks of code into new files, and modify code structure without leaving the comfort of Neovim’s modal interface. It integrates with built-in Neovim selection modes and can work with third-party tools like Telescope to present refactoring options quickly, enabling rapid transformation of code patterns. ...
    Downloads: 0 This Week
  • 19

    Image Toolbox

    Image Toolbox is a powerful picture editor, which can crop

    Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.
    Downloads: 21 This Week
  • 20

    Volatility

    An advanced memory forensics framework

    Volatility is a widely used open-source framework for analyzing memory captures (RAM dumps) from Windows, Linux, and macOS systems. It enables investigators and malware analysts to extract process lists, network connections, DLLs, strings, artifacts, and more. Volatility supports many plugins for detecting hidden processes, malware, rootkits, and event tracing. It’s essential in digital forensics and incident response workflows.
    Downloads: 144 This Week
  • 21

    DocTR

    Library for OCR-related tasks powered by Deep Learning

    DocTR provides an easy and powerful way to extract valuable information from your documents. Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly: 3 lines of code to load a document and extract text with a predictor.
    Downloads: 8 This Week
  • 22

    OpenAPI.NET

    Object model for OpenAPI documents in .NET

    The OpenAPI.NET SDK contains a useful object model for OpenAPI documents in .NET along with common serializers to extract raw OpenAPI JSON and YAML documents from the model. The OpenAPI.NET project holds the base object model for representing OpenAPI documents as .NET objects. Some developers have found the need to write processors that convert other data formats into this OpenAPI.NET object model. We'd like to curate that list of processors in this section of the readme.
    Downloads: 5 This Week
  • 23

    Beets

    Open-source music library management system

    Beets catalogs your music collection with a variety of tools for manipulating and accessing music.
    Downloads: 3 This Week
  • 24

    ScrapeGraphAI

    Python scraper based on AI

    ...ScrapeGraphAI is a web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 0 This Week
  • 25

    Kor

    LLM

    This is a half-baked prototype that “helps” you extract structured data from text using LLMs. Specify the schema of what should be extracted and provide some examples. Kor will generate a prompt, send it to the specified LLM and parse out the output. You might even get results back.
    Downloads: 0 This Week