Showing 1224 open source projects for "extract"

  • 1
    Epublifier

    Converts some webnovels to epub format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 1 This Week
  • 2
    Zotero

    Tool to help you collect, organize, annotate, cite, and share research

    Zotero is a powerful, free, open-source research management application designed to help students, academics, and professionals collect, organize, annotate, cite, and share research sources and materials for papers, projects, or books. It can save web pages, PDFs, books, articles, and more with metadata, automatically extract bibliographic information, and organize items into collections and tag systems, while supporting notes and annotations directly alongside references. Zotero’s interface integrates with word processors like Microsoft Word and LibreOffice to generate formatted citations and bibliographies in many styles, and it can sync libraries across devices or share them with collaborators. ...
    Downloads: 13 This Week
  • 3
    Kaniko

    Build Container Images In Kubernetes

    kaniko is a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster. kaniko doesn't depend on a Docker daemon and executes each command within a Dockerfile completely in userspace. This enables building container images in environments that can't easily or securely run a Docker daemon, such as a standard Kubernetes cluster.
    Downloads: 3 This Week
  • 4
    Gitingest

    Create prompt-friendly codebase digests from any Git repository URL

    ...The generated output is optimized for prompt usage, helping AI models understand codebases more effectively without requiring manual file aggregation. In addition to producing the code digest, Gitingest also calculates statistics about the extracted content such as repository structure, total size of the extract, and token count. Gitingest can be used as a command line utility or integrated directly into Python applications.
    Downloads: 0 This Week
  • 5
    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. ...
    Downloads: 0 This Week
  • 6
    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to identify and extract meaningful data fields from heterogeneous document layouts. ...
    Downloads: 0 This Week
  • 7
    Interface Design

    Design engineering for Claude Code

    ...The plugin prompts users to confirm a design direction early in the process and then applies those principles consistently — from button sizes to spacing scales and color tokens — so work stays aligned with the established system. It also offers commands to inspect the current design system status, audit inconsistencies, and extract patterns back into a reusable format, making it a live feedback loop for quality UI work.
    Downloads: 0 This Week
  • 8
    monolith

    CLI tool for saving complete web pages as a single HTML file

    A data hoarder’s dream come true, bundle any web page into a single HTML file. You can finally replace that gazillion of open tabs with a gazillion of .html files stored somewhere on your precious little drive. Unlike the conventional “Save page as”, monolith not only saves the target document, it embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that is a joy to store and share. If compared to saving websites with wget -mpk, this tool embeds all assets...
    Downloads: 5 This Week
  • 9
    Tailslayer

    Library for reducing tail latency in RAM reads

    Tailslayer is a library for reducing tail latency in RAM reads, that is, the rare slow memory accesses that dominate worst-case response times.
    Downloads: 8 This Week
  • 10
    Jbuilder

    Generate JSON objects with a Builder-style DSL

    Jbuilder gives you a simple DSL for declaring JSON structures that beats manipulating giant hash structures. This is particularly helpful when the generation process is fraught with conditionals and loops. You can use Jbuilder either stand-alone or directly as an ActionView template language. When required in Rails, you can create views à la show.json.jbuilder (the json is already yielded). Fragment caching is supported; it uses Rails.cache and works like caching in HTML templates. If your...
    Downloads: 10 This Week
  • 11
    DeSmuME

    DeSmuME is a Nintendo DS emulator

    In this version we have added support for high-resolution 3D rendering. Try the new “GPU Scaling Factor” feature to increase the 3D resolution beyond the native resolution of 256×192 pixels. Also, the Cocoa frontend sees continued radical enhancements, while the Windows frontend sees some new incremental enhancements. DeSmuME is a very CPU-demanding app. While many users will see DeSmuME as a toy (and use it as such), it is actually a very sophisticated piece of software with lots of...
    Downloads: 21 This Week
  • 12
    Cloud Commander

    Cloud Commander file manager for the web with console and editor

    ...Adapts to screen size. Three built-in editors with syntax highlighting support: Dword, Edward and Deepword. Console with support for the default OS command line. Written in JavaScript/Node.js. Built-in archive packing: zip and tar.gz. Built-in archive extraction: zip, tar, gz, bz2, .tar.gz and .tar.bz2 (with the help of inly). Cloud Commander can be used as middleware for Node.js applications based on socket.io and express. Docker images are provided for multiple architectures and types. With the Docker image, the config is read from the home directory, the host's root file system is mounted at /mnt/fs, and port 8000 is exposed to the host.
    Downloads: 14 This Week
  • 13
    Mobile Verification Toolkit

    Helps with conducting forensics of mobile devices

    Mobile Verification Toolkit (MVT) is a collection of utilities that simplify and automate the gathering of forensic traces that help identify a potential compromise of Android and iOS devices. It was developed and released by the Amnesty International Security Lab in July 2021 in the context of the Pegasus project, along with a technical forensic methodology and forensic evidence. MVT is a forensic research tool intended for technologists and investigators. Using it requires...
    Downloads: 33 This Week
  • 14
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 0 This Week
  • 15
    Portable Executable Parser

    lightweight Go package to parse, analyze and extract metadata

    Saferwall PE is a lightweight Go package for parsing, analyzing, and extracting metadata from Portable Executable (PE) binaries. Designed with malware analysis in mind, it is robust against malformed PE files and provides detailed insights into executable structures.
    Downloads: 1 This Week
  • 16
    GoWall

    A tool to convert a Wallpaper's color scheme / palette, image to pixel

    Gowall is a versatile command-line tool for processing images, initially created to convert wallpapers to match specific color schemes. It has evolved to include features like image-to-pixel-art conversion, color palette extraction, background removal, and more, making it a powerful utility for image manipulation.
    Downloads: 0 This Week
  • 17
    Ksoup

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.
    Downloads: 0 This Week
  • 18
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 0 This Week
  • 19
    PapersGPT

    A powerful Zotero AI and MCP plugin with ChatGPT, Gemini 3.1, Claude

    PapersGPT is an AI-powered plugin that integrates directly into Zotero to transform how researchers interact with academic papers and literature collections. It enables users to chat with individual PDFs or entire collections, allowing them to extract insights, generate summaries, and explore connections between documents without leaving the Zotero environment. The plugin supports a wide range of state-of-the-art language models, including GPT, Claude, Gemini, and open-source alternatives, giving users flexibility in choosing performance, cost, and privacy trade-offs. One of its most powerful features is its ability to process large volumes of academic content quickly, enabling tasks such as literature reviews, theoretical analysis, and research synthesis to be completed significantly faster. ...
    Downloads: 10 This Week
  • 20
    Nyxt

    The hacker's power-browser

    Out of the box, Nyxt ships with tens of features that allow you to quickly analyze, navigate, and extract information from the Internet. Plus, Nyxt is fully hackable: all of its source code can be introspected, modified, and tweaked to your exact specification. Navigate large documents with ease. Run commands against multiple objects at once to avoid repeating yourself; for example, you can select and close all buffers that match the string "ele".
    Downloads: 5 This Week
  • 21
    cognee

    Deterministic LLMs Outputs for AI Applications and AI Agents

    ...Any kind of data works; unstructured text or raw media files, PDFs, tables, presentations, JSON files, and so many more. Add small or large files, or many files at once. We map out a knowledge graph from all the facts and relationships we extract from your data. Then, we establish graph topology and connect related knowledge clusters, enabling the LLM to "understand" the data.
    Downloads: 9 This Week
  • 22
    PyMuPDF

    Python bindings for MuPDF's rendering library.

    MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB,...
    Downloads: 12 This Week
  • 23
    SSZipArchive

    ZipArchive is a simple utility class for zipping and unzipping files

    ZipArchive is a popular Objective-C and Swift library for handling ZIP file compression and extraction in iOS and macOS applications. It provides a simple API to create, extract, and manage compressed files.
    Downloads: 1 This Week
  • 24
    Verba

    Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

    Welcome to Verba: The Golden RAGtriever, a community-driven open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few steps, explore your datasets and extract insights, either locally with Ollama and Hugging Face or through LLM providers such as Anthropic, Cohere, and OpenAI. This project is built with and for the community; please be aware that it might not be maintained with the same urgency as other Weaviate production applications.
    Downloads: 1 This Week
  • 25
    JSONPath Plus

    A fork of JSONPath

    Analyse, transform, and selectively extract data from JSON documents (and JavaScript objects). JSONPath Plus expands on the original specification, adding some operators and making explicit some behaviors the original did not spell out. Try the browser demo or Runkit (Node).
    Downloads: 1 This Week