332 projects for "extract" with 1 filter applied:

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    LangChain Extract

    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    ldif-extract

    Extrect selected entries from LDIF files like grep

    ldif-extract is a small 'grep' like tool to extract and convert data from LDIF files. It could be used standalone or also in a pipe together with other tools like ldapsearch.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    EMV NFC Paycard Enrollment

    EMV NFC Paycard Enrollment

    A Java library used to read and extract data from NFC EMV credit cards

    Java library used to read and extract public data from NFC EMV credit cards.
    Downloads: 29 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Toutatis

    Toutatis

    Extract public Instagram account information from usernames

    Toutatis is an open source command-line tool designed to extract publicly available information from Instagram accounts. It helps users gather various data points from a target profile by querying Instagram using a username or account ID. The tool can retrieve details such as profile metadata, follower counts, biography information, and other publicly accessible account attributes. In addition to basic profile data, Toutatis can also reveal contact details that may be publicly exposed, including email addresses and phone numbers associated with the account. ...
    Downloads: 40 This Week
    Last Update:
    See Project
  • 6
    PyPDF

    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    LLM Scraper

    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Volatility

    Volatility

    An advanced memory forensics framework

    Volatility is a widely used open-source framework for analyzing memory captures (RAM dumps) from Windows, Linux, and macOS systems. It enables investigators and malware analysts to extract process lists, network connections, DLLs, strings, artifacts, and more. Volatility supports many plugins for detecting hidden processes, malware, rootkits, and event tracing. It’s essential in digital forensics and incident response workflows.
    Downloads: 150 This Week
    Last Update:
    See Project
  • 9
    Spatie Crawler

    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 1 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 10
    refactoring.nvim

    refactoring.nvim

    The Refactoring library based off the Refactoring book

    refactoring.nvim is a Neovim plugin developed to bring powerful automated code refactoring capabilities to one of the most popular text editors among programmers, giving developers a suite of refactoring operations that streamline repetitive restructuring tasks inside the editor. Built around an intuitive set of commands and a Lua API, the plugin allows users to extract and inline variables or functions, pull blocks of code into new files, and modify code structure without leaving the comfort of Neovim’s modal interface. It integrates with built-in Neovim selection modes and can work with third-party tools like Telescope to present refactoring options quickly, enabling rapid transformation of code patterns. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    designlang

    designlang

    Extract any website's complete design system with one command

    ...The system includes accessibility analysis features, such as WCAG compliance checks and CSS health audits, helping developers improve usability and standards compliance. It can be used via CLI or browser extension, making it flexible for different workflows. Overall, design-extract automates the process of reverse-engineering design systems, significantly accelerating frontend development.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    DiscordChatExporter

    DiscordChatExporter

    Saves Discord chat logs to a file

    ...With both command-line and GUI usage patterns available through the ecosystem, it supports automation as well as manual workflows. Overall, DiscordChatExporter provides a reliable way to extract and preserve Discord communications outside the platform.
    Downloads: 74 This Week
    Last Update:
    See Project
  • 13
    Scribe.js

    Scribe.js

    JavaScript OCR and text extraction for images and PDFs

    Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. In addition to simple text extraction, Scribe.js supports writing or injecting a high-quality invisible text layer back into PDFs, effectively making them searchable and improving usability for indexing or accessibility. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Interface Design

    Interface Design

    Design engineering for Claude Code

    ...The plugin prompts users to confirm a design direction early in the process and then applies those principles consistently — from button sizes to spacing scales and color tokens — so work stays aligned with the established system. It also offers commands to inspect the current design system status, audit inconsistencies, and extract patterns back into a reusable format, making it a live feedback loop for quality UI work.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    Chandra

    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    docext

    docext

    An on-premises, OCR-free unstructured data extraction

    ...This allows the system to detect and extract structured elements such as tables, signatures, key fields, and layout information while maintaining semantic understanding of the document content. The toolkit can also convert complex documents into structured markdown representations that preserve formatting and contextual relationships.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Gitingest

    Gitingest

    Create prompt-friendly codebase digests from any Git repository URL

    ...The generated output is optimized for prompt usage, helping AI models understand codebases more effectively without requiring manual file aggregation. In addition to producing the code digest, Gitingest also calculates statistics about the extracted content such as repository structure, total size of the extract, and token count. Gitingest can be used as a command line utility or integrated directly into Python applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Geziyor

    Geziyor

    Blazing fast Go framework for web crawling and data scraping tasks

    Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Ksoup

    Ksoup

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

    Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    AI-Crawler

    AI-Crawler

    Crawl a website starting from a URL, find relevant pages

    ...Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure. Users can define their data requirements in plain English, and the system will interpret those instructions to crawl a domain and extract structured data. The tool supports output formats such as JSON and Markdown, and it can generate or accept schemas to ensure that extracted data is structured according to application needs. It is designed as a low-code solution, reducing the complexity of building and maintaining custom scraping pipelines.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Article Extractor

    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Scriberr

    Scriberr

    Self-hosted AI audio transcription

    ...The application includes a polished user interface that simplifies the management of recordings, transcripts, and annotations, making it suitable for both casual users and professionals handling large volumes of audio. Beyond transcription, Scriberr also integrates features such as summarization, tagging, and interaction with language models, allowing users to extract insights from conversations or meetings efficiently.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    Weibo Crawler

    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata. It also captures detailed data about each post, including the content, publishing time, topics, mentions, likes, reposts, and comments. In addition to textual data, the project can download original media from posts, such as images, videos, and Live Photo content. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 24
    PDFCraft

    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    ...At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite. But beyond manual editing, it also offers a programmable layer so developers can write scripts to batch process documents, generate templated reports, or extract structured data from PDFs for integration in workflows. The design emphasizes quality and compatibility: output PDFs render accurately across readers, preserve metadata, and support interactive elements like hyperlinks and form fields.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 25
    videodl

    videodl

    Lightweight Python tool for downloading videos from many platforms

    ...It supports numerous video platforms across both Chinese and international streaming ecosystems, enabling users to fetch content from many popular services through a unified interface. Videodl works by implementing platform-specific client modules that extract video information and download links from supported services. Videodl can integrate with external command-line utilities to improve downloading performance, handle streaming formats such as HLS, and manage encrypted or segmented media streams. Additional utilities can also enable faster downloads, resume interrupted transfers, and process complex playlist structures.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB