extract free download

446 projects for "extract" with 1 filter applied:

Filter: BSD

1

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.

Downloads: 0 This Week

Last Update: 2026-03-09
2

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.

Downloads: 0 This Week

Last Update: 2026-03-05
3

EMV NFC Paycard Enrollment

A Java library used to read and extract data from NFC EMV credit cards

Java library used to read and extract public data from NFC EMV credit cards.

Downloads: 45 This Week

Last Update: 2026-02-09
4

ldif-extract

Extract selected entries from LDIF files, like grep

ldif-extract is a small grep-like tool to extract and convert data from LDIF files. It can be used standalone or in a pipe together with other tools such as ldapsearch.

Downloads: 0 This Week

Last Update: 2026-01-10
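
The grep-like filtering described for ldif-extract can be illustrated with a minimal Python sketch. This is not ldif-extract's own code or command-line interface; the function names `parse_ldif` and `grep_entries` and the sample data are assumptions made for illustration only. It splits LDIF text into entries on blank lines, unfolds continuation lines, and keeps the entries in which a chosen attribute matches a regular expression.

```python
import re


def parse_ldif(text):
    """Split LDIF text into entries; each entry is a list of (attribute, value) pairs.

    Simplified on purpose: base64 ("attr:: ...") and URL ("attr:< ...") values
    are left undecoded, and only "\n" line endings are handled.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # A line that starts with a single space continues the previous line.
        unfolded = re.sub(r"\n ", "", block)
        pairs = []
        for line in unfolded.splitlines():
            if line.startswith("#") or ":" not in line:
                continue  # skip comments and anything that is not "attr: value"
            attr, _, value = line.partition(":")
            pairs.append((attr.strip(), value.strip()))
        if pairs:
            entries.append(pairs)
    return entries


def grep_entries(entries, attribute, pattern):
    """Return the entries where `attribute` has a value matching `pattern`."""
    regex = re.compile(pattern)
    wanted = attribute.lower()  # attribute names are compared case-insensitively
    return [
        entry
        for entry in entries
        if any(a.lower() == wanted and regex.search(v) for a, v in entry)
    ]


# Hypothetical sample data, for illustration only.
SAMPLE = """\
dn: uid=alice,ou=people,dc=example,dc=org
uid: alice
mail: alice@example.org

dn: uid=bob,ou=people,dc=example,dc=org
uid: bob
mail: bob@example.com
"""

if __name__ == "__main__":
    for entry in grep_entries(parse_ldif(SAMPLE), "mail", r"@example\.org$"):
        print(dict(entry)["dn"])
```

In a pipe, the same idea would read the LDIF text from standard input (for example, the output of ldapsearch) instead of the inline `SAMPLE` string.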
5

Volatility

An advanced memory forensics framework

Volatility is a widely used open-source framework for analyzing memory captures (RAM dumps) from Windows, Linux, and macOS systems. It enables investigators and malware analysts to extract process lists, network connections, DLLs, strings, artifacts, and more. Volatility supports many plugins for detecting hidden processes, malware, rootkits, and event tracing. It’s essential in digital forensics and incident response workflows.

Downloads: 152 This Week

Last Update: 2025-07-03
6

PyPDF

A pure-Python PDF library capable of splitting, merging, cropping

pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.

Downloads: 9 This Week

Last Update: 2026-03-23
7

Toutatis

Extract public Instagram account information from usernames

Toutatis is an open source command-line tool designed to extract publicly available information from Instagram accounts. It helps users gather various data points from a target profile by querying Instagram using a username or account ID. The tool can retrieve details such as profile metadata, follower counts, biography information, and other publicly accessible account attributes. In addition to basic profile data, Toutatis can also reveal contact details that may be publicly exposed, including email addresses and phone numbers associated with the account. ...

Downloads: 5 This Week

Last Update: 2 days ago
8

refactoring.nvim

The Refactoring library based off the Refactoring book

refactoring.nvim is a Neovim plugin developed to bring powerful automated code refactoring capabilities to one of the most popular text editors among programmers, giving developers a suite of refactoring operations that streamline repetitive restructuring tasks inside the editor. Built around an intuitive set of commands and a Lua API, the plugin allows users to extract and inline variables or functions, pull blocks of code into new files, and modify code structure without leaving the comfort of Neovim’s modal interface. It integrates with built-in Neovim selection modes and can work with third-party tools like Telescope to present refactoring options quickly, enabling rapid transformation of code patterns. ...

Downloads: 0 This Week

Last Update: 2026-02-17
9

Spatie Crawler

An easy to use, powerful crawler implemented in PHP

Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.

Downloads: 0 This Week

Last Update: 2026-03-20
10

DiscordChatExporter

Saves Discord chat logs to a file

...With both command-line and GUI usage patterns available through the ecosystem, it supports automation as well as manual workflows. Overall, DiscordChatExporter provides a reliable way to extract and preserve Discord communications outside the platform.

Downloads: 66 This Week

Last Update: 2026-03-21
11

LLM Scraper

Extract structured data from webpages using LLM-powered scraping

LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects.

Downloads: 0 This Week

Last Update: 5 days ago
12

docext

An on-premises, OCR-free unstructured data extraction

...This allows the system to detect and extract structured elements such as tables, signatures, key fields, and layout information while maintaining semantic understanding of the document content. The toolkit can also convert complex documents into structured markdown representations that preserve formatting and contextual relationships.

Downloads: 2 This Week

Last Update: 2026-03-12
13

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. In addition to simple text extraction, Scribe.js supports writing or injecting a high-quality invisible text layer back into PDFs, effectively making them searchable and improving usability for indexing or accessibility. ...

Downloads: 2 This Week

Last Update: 2026-03-14
14

Chandra

OCR model for complex documents with layout-aware structured outputs

Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines.

Downloads: 1 This Week

Last Update: 2026-03-18
15

Gitingest

Create prompt-friendly codebase digests from any Git repository URL

...The generated output is optimized for prompt usage, helping AI models understand codebases more effectively without requiring manual file aggregation. In addition to producing the code digest, Gitingest also calculates statistics about the extracted content such as repository structure, total size of the extract, and token count. Gitingest can be used as a command line utility or integrated directly into Python applications.

Downloads: 0 This Week

Last Update: 2026-03-13
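
The idea of a digest plus statistics, as described for Gitingest, can be sketched in a few lines of Python. This is not Gitingest's API or output format; the function name `build_digest`, the `=== path ===` separator, the extension filter, and the word-count token estimate are all assumptions chosen for illustration. It walks a local directory that is assumed to be an already-cloned checkout.

```python
import os


def build_digest(root, extensions=(".py", ".md", ".txt")):
    """Concatenate matching text files under `root` and report simple statistics."""
    parts = []
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (such as .git) and visit the rest in sorted order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, encoding="utf-8", errors="replace") as handle:
                content = handle.read()
            files.append(rel)
            parts.append(f"=== {rel} ===\n{content}\n")
    digest = "".join(parts)
    stats = {
        "files": files,
        "size_bytes": len(digest.encode("utf-8")),
        # Rough estimate only: whitespace-separated words, not a real tokenizer.
        "approx_tokens": len(digest.split()),
    }
    return digest, stats


if __name__ == "__main__":
    digest, stats = build_digest(".")
    print(stats["size_bytes"], "bytes in", len(stats["files"]), "files")
```

A real tokenizer would give a different count than the word-based estimate used here, so the figure is only useful as an order-of-magnitude guide.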
16

Geziyor

Blazing fast Go framework for web crawling and data scraping tasks

Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows. It provides a flexible architecture where developers define parsing functions that process responses and extract the desired data. ...

Downloads: 0 This Week

Last Update: 5 days ago
17

Interface Design

Design engineering for Claude Code

...The plugin prompts users to confirm a design direction early in the process and then applies those principles consistently — from button sizes to spacing scales and color tokens — so work stays aligned with the established system. It also offers commands to inspect the current design system status, audit inconsistencies, and extract patterns back into a reusable format, making it a live feedback loop for quality UI work.

Downloads: 0 This Week

Last Update: 2026-02-08
18

Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.

Downloads: 0 This Week

Last Update: 2025-06-08
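
The three tasks named in the Ksoup description (extracting tags and attributes, extracting text, and encoding or decoding HTML entities) can be illustrated with Python's standard library. This sketch does not use or mirror Ksoup's Kotlin API; the class `TagCollector` and the helper `parse_html` are hypothetical names introduced here.

```python
from html import escape, unescape
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start tags with their attributes, plus the visible text."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; inside text.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def parse_html(markup):
    """Return (list of (tag, attributes) pairs, text joined by single spaces)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, " ".join(collector.text_parts)


if __name__ == "__main__":
    tags, text = parse_html('<p class="intro">Fish &amp; chips <a href="/menu">menu</a></p>')
    print(tags)
    print(text)
    print(escape("a < b & c"))       # encode special characters as entities
    print(unescape("&lt;p&gt;"))     # decode entities back to characters
```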
19

Scrapy

A fast, high-level web crawling and web scraping framework

...Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring and automated testing.

Downloads: 19 This Week

Last Update: 2026-03-18
20

Article Extractor

Extract the main article from a given URL with Node.js

A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.

Downloads: 0 This Week

Last Update: 2025-09-04
21

videodl

Lightweight Python tool for downloading videos from many platforms

...It supports numerous video platforms across both Chinese and international streaming ecosystems, enabling users to fetch content from many popular services through a unified interface. Videodl works by implementing platform-specific client modules that extract video information and download links from supported services. Videodl can integrate with external command-line utilities to improve downloading performance, handle streaming formats such as HLS, and manage encrypted or segmented media streams. Additional utilities can also enable faster downloads, resume interrupted transfers, and process complex playlist structures.

Downloads: 10 This Week

Last Update: 3 days ago
22

AI-Crawler

Crawl a website starting from a URL, find relevant pages

...Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure. Users can define their data requirements in plain English, and the system will interpret those instructions to crawl a domain and extract structured data. The tool supports output formats such as JSON and Markdown, and it can generate or accept schemas to ensure that extracted data is structured according to application needs. It is designed as a low-code solution, reducing the complexity of building and maintaining custom scraping pipelines.

Downloads: 4 This Week

Last Update: 4 days ago
23

Scriberr

Self-hosted AI audio transcription

...The application includes a polished user interface that simplifies the management of recordings, transcripts, and annotations, making it suitable for both casual users and professionals handling large volumes of audio. Beyond transcription, Scriberr also integrates features such as summarization, tagging, and interaction with language models, allowing users to extract insights from conversations or meetings efficiently.

Downloads: 6 This Week

Last Update: 2026-03-19
24

PDFCraft

PDFCraft is a free, privacy-focused PDF toolkit

...At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite. But beyond manual editing, it also offers a programmable layer so developers can write scripts to batch process documents, generate templated reports, or extract structured data from PDFs for integration in workflows. The design emphasizes quality and compatibility: output PDFs render accurately across readers, preserve metadata, and support interactive elements like hyperlinks and form fields.

Downloads: 6 This Week

Last Update: 3 days ago
25

Open Semantic Search

Open source semantic search and text analytics for large document sets

...Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 8 This Week

Last Update: 2026-03-30