data extraction free download

Showing 31 open source projects for "data extraction"

1

AgentQL MCP

Model Context Protocol server that integrates AgentQL's data

The AgentQL MCP Server is a Model Context Protocol (MCP) server that integrates AgentQL's data extraction capabilities, enabling users to extract structured data from web pages using natural language prompts.

Downloads: 0 This Week

Last Update: 2025-04-08
2

newpipeextractor

Library for extracting streaming site data without official APIs

...It handles many low-level tasks involved in web data extraction, including parsing responses, managing platform-specific logic, and handling errors, allowing developers to focus on implementing application features rather than scraping mechanics. Each supported service is implemented through its own extractor components that conform to a common interface, enabling consistent access to data across different platforms.

Downloads: 0 This Week

Last Update: 2026-07-21
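
The "common interface" design described for newpipeextractor can be illustrated with a minimal sketch. This is not the project's actual API: the class names, the `accepts`/`extract` methods, and the `exampletube.test` and `examplecloud.test` hosts are all hypothetical, chosen only to show per-service extractors sitting behind one shared contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class StreamInfo:
    """Service-independent result type (hypothetical)."""
    service: str
    title: str
    url: str


class Extractor(ABC):
    """Contract that every per-service extractor implements (hypothetical)."""

    @abstractmethod
    def accepts(self, url: str) -> bool:
        """Return True if this extractor knows how to handle the URL."""

    @abstractmethod
    def extract(self, url: str, raw_response: str) -> StreamInfo:
        """Parse an already-fetched response into a StreamInfo."""


class ExampleTubeExtractor(Extractor):
    def accepts(self, url: str) -> bool:
        return "exampletube.test" in url

    def extract(self, url: str, raw_response: str) -> StreamInfo:
        # Platform-specific rule: the title is the first line of the response.
        title = raw_response.splitlines()[0].strip()
        return StreamInfo("exampletube", title, url)


class ExampleCloudExtractor(Extractor):
    def accepts(self, url: str) -> bool:
        return "examplecloud.test" in url

    def extract(self, url: str, raw_response: str) -> StreamInfo:
        # Platform-specific rule: the title follows a "title=" prefix.
        title = raw_response.split("title=", 1)[1].strip()
        return StreamInfo("examplecloud", title, url)


EXTRACTORS = [ExampleTubeExtractor(), ExampleCloudExtractor()]


def extract_any(url: str, raw_response: str) -> StreamInfo:
    """Dispatch to the first extractor that accepts the URL."""
    for extractor in EXTRACTORS:
        if extractor.accepts(url):
            return extractor.extract(url, raw_response)
    raise ValueError(f"no extractor for {url}")
```

Application code calls only `extract_any` and works with `StreamInfo`, so supporting another service means adding one more `Extractor` subclass.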
3

Wiseflow

Enhance any agent's browser use skill

Wiseflow is an open-source information extraction and knowledge discovery system designed to collect, filter, and organize valuable information from large volumes of online content. The platform continuously monitors specified sources such as websites, social platforms, and other digital channels to identify relevant data according to user-defined interests or topics. By combining web crawling, content parsing, and large language model analysis, the system extracts concise insights from raw information streams and converts them into structured data that can be stored or analyzed. ...

Downloads: 2 This Week

Last Update: 2026-07-21
4

Vectorize MCP Server

Official Vectorize MCP Server

The Vectorize MCP Server is a Model Context Protocol server that integrates with Vectorize, offering advanced vector retrieval and text extraction capabilities.

Downloads: 0 This Week

Last Update: 2025-04-08
5

The Web MCP

A powerful Model Context Protocol (MCP) server

Bright Data’s Web MCP server gives AI assistants robust, real-time web capabilities through an MCP interface designed to avoid blocks, rate limits, and CAPTCHAs. It presents search, crawl, navigate, and extraction tools that agents can call directly, replacing brittle scraping prompts with typed operations. The README markets it as a “gateway” to the live web so assistants don’t fall back to stale training data. Bright Data also advertises a getting-started tier with a free monthly allotment, plus options for remote or self-hosted operation depending on governance needs. ...

Downloads: 8 This Week

Last Update: 7 days ago
6

npm-pdfreader

Parse text and tables from PDF files.

npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.

Downloads: 0 This Week

Last Update: 2025-11-01
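
As a rough illustration of what automatic column detection can mean for positioned PDF text, the sketch below clusters text items by horizontal position. It is not npm-pdfreader's implementation: the item shape (`x`, `y`, `text`), the function names, and the `tolerance` parameter are assumptions made for this example.

```python
def detect_columns(items, tolerance=1.0):
    """Return column left edges: a new column starts when x jumps by more than tolerance."""
    columns = []
    for x in sorted({item["x"] for item in items}):
        if not columns or x - columns[-1] > tolerance:
            columns.append(x)
    return columns


def to_table(items, tolerance=1.0):
    """Place each item in the nearest column, grouping rows by their y position."""
    columns = detect_columns(items, tolerance)
    rows = {}
    for item in items:
        col = min(range(len(columns)), key=lambda i: abs(columns[i] - item["x"]))
        row = rows.setdefault(item["y"], [""] * len(columns))
        row[col] = item["text"]
    # Rows are returned top to bottom, assuming y grows downward.
    return [rows[y] for y in sorted(rows)]
```

Cells with no text item are left as empty strings, so every row has the same number of columns.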
7

web-access

Skill for installing full networking capabilities for Claude Code

web-access is a tool designed to give AI agents structured and controlled access to web content, enabling them to retrieve, navigate, and process information from online sources in real time. It abstracts common web interactions such as page loading, data extraction, and navigation into reusable functions that can be invoked by agents. The system emphasizes safety and control, likely including mechanisms to manage permissions, rate limits, and content filtering. This allows agents to operate within defined boundaries while still benefiting from dynamic, up-to-date information. The architecture supports integration with broader agent frameworks, making it a key component for building systems that require external knowledge. ...

Downloads: 0 This Week

Last Update: 2026-05-15
8

Magnitude

Vision AI browser agent for automation, testing, and extraction

...This approach allows the agent to generalize better across complex and modern websites, making it more robust than traditional selector-based automation tools. Browser Agent by Magnitude supports a wide range of capabilities including navigation, interaction, data extraction, and automated verification through built-in testing features. Developers can use it to automate repetitive web tasks, integrate services without APIs, or build advanced browser-based agents. It also provides flexible abstraction levels, allowing both high-level task execution and precise low-level control of actions like mouse movements and keyboard input.

Downloads: 0 This Week

Last Update: 2026-03-18
9

WanGP

AI video generator optimized for low-VRAM and older GPUs

...Wan2GP provides a full web-based interface that simplifies interaction with complex generative pipelines, making it easier to configure prompts, models, and rendering settings. It also integrates a wide range of utilities such as prompt enhancement, mask editing, motion design, and extraction tools for pose, depth, and flow data to support advanced video workflows.

Downloads: 84 This Week

Last Update: 7 days ago
10

h265web.js

A HEVC/H.265 Web Player

h265web.js is a WebAssembly-powered video decoding library designed to enable playback and processing of H.265/HEVC video streams directly in web browsers without relying on native browser codec support. It provides a low-level decoding API that allows developers to build custom video players capable of handling raw H.265 streams, which are typically not widely supported natively in browsers. The project includes components for parsing H.265 bitstreams into NAL units and decoding them into...

Downloads: 3 This Week

Last Update: 9 hours ago
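
Parsing an H.265 bitstream into NAL units, as the h265web.js description mentions, can be sketched as follows. This is not h265web.js code. It assumes an Annex B byte stream, where NAL units are separated by `00 00 01` start codes, and it does not remove emulation-prevention bytes. The function names are hypothetical.

```python
START_CODE = b"\x00\x00\x01"


def split_nal_units(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    units = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte at the end of
        # the previous unit. A NAL unit never ends in 0x00, so stripping
        # trailing zeros is safe.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        i = nxt
    return units


def hevc_nal_type(unit: bytes) -> int:
    """Read nal_unit_type: the 6 bits after the forbidden_zero_bit in byte 0."""
    return (unit[0] >> 1) & 0x3F
```

In HEVC, types 32, 33, and 34 are the VPS, SPS, and PPS parameter sets, which a decoder needs before it can decode any picture data.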
11

Article Extractor

Extract the main article from a given URL with Node.js

A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.

Downloads: 0 This Week

Last Update: 2026-05-03
12

Open Semantic Search

Open source semantic search and text analytics for large document sets

...It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 4 This Week

Last Update: 2 days ago
13

sharp

High performance Node.js image processing module

...Colour spaces, embedded ICC profiles and alpha transparency channels are all handled correctly. Lanczos resampling ensures quality is not sacrificed for speed. As well as image resizing, operations such as rotation, extraction, compositing and gamma correction are available. Most modern macOS, Windows and Linux systems running Node.js v10+ do not require any additional install or runtime dependencies. This module supports reading JPEG, PNG, WebP, AVIF, TIFF, GIF and SVG images. Output images can be in JPEG, PNG, WebP, AVIF and TIFF formats as well as uncompressed raw pixel data. ...

Downloads: 2 This Week

Last Update: 2026-07-01
14

MCiSEE

All of Minecraft: easily get Minecraft resources

MCiSEE is an open-source project designed to integrate Minecraft with computer vision and artificial intelligence experiments. The system focuses on capturing visual information from the game environment and exposing it to external programs for analysis or machine learning research. By converting gameplay data into visual or structured formats, MCiSEE enables researchers and developers to build AI agents capable of interacting with the Minecraft environment. The project can be used as a...

Downloads: 0 This Week

Last Update: 2026-07-21
15

Browserbase Skills

Claude Agent SDK with a web browsing tool

Browserbase Skills is a collection of reusable automation “skills” designed to enable AI agents to interact with web environments programmatically. It provides structured workflows that abstract browser actions such as navigation, form filling, and data extraction into composable building blocks. The system is intended to simplify the development of browser-based agents by offering prebuilt capabilities that can be orchestrated together. It integrates with headless browser infrastructure, allowing scalable automation across multiple sessions. The design emphasizes reliability and repeatability, reducing the complexity of handling dynamic web interfaces. ...

Downloads: 5 This Week

Last Update: 2026-07-09
16

DeepCamera

Open-Source AI Camera. Empower any camera/CCTV

...SharpAI yolov7_reid is an open-source Python application that uses AI to detect intruders with traditional surveillance cameras. It uses YOLOv7 as a person detector, FastReID for person feature extraction, Milvus as a local vector database for self-supervised learning to identify unseen persons, and Label Studio to host images locally for further use, such as labeling data and training your own classifier. It also integrates with Home Assistant to empower smart homes with AI technology.

Downloads: 17 This Week

Last Update: 2026-03-20
17

chrome-cdp

Give your AI agent access to your live Chrome session

chrome-cdp-skill is a specialized integration that enables AI agents to control and interact with web browsers through the Chrome DevTools Protocol (CDP). It allows agents to perform tasks such as navigating pages, extracting data, interacting with elements, and executing scripts in a browser environment. The project is designed to extend the capabilities of AI systems beyond static knowledge by giving them real-time access to web content and interactive interfaces. Its architecture likely...

Downloads: 0 This Week

Last Update: 2026-06-28
18

Markdownify MCP Server

Convert files and web content into clean, usable Markdown easily

...It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. ...

Downloads: 0 This Week

Last Update: 2026-05-02
19

browserable

Open source and self-hostable browser automation library for AI agents

Browserable is an open-source browser automation framework designed specifically for AI agents that need to interact with web interfaces in a human-like way. The project provides tools that allow automated agents to navigate websites, click buttons, fill out forms, and extract information from pages without manual scripting of each step. Built primarily in JavaScript, the framework offers both a developer-friendly SDK and a REST API that allow integration with AI applications and automation...

Downloads: 0 This Week

Last Update: 2026-03-09
20

Firecrawl MCP Server

Adds powerful web scraping and search to Cursor and Claude

firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. Built-in...

Downloads: 0 This Week

Last Update: 2025-10-08
21

DotnetSpider

Lightweight .NET framework for fast web crawling and data scraping

DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...

Downloads: 0 This Week

Last Update: 2026-03-10
22

MBR Bulk WP Detector

A free WP plugin that lets you check unlimited URLs

MBR Bulk WP Detector is a free WordPress plugin that lets you check unlimited URLs right from your own dashboard. No subscriptions, no URL limits, and your data stays completely private on your server. What Can You Do With It? The basics are simple: Paste a list of URLs (or upload a CSV file), click a button, and boom—you’ve got a clear breakdown of which sites are running WordPress and which aren’t. But it gets better… Turn on Deep Scan mode, and you’ll also discover what...

Downloads: 1 This Week

Last Update: 2026-03-26
23

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....

Downloads: 1 This Week

Last Update: 2026-07-27
24

Neiki's Gallery

Vanilla JavaScript image gallery & lightbox

Neiki's Gallery is a lightweight, production-ready image gallery and lightbox library built with vanilla JavaScript and CSS. It requires no dependencies and can be integrated with a single <script> tag, with automatic initialization out of the box. It provides a highly customizable experience for modern web projects, combining performance, flexibility, and rich UI interactions. Designed for both developers and end users, it supports responsive layouts, advanced lightbox features, touch...

Downloads: 0 This Week

Last Update: 2026-07-10
25

Exifr

The fastest and most versatile JS EXIF reading library

Exifr is a fast and very versatile JavaScript EXIF reading library that works everywhere, parses everything and handles just about anything you throw at it. It can handle any input: buffers, URLs, <img> tags and more; .jpg, .tif, and .heic files; and TIFF (EXIF, GPS, etc.), XMP, ICC, IPTC, JFIF segments. It skips parsing tags you don’t need, and reads only the first few bytes. There’s no need to read the whole file to see if there’s an EXIF file in it, or extract all the data when you just...

Downloads: 3 This Week

Last Update: 2022-06-29
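
The idea of reading only the start of a file to learn whether it carries EXIF can be sketched for the JPEG case. This is not Exifr's implementation. The function name `has_exif` is hypothetical, and the sketch assumes the caller passes in the leading bytes of the file. It walks the JPEG segment headers and stops at the first APP1 segment whose payload begins with `Exif\0\0`.

```python
import struct


def has_exif(head: bytes) -> bool:
    """Report whether the leading bytes of a JPEG contain an EXIF APP1 segment."""
    if head[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(head):
        if head[pos] != 0xFF:
            return False  # not positioned on a marker, give up
        marker = head[pos + 1]
        if marker == 0xDA:  # start of scan: the metadata segments are over
            return False
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", head[pos + 2:pos + 4])
        if marker == 0xE1 and head[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```

A caller would typically read a bounded prefix, for example `open(path, "rb").read(65536)`, and pass that in. The 64 KiB figure is an arbitrary choice for this sketch, not a value taken from Exifr.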