web scraping free download

Showing 184 open source projects for "web scraping"

View related business solutions

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Enterprise-grade ITSM, for every business
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.

Try it Free
1

Web Scraping for Laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP

This is the Laravel adapter for Roach, the complete web scraping toolkit for PHP. Easily integrate Roach into any Laravel application. The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. The Laravel adapter of Roach registers a few Artisan commands to make out development experience as pleasant as possible.

Downloads: 0 This Week

Last Update: 2025-03-21
See Project
2

Web RPA

Web Robotics Process Automation Tool

...It supports interaction with web elements, navigation flows, and dynamic content handling, making it suitable for scraping and automation scenarios. WebRPA can be integrated into larger systems or used as a standalone tool for automating browser-based operations. Its lightweight design ensures efficient execution while maintaining adaptability for different use cases.

Downloads: 0 This Week

Last Update: 4 days ago
See Project
3

The Web MCP

A powerful Model Context Protocol (MCP) server

Bright Data’s Web MCP server gives AI assistants robust, real-time web capabilities through an MCP interface designed to avoid blocks, rate limits, and CAPTCHAs. It presents search, crawl, navigate, and extraction tools that agents can call directly, replacing brittle scraping prompts with typed operations. The README markets it as a “gateway” to the live web so assistants don’t fall back to stale training data.

Downloads: 0 This Week

Last Update: 2026-03-29
See Project
4

skycaiji

Open source web scraping system for automated data collection tasks

...SkyCaiji also supports automated workflows that continuously gather data and process it based on defined collection rules. Its architecture enables users to build scalable web scraping pipelines that can run unattended once configured.

Downloads: 4 This Week

Last Update: 3 days ago
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
5

Deep Research Web UI

AI-powered research assistant that performs iterative, deep research

Deep Research Web UI is an AI-powered research assistant interface designed to automate complex, multi-step information gathering workflows through a combination of search engines, web scraping, and large language models. It operates as a front-end system for deep research agents that iteratively refine queries, retrieve information from multiple sources, and synthesize structured outputs into coherent reports.

Downloads: 1 This Week

Last Update: 2026-03-19
See Project
6

kimuraframework

AI-first Ruby framework for building fast, flexible web scraping spide

Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...

Downloads: 0 This Week

Last Update: 3 days ago
See Project
7

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like beautiful soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.

Downloads: 0 This Week

Last Update: 2025-08-29
See Project
8

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...

Downloads: 3 This Week

Last Update: 5 days ago
See Project
9

wombat

Lightweight Ruby DSL for scraping structured data from web pages

Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. ...

Downloads: 0 This Week

Last Update: 2026-04-07
See Project
Build Securely on Azure with Proven Frameworks
Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.

Download Now
10

crawler

Collection of JS reverse engineering examples for web scraping study

crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
11

Robin

AI-powered tool for dark web OSINT search and investigation

...Robin also performs scraping of discovered pages through Tor sessions, allowing users to gather additional context from dark web sites while maintaining the required network routing. By integrating AI models, the platform can interpret results, highlight key information, and produce summaries that help analysts understand findings faster. The project provides a modular architecture separating search, scraping, and AI processing components so it can be extended with new data sources.

Downloads: 19 This Week

Last Update: 2026-03-31
See Project
12

ScrapeGraphAI

Python scraper based on AI

Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 3 This Week

Last Update: 2026-04-19
See Project
13

Roach

The complete web scraping toolkit for PHP

Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well.

Downloads: 0 This Week

Last Update: 2025-03-21
See Project
14

X-Crawl

Flexible Node.js AI-assisted crawler library

A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.

Downloads: 0 This Week

Last Update: 2025-04-06
See Project
15

Scrapling

An adaptive Web Scraping framework

Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques.

Downloads: 4 This Week

Last Update: 2026-04-17
See Project
16

Geziyor

Blazing fast Go framework for web crawling and data scraping tasks

Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows.

Downloads: 0 This Week

Last Update: 5 days ago
See Project
17

Python-Spider

Python3 web crawler practice

Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. ...

Downloads: 0 This Week

Last Update: 2025-12-08
See Project
18

Ulixee Hero

The web browser built for scraping

It's the first modern headless browsers designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS allowing you bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...

Downloads: 0 This Week

Last Update: 2025-09-08
See Project
19

Python Code Tutorials

The Python Code Tutorials

Python Code Tutorials is a large educational repository that aggregates programming tutorials from the “The Python Code” website into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts. Each tutorial typically includes complete Python code examples and explanations that demonstrate how to build real tools and applications step by step. Many tutorials focus on practical implementations such as building network scanners, web scraping tools, object detection systems, and automation utilities using Python libraries. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
20

FEAPDER

Powerful Python crawler framework for scalable web scraping tasks

feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
21

QueryList

Progressive PHP web crawler framework with jQuery-like DOM parsing

QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. ...

Downloads: 2 This Week

Last Update: 2026-03-10
See Project
22

CyberScraper 2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 6 This Week

Last Update: 2026-01-20
See Project
23

Firecrawl MCP Server

Adds powerful web scraping and search to Cursor and Claude

firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. ...

Downloads: 2 This Week

Last Update: 2025-10-08
See Project
24

Parsera

Lightweight library for scraping web-sites with LLMs

Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
25

Symfony Panther

A browser testing and web crawling library for PHP and Symfony

Symfony Panther is a browser testing and web scraping tool that allows developers to interact with websites programmatically. It uses headless Chrome or Firefox to automate browser tasks, making it suitable for end-to-end testing and data extraction. Panther integrates well with Symfony and PHPUnit, allowing developers to write comprehensive tests for web applications.

Downloads: 0 This Week

Last Update: 2026-01-08
See Project