web site scraper free download

Showing 4549 open source projects for "web site scraper"

1

CommunityScrapers

This is a public repository containing scrapers

Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing...

Downloads: 2 This Week

Last Update: 2026-04-14
See Project
2

CyberScraper 2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 2 This Week

Last Update: 2026-01-20
See Project
3

Scraper of Death

Scraper of Death is a web scraper with multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).

Downloads: 3 This Week

Last Update: 2026-02-19
See Project
4

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...

Downloads: 1 This Week

Last Update: 2024-09-29
See Project
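The JobFunnel entry above describes merging scraped postings into a master CSV with no duplicates. The sketch below illustrates that general idea in plain Python; it is not JobFunnel's actual code or matching logic. The function names, the sample postings, and the choice of title/company/location as the duplicate key are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical duplicate key: two postings count as the same job when
# these fields match after trimming whitespace and lowercasing.
KEY_FIELDS = ("title", "company", "location")


def merge_jobs(master_rows, scraped_rows, key_fields=KEY_FIELDS):
    """Return master_rows plus every scraped row whose key is not yet present."""
    def key(row):
        return tuple(row[field].strip().lower() for field in key_fields)

    seen = {key(row) for row in master_rows}
    merged = list(master_rows)
    for row in scraped_rows:
        row_key = key(row)
        if row_key not in seen:
            seen.add(row_key)
            merged.append(row)
    return merged


def to_csv(rows, fieldnames=KEY_FIELDS):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: one existing posting, two freshly scraped ones.
master = [{"title": "Data Engineer", "company": "Acme", "location": "Remote"}]
scraped = [
    {"title": "data engineer ", "company": "ACME", "location": "remote"},
    {"title": "Backend Developer", "company": "Initech", "location": "Austin"},
]

merged = merge_jobs(master, scraped)
csv_text = to_csv(merged)
```

Here the first scraped row is dropped because it matches the existing posting after normalization, so only the Initech posting is appended. A real tool would likely need a more robust notion of "same job" than exact field matching.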
5

html-metadata

MetaData html scraper and parser for Node.js (supports Promises

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...

Downloads: 0 This Week

Last Update: 2025-04-30
See Project
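The html-metadata entry above mentions extracting Open Graph tags and general metadata such as the title tag and meta description. The library itself targets Node.js; the sketch below is a Python illustration of the same idea using only the standard library, and does not reflect html-metadata's API. The class name, the dictionary layout, and the sample page are assumptions.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect Open Graph properties plus the title and meta description."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}   # e.g. {"title": ..., "type": ...}
        self.general = {}      # e.g. {"title": ..., "description": ...}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attributes.get("content")
            if content is None:
                return
            prop = attributes.get("property") or ""
            name = attributes.get("name") or ""
            if prop.startswith("og:"):
                # Strip the "og:" prefix so keys read as plain names.
                self.open_graph[prop[3:]] = content
            elif name == "description":
                self.general["description"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.general["title"] = self.general.get("title", "") + data


# Made-up sample document.
SAMPLE_HTML = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example OG title">
<meta property="og:type" content="article">
</head><body></body></html>"""

extractor = MetaExtractor()
extractor.feed(SAMPLE_HTML)
open_graph = extractor.open_graph
general = extractor.general
```

The other standards listed in the entry (JSON-LD, Dublin Core, Highwire Press, and so on) would each need their own handling, which this sketch does not attempt.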
6

ScrapeGraphAI

Python scraper based on AI

Extracting content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 3 This Week

Last Update: 2 days ago
See Project
7

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the command line by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.

Downloads: 0 This Week

Last Update: 2024-03-02
See Project
8

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
9

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...

Downloads: 0 This Week

Last Update: 2025-09-08
See Project
10

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 1 This Week

Last Update: 2026-03-31
See Project
11

goclone

Fast CLI tool for cloning entire websites for local browsing offline

goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to download files concurrently. It can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. ...

Downloads: 7 This Week

Last Update: 2026-03-11
See Project
12

MDCx

Movie metadata scraper and organizer for media libraries and NFO

MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...

Downloads: 4 This Week

Last Update: 2026-03-10
See Project
13

eleventy

A simpler site generator. Transforms a directory of templates

A static site generator for modern web development, focusing on flexibility and customization.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
14

Material e621

Material e621 is a modern, open source web client for e621.net

Material e621 is an open-source web client designed as a modern alternative interface for browsing content on the e621 platform, offering improved usability, customization, and performance compared to the original site. It is built with modern frontend technologies such as Vue and TypeScript and follows a Material Design-inspired aesthetic to provide a cleaner and more intuitive user experience.

Downloads: 11 This Week

Last Update: 2026-03-17
See Project
15

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the site's robots.txt and not hammering the site with too many requests.

Downloads: 2 This Week

Last Update: 2025-08-29
See Project
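The rvest entry above recommends respecting robots.txt and not sending too many requests. rvest and polite are R packages; the sketch below shows the same two checks in Python with the standard library's urllib.robotparser, purely as an illustration. The robots.txt content, the user agent name "example-bot", the URLs, and the helper function are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def plan_requests(robots_text, urls, user_agent="example-bot", default_delay=1.0):
    """Return the URLs the rules allow and the delay to wait between requests."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())

    # Use the site's Crawl-delay when it declares one, else a fallback.
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay

    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]
    return allowed, float(delay)


candidate_urls = [
    "https://example.com/articles/1",
    "https://example.com/private/data",
]
allowed, delay = plan_requests(ROBOTS_TXT, candidate_urls)
```

A scraper built on this would then fetch only the URLs in allowed and call time.sleep(delay) between requests.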
16

WordPress

Just a mirror of the WordPress subversion repository

WordPress is one of the world’s most widely used content management systems (CMS), powering blogs, websites, and increasingly web apps. It offers a flexible architecture of themes and plugins, where users can extend functionality or customize layout without touching core code. The administrative dashboard includes post and page editors, media library, user roles, plugin/theme installation, and site settings. Through its REST API and headless mode, WordPress also serves as a backend for decoupled front ends using frameworks like React, Vue, or Gatsby. ...

Downloads: 21 This Week

Last Update: 2026-03-11
See Project
17

Certbot

Get free HTTPS certificates forever from Let's Encrypt

Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. With just a few simple...

1 Review

Downloads: 135 This Week

Last Update: 2026-04-07
See Project
18

Netlify CMS

A Git-based CMS for static site generators

Open source content management for your Git workflow. Use Netlify CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.

Downloads: 10 This Week

Last Update: 4 days ago
See Project
19

AMP

Web component framework for building ads, emails, websites and more

AMP is an open source web component framework that allows you to easily create user-first websites, ads, emails, stories and more. AMP creates fast, smooth-loading web pages that prioritize the user-experience, consistently providing a fast experience across all devices and platforms.

Downloads: 4 This Week

Last Update: 2026-03-18
See Project
20

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a lightweight framework and many ready-to-use "steps" that serve as building blocks for your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load the linked documents as well. A crawler...

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
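The crwlr entry above defines a crawler as a program that loads documents and follows the links in them. crwlr itself is a separate library with its own step-based API, which is not shown here; the sketch below only illustrates the definition in Python, as a breadth-first crawl over an in-memory set of pages. The function names, the max_pages limit, and the sample pages are assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. fetch(url) returns HTML text, or None if unavailable."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be loaded; skip it
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Made-up site held in memory so the example needs no network access.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.com/b": '<a href="/missing">gone</a>',
}

order = crawl("https://example.com/", PAGES.get)
```

Passing fetch as a parameter keeps the traversal logic separate from how pages are loaded, so the same loop could be driven by a real HTTP client.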
21

Decap

A Git-based CMS for Static Site Generators

Open source content management for your Git workflow. Use Decap CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.

Downloads: 1 This Week

Last Update: 4 days ago
See Project
22

Plausible Analytics

Simple, open-source, lightweight and privacy-friendly web analytics

Web analytics went from a simple, fun and useful practice for site owners to a data-grabbing machine for surveillance capitalism. Google Analytics is frustrating to use, difficult to understand, slow to load and privacy-invasive too. Plausible Analytics is built for privacy-conscious site owners. You get valuable and actionable stats to help you improve your efforts while your visitors keep having a nice and enjoyable experience.

Downloads: 0 This Week

Last Update: 2026-01-16
See Project
23

PT Plugin Plus

Google Chrome extension

PT Assistant Plus is a browser plug-in (Web Extension) for Google Chrome and Firefox, mainly used to assist in downloading torrents from private tracker (PT) sites. It works with a variety of PT sites and makes operations such as downloading torrents easier and faster.

Downloads: 1 This Week

Last Update: 2025-08-04
See Project
24

Bedrock

WordPress boilerplate with modern development tools

WordPress boilerplate with modern development tools, easier configuration, and an improved folder structure. Bedrock is an open source project and completely free to use. Bedrock is a modern WordPress stack that helps you get started with the best development tools and project structure. Much of the philosophy behind Bedrock is inspired by the Twelve-Factor App methodology including the WordPress specific version. Bedrock is multisite network compatible, but needs the...

Downloads: 3 This Week

Last Update: 2026-03-19
See Project
25

Adguard Browser Extension

AdGuard browser extension

AdGuard is a fast and lightweight ad-blocking browser extension that effectively blocks all types of ads and trackers on all web pages. We focus on advanced privacy protection features to not just block known trackers, but prevent web sites from building your shadow profile. Unlike its standalone counterparts (AG for Windows, Mac), the browser extension is completely free...

Downloads: 43 This Week

Last Update: 2026-03-19
See Project