grab-site is an open-source web crawler for archiving and backing up websites by recursively downloading their content. Starting from a URL, it follows links across the site, captures pages and their resources, and saves them to WARC files for long-term preservation. Internally, it uses a fork of the wpull engine to fetch and process pages. A built-in dashboard shows crawl activity in real time, including the URLs currently being processed and the number remaining in the queue. Ignore patterns can be applied while a crawl is running, so users can skip problematic or unnecessary URLs that would otherwise slow down or block the crawl. grab-site also ships predefined ignore sets for common site structures such as forums and other complex platforms, and duplicate page detection helps avoid re-crawling identical content.
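
As a rough illustration of how ignore patterns can filter a crawl, the sketch below treats each pattern as a regular expression and skips any URL that one of them matches. This is not grab-site's own code: the function names, the sample patterns, and the example URLs are all hypothetical and chosen only to show the idea.

```python
import re


def compile_ignores(text):
    """Compile one regular expression per non-blank line of text."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def should_skip(url, patterns):
    """Return True when any ignore pattern matches somewhere in the URL."""
    return any(pattern.search(url) for pattern in patterns)


# Hypothetical patterns: printer-friendly views and an endless calendar.
IGNORES = r"""
[?&]action=print
/calendar/\d{4}/
"""

patterns = compile_ignores(IGNORES)

for url in (
    "https://example.com/thread?id=1",
    "https://example.com/thread?id=1&action=print",
    "https://example.com/calendar/2031/05",
):
    print(url, "skip" if should_skip(url, patterns) else "fetch")
```

Because the patterns are plain text, a crawler built this way could re-read and recompile them between fetches, which is one way to get the "edit while crawling" behavior the description mentions.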

Features

  • Recursive website crawling starting from one or more URLs
  • Saves captured content in WARC archival format
  • Built-in dashboard for monitoring active crawls and URL queues
  • Dynamic ignore patterns that can be edited while crawling
  • Duplicate page detection to avoid reprocessing identical content
  • Disk-based URL queue designed for very large crawl workloads
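
The last two features can be pictured with a small sketch: a URL queue kept in a SQLite database, so pending work lives on disk rather than in memory, plus a content-hash check for duplicate pages. This is an assumption-laden illustration, not grab-site's actual implementation; the class name, table names, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import sqlite3


class UrlQueue:
    """Hypothetical disk-backed URL queue with duplicate-content detection."""

    def __init__(self, path=":memory:"):
        # Pass a file path to persist the queue; ":memory:" is for demos only.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "url TEXT UNIQUE NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen (digest TEXT PRIMARY KEY)"
        )
        self.db.commit()

    def add(self, url):
        """Queue a URL once; return False if it was already known."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO queue (url) VALUES (?)", (url,)
        )
        self.db.commit()
        return cur.rowcount == 1

    def pop(self):
        """Return the oldest pending URL and mark it done, or None if empty."""
        row = self.db.execute(
            "SELECT id, url FROM queue WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE queue SET done = 1 WHERE id = ?", (row[0],))
        self.db.commit()
        return row[1]

    def remaining(self):
        """Count URLs that have not been popped yet."""
        return self.db.execute(
            "SELECT COUNT(*) FROM queue WHERE done = 0"
        ).fetchone()[0]

    def is_duplicate(self, body):
        """Record the body's SHA-256 digest; return True if seen before."""
        digest = hashlib.sha256(body).hexdigest()
        cur = self.db.execute(
            "INSERT OR IGNORE INTO seen (digest) VALUES (?)", (digest,)
        )
        self.db.commit()
        return cur.rowcount == 0
```

A crawler loop would call `pop()` for the next URL, fetch it, check `is_duplicate()` on the response body, and `add()` any newly discovered links. The `remaining()` count is the kind of figure a dashboard could display.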

Categories

Web Scrapers

License

Other License

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

2026-03-11