grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays real-time crawl activity, including which URLs are currently being processed and how many remain in the queue. Users can dynamically apply ignore patterns during an active crawl, allowing them to skip problematic or unnecessary URLs that could slow down or block the archiving process. grab-site also provides predefined ignore sets for common site structures such as forums and other complex web platforms. Additional mechanisms like duplicate page detection help avoid re-crawling identical content.

Features

  • Recursive website crawling starting from one or more URLs
  • Saves captured content in WARC archival format
  • Built-in dashboard for monitoring active crawls and URL queues
  • Dynamic ignore patterns that can be edited while crawling
  • Duplicate page detection to avoid reprocessing identical content
  • Disk-based URL queue designed for very large crawl workloads

Project Samples

Project Activity

See All Activity >

Categories

Web Scrapers

License

Other License

Follow grab-site

grab-site Web Site

Other Useful Business Software
MongoDB Atlas runs apps anywhere Icon
MongoDB Atlas runs apps anywhere

Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of grab-site!

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

2026-03-11