grab-site is an open source web crawler for archiving and backing up websites. Starting from one or more URLs, it recursively follows links across the site, captures pages and their resources, and writes everything to WARC files for long-term preservation. Internally it uses a fork of the wpull crawler to fetch and process pages efficiently during large crawls.

grab-site includes a built-in dashboard that shows crawl activity in real time, including which URLs are being fetched and how many remain in the queue. Ignore patterns can be added or changed while a crawl is running, so problematic or unneeded URLs can be skipped before they slow down or stall the archiving process. grab-site also ships predefined ignore sets for common site types such as forums and other complex web platforms. Duplicate page detection further reduces wasted work on content that has already been captured.
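
The crawl loop described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration under stated assumptions, not grab-site's or wpull's code: the hypothetical `SITE` dictionary stands in for HTTP fetches, `crawl` and `load_ignores` are made-up names, ignore patterns are treated as Python regular expressions that are re-read before every fetch (mimicking edits made mid-crawl), and duplicate detection is simplified to exact SHA-256 hashes of page bodies. A duplicate page is still archived, but its links are not followed.

```python
import re
from collections import deque
from hashlib import sha256

# Hypothetical in-memory site standing in for HTTP fetches:
# URL -> (page body, links found on that page).
SITE = {
    "https://example.org/": ("home", ["https://example.org/a", "https://example.org/forum?sort=asc"]),
    "https://example.org/a": ("page a", ["https://example.org/b", "https://example.org/"]),
    "https://example.org/b": ("page a", ["https://example.org/c"]),  # same body as /a
    "https://example.org/c": ("page c", []),
    "https://example.org/forum?sort=asc": ("forum", []),
}


def crawl(start_urls, load_ignores):
    """Breadth-first crawl; returns the URLs that were 'archived', in order."""
    queue = deque(start_urls)
    seen_urls = set(start_urls)   # never enqueue the same URL twice
    seen_bodies = set()           # SHA-256 digests of bodies already captured
    archived = []
    while queue:
        url = queue.popleft()
        # Re-read the ignore patterns before every fetch, so edits made
        # mid-crawl take effect for URLs already sitting in the queue.
        ignores = [re.compile(p) for p in load_ignores()]
        if any(rx.search(url) for rx in ignores):
            continue
        body, links = SITE.get(url, ("", []))
        archived.append(url)  # a real crawler would write a WARC record here
        digest = sha256(body.encode("utf-8")).hexdigest()
        if digest in seen_bodies:
            continue  # duplicate content: keep the page, skip its links
        seen_bodies.add(digest)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return archived


if __name__ == "__main__":
    print(crawl(["https://example.org/"], lambda: [r"\?sort="]))
```

With the `\?sort=` pattern active, the forum URL is skipped, and `/c` is never reached because its only inbound link comes from `/b`, whose body duplicates `/a`.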

Features

  • Recursive website crawling starting from one or more URLs
  • Saves captured content in WARC archival format
  • Built-in dashboard for monitoring active crawls and URL queues
  • Dynamic ignore patterns that can be edited while crawling
  • Duplicate page detection to avoid reprocessing identical content
  • Disk-based URL queue designed for very large crawl workloads
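
The disk-based URL queue from the last feature can be illustrated with Python's built-in `sqlite3` module. This is a hypothetical sketch (`DiskURLQueue` and its methods are made-up names, not part of grab-site or wpull): each URL is stored once in a table with a `todo`/`done` status, so the queue lives in a database file rather than in RAM, and the `UNIQUE` constraint doubles as the "already seen" check.

```python
import sqlite3


class DiskURLQueue:
    """Hypothetical disk-backed FIFO URL queue built on SQLite."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS urls ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT UNIQUE NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'todo')"
        )
        self.db.commit()

    def add(self, url):
        """Enqueue url unless it was ever seen before; True if newly added."""
        cur = self.db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
        self.db.commit()
        return cur.rowcount == 1

    def pop(self):
        """Return the oldest pending URL and mark it done, or None if empty."""
        row = self.db.execute(
            "SELECT id, url FROM urls WHERE status = 'todo' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE urls SET status = 'done' WHERE id = ?", (row[0],))
        self.db.commit()
        return row[1]

    def remaining(self):
        """Number of URLs still waiting to be fetched."""
        return self.db.execute(
            "SELECT COUNT(*) FROM urls WHERE status = 'todo'"
        ).fetchone()[0]
```

Passing a file path instead of `":memory:"` keeps the queue on disk, and `remaining()` corresponds to the "how many remain in the queue" figure a dashboard might show. Marking a URL done as soon as it is popped keeps the sketch short; a more robust design would use an in-progress state and mark the URL done only after its fetch succeeds.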

Categories

Web Scrapers

License

Other License

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

2026-03-11