Name                                        Modified     Size
siteone-crawler-v1.0.9-win-x64.zip          2025-06-08   88.9 MB
siteone-crawler-v1.0.9-linux-arm64.tar.gz   2025-06-08   28.9 MB
siteone-crawler-v1.0.9-linux-x64.tar.gz     2025-06-08   29.7 MB
siteone-crawler-v1.0.9-macos-arm64.tar.gz   2025-06-08   27.3 MB
siteone-crawler-v1.0.9-macos-x64.tar.gz     2025-06-08   28.2 MB
README.md                                   2025-06-08   2.9 kB
v1.0.9 source code.tar.gz                   2025-06-08   38.5 MB
v1.0.9 source code.zip                      2025-06-08   38.5 MB
Totals: 8 items, 280.0 MB

This release introduces a powerful new Website to Markdown converter that exports an entire website as clean Markdown, either as a single file or as multiple files, which is ideal for providing context to AI tools or for documentation. We've also added the ability to start crawling directly from a sitemap.xml file and significantly enhanced the Offline Website Exporter with more granular control and better handling of international characters. Numerous new command-line options add flexibility in crawling, filtering, and reporting, alongside many other improvements and bug fixes.

New Features

  • Website to Markdown Converter: A major new feature to convert entire websites into clean Markdown files, replacing the previous dependency on html2markdown.
  • Single-File Markdown Export: Use --markdown-export-single-file to combine all website content into a single, organized Markdown file, with smart removal of duplicate headers/footers.
  • Crawl from Sitemap: You can now pass the URL of a sitemap.xml or sitemap index file directly to the --url parameter to crawl all the URLs it lists.
  • Video Gallery in HTML Report: The HTML report now includes a gallery of all found videos, with lazy loading and an interactive player.
  • Custom DNS Resolution: Added the --resolve option (like curl) to provide custom IP addresses for specific domains and ports.
  • XPath and RegEx in Extra Columns: The --extra-columns option now supports XPath 1.0 expressions and regular expressions for custom data extraction.
  • Max Crawl Depth: The new --max-depth parameter limits how deep the crawler goes; the limit applies to pages, not assets.
  • Customizable HTML Reports: Use --html-report-options to select which sections to include in the final HTML report.
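
Crawling from a sitemap essentially means reading its <loc> entries and, for a sitemap index, expanding each child sitemap in turn. The release notes do not describe how --url tells a sitemap apart from a regular page, so the Python sketch below only illustrates the general idea under stated assumptions: it expects the sitemaps.org 0.9 XML namespace, takes a caller-supplied fetch function instead of making HTTP requests, and uses hypothetical names (parse_sitemap, collect_page_urls). It is not SiteOne Crawler's own implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# XML namespace used by the sitemaps.org 0.9 protocol (assumed present on every file).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> Tuple[str, List[str]]:
    """Return the root element kind ('urlset' or 'sitemapindex') and every <loc> value."""
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(NS, "", 1)
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    return kind, locs


def collect_page_urls(start: str, fetch: Callable[[str], str], max_sitemaps: int = 1000) -> List[str]:
    """Expand a sitemap or sitemap index into a de-duplicated, ordered list of page URLs.

    `fetch` is a hypothetical caller-supplied function mapping a URL to XML text.
    """
    pending = [start]
    seen_sitemaps = set()
    seen_pages = set()
    pages: List[str] = []
    while pending and len(seen_sitemaps) < max_sitemaps:
        sitemap_url = pending.pop(0)
        if sitemap_url in seen_sitemaps:
            continue  # an index may list the same child twice or reference itself
        seen_sitemaps.add(sitemap_url)
        kind, locs = parse_sitemap(fetch(sitemap_url))
        if kind == "sitemapindex":
            pending.extend(locs)  # child sitemaps, expanded on later iterations
        elif kind == "urlset":
            for loc in locs:
                if loc not in seen_pages:
                    seen_pages.add(loc)
                    pages.append(loc)
    return pages


# Usage with an in-memory "fetch" instead of real HTTP requests.
DOCS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/pages.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/pages.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    ),
}
print(collect_page_urls("https://example.com/sitemap.xml", DOCS.__getitem__))
# ['https://example.com/', 'https://example.com/about']
```

Tracking visited sitemaps keeps an index that references itself, directly or indirectly, from looping forever.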
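
curl's own --resolve option takes entries of the form host:port:address, pinning a host and port to a given IP instead of asking DNS. The release notes only say the crawler's --resolve works "like curl", so the Python sketch below assumes that same single-address format; parse_resolve_entries and lookup are hypothetical names, and curl's extras (several comma-separated addresses, "+" or "-" prefixes) are deliberately left out.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

ResolveTable = Dict[Tuple[str, int], str]


def parse_resolve_entries(entries: List[str]) -> ResolveTable:
    """Parse curl-style 'host:port:address' entries into a (host, port) -> IP table."""
    table: ResolveTable = {}
    for entry in entries:
        # maxsplit=2 keeps an IPv6 address such as ::1 intact in the last field;
        # entries with fewer than three fields raise ValueError on unpacking.
        host, port, address = entry.split(":", 2)
        address = address.strip("[]")  # accept bracketed IPv6, e.g. [::1]
        ipaddress.ip_address(address)  # raises ValueError if it is not an IP address
        table[(host.lower(), int(port))] = address
    return table


def lookup(table: ResolveTable, host: str, port: int) -> Optional[str]:
    """Return the pinned IP for host:port, or None to fall back to normal DNS."""
    return table.get((host.lower(), port))


table = parse_resolve_entries(["example.com:443:127.0.0.1", "example.com:80:[::1]"])
print(lookup(table, "EXAMPLE.com", 443))   # 127.0.0.1
print(lookup(table, "example.com", 80))    # ::1
print(lookup(table, "example.com", 8080))  # None
```

Keying on both host and port mirrors curl's behavior, where an override for port 443 does not affect port 80.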

Improvements

  • Offline Website Exporter:
    • New --offline-export-remove-unwanted-code option to automatically strip analytics, cookie consents, and other non-essential scripts.
    • New --offline-export-no-auto-redirect-html flag to prevent the creation of meta-refresh redirect files.
    • Better handling of file paths with UTF-8 characters.
  • URL Transformations: Added --transform-url to internally change request URLs, useful for crawling sites that serve content from a different domain (e.g., a local instance).
  • Loop Protection: New --max-non200-responses-per-basename option to prevent getting stuck in loops with dynamically generated error pages.
  • Timezone Support: Set a --timezone for all dates and times displayed in reports and used in exported filenames.
  • Smarter Image Analysis: The WebP analysis will no longer report missing WebP images if more optimized AVIF alternatives are already present.
  • License switched to MIT: The project license has been changed to the more permissive MIT license.
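
The loop-protection idea behind --max-non200-responses-per-basename can be pictured as a per-basename counter of non-200 responses. The Python class below is a minimal illustration, not the crawler's code: it assumes "basename" means the last segment of the URL path and that, once the limit is reached, further URLs with the same basename are skipped. Non200Guard and its methods are hypothetical names.

```python
from posixpath import basename
from typing import Dict
from urllib.parse import urlsplit


class Non200Guard:
    """Skip URLs whose path basename has already produced too many non-200 responses.

    Assumption: "basename" is the last segment of the URL path; the crawler's
    exact definition is not spelled out in the release notes.
    """

    def __init__(self, max_per_basename: int) -> None:
        self.max_per_basename = max_per_basename
        self.non200_counts: Dict[str, int] = {}

    @staticmethod
    def key(url: str) -> str:
        # "/blog/2024/missing/" -> "missing"; the site root maps to "/".
        return basename(urlsplit(url).path.rstrip("/")) or "/"

    def record(self, url: str, status: int) -> None:
        """Count a response; only non-200 statuses move the counter."""
        if status != 200:
            k = self.key(url)
            self.non200_counts[k] = self.non200_counts.get(k, 0) + 1

    def should_crawl(self, url: str) -> bool:
        """False once this basename has reached the non-200 limit."""
        return self.non200_counts.get(self.key(url), 0) < self.max_per_basename


guard = Non200Guard(max_per_basename=3)
for i in range(3):
    guard.record(f"https://example.com/p{i}/missing", 404)
print(guard.should_crawl("https://example.com/p99/missing"))  # False
print(guard.should_crawl("https://example.com/p99/contact"))  # True
```

With a limit of 3, three 404 responses for URLs ending in /missing stop any further /missing URL from being queued, even when the preceding path segments keep changing, while other basenames are unaffected.
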
Source: README.md, updated 2025-06-08