Scrapling v0.4.4

A new update with important spider improvements and bug fixes 🎉

🚀 New stuff and quality-of-life changes

  • Added robots.txt compliance to the Spider framework with a new robots_txt_obey option. When enabled, the spider will automatically fetch and respect robots.txt rules before crawling, including Disallow, Crawl-delay, and Request-rate directives. Robots.txt files are fetched concurrently and cached per domain for the entire crawl. By @AbdullahY36 in #226
  • Added robots.txt cache pre-warming so all start_urls domains have their robots.txt fetched and parsed before the crawl loop begins, avoiding delays on the first request to each domain.
  • Added a new robots_disallowed_count stat to CrawlStats to track how many requests were blocked by robots.txt rules during a crawl.
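The behavior described above (a per-domain cache, pre-warming for the start URLs, and a counter for blocked requests) can be sketched roughly as follows. This is only an illustration, not Scrapling's implementation: it uses the standard library's `urllib.robotparser` instead of protego, reads robots.txt bodies from a hypothetical in-memory `ROBOTS_BODIES` dict instead of fetching them, and the `RobotsCache` class with its method names is invented for this example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies keyed by domain. A real crawler would
# download these over HTTP; they are inlined so the sketch runs offline.
ROBOTS_BODIES = {
    "example.com": (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Crawl-delay: 2\n"
        "Request-rate: 5/10\n"
    ),
}


class RobotsCache:
    """Illustrative per-domain robots.txt cache (assumed design)."""

    def __init__(self, user_agent: str = "ExampleBot") -> None:
        self.user_agent = user_agent
        self._parsers: dict[str, RobotFileParser] = {}
        # Plays the role that robots_disallowed_count plays in CrawlStats.
        self.disallowed_count = 0

    def _parser_for(self, url: str) -> RobotFileParser:
        # Parse each domain's robots.txt once and reuse it for the whole crawl.
        domain = urlsplit(url).netloc
        if domain not in self._parsers:
            parser = RobotFileParser()
            parser.parse(ROBOTS_BODIES.get(domain, "").splitlines())
            self._parsers[domain] = parser
        return self._parsers[domain]

    def prewarm(self, start_urls: list[str]) -> None:
        # Populate the cache before the crawl loop starts.
        for url in start_urls:
            self._parser_for(url)

    def allowed(self, url: str) -> bool:
        ok = self._parser_for(url).can_fetch(self.user_agent, url)
        if not ok:
            self.disallowed_count += 1
        return ok

    def crawl_delay(self, url: str):
        return self._parser_for(url).crawl_delay(self.user_agent)

    def request_rate(self, url: str):
        return self._parser_for(url).request_rate(self.user_agent)


cache = RobotsCache()
cache.prewarm(["https://example.com/"])
for candidate in ("https://example.com/public", "https://example.com/private/page"):
    print(candidate, cache.allowed(candidate))
print("blocked:", cache.disallowed_count)
```

A domain with no robots.txt body in the dict is treated as fully allowed here, which is a simplification chosen for the sketch.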


🐛 Bug Fixes

  • Fixed a critical MRO issue with ProxyRotator where the _build_context_with_proxy stub was shadowing the real implementation from child classes, causing proxy rotation to always raise NotImplementedError (Fixes #215). Thanks @yetval.
  • Fixed a page pool leak when using per-request proxy rotation with browser sessions. Pages created inside temporary contexts were not removed from the pool on cleanup, leading to stale references accumulating over time. By @yetval in #223
  • Fixed a missing type assertion in the static fetcher where curl_cffi could return None from session.request(), causing downstream errors.
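To illustrate how a stub can shadow a real method through Python's method resolution order (MRO), here is a minimal sketch of that general failure mode. It is not the actual Scrapling code: the classes `RotationMixin`, `BrowserSession`, `BrokenSession`, and `FixedSession` are hypothetical, and only the method name `_build_context_with_proxy` is taken from the release notes.

```python
class RotationMixin:
    """Hypothetical mixin that expects another class to build the context."""

    def _build_context_with_proxy(self, proxy: str) -> str:
        # Placeholder stub: meant to be replaced by a real implementation.
        raise NotImplementedError

    def rotate(self, proxy: str) -> str:
        return self._build_context_with_proxy(proxy)


class BrowserSession:
    """Hypothetical class that holds the real implementation."""

    def _build_context_with_proxy(self, proxy: str) -> str:
        return f"context via {proxy}"


# The mixin comes first in the MRO, so its stub is found before the real method.
class BrokenSession(RotationMixin, BrowserSession):
    pass


# Listing the implementing class first lets the real method win the lookup.
class FixedSession(BrowserSession, RotationMixin):
    pass


print([cls.__name__ for cls in BrokenSession.__mro__])
print(FixedSession().rotate("http://proxy.example:8080"))
```

Another way to avoid this kind of shadowing is to drop the stub from the mixin entirely, or declare it only as a type annotation, so that attribute lookup never finds a placeholder ahead of the real method.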

Other

  • Updated dependencies, so expect the latest fingerprints and other upstream changes.
  • Added protego as a new dependency under the fetchers optional group for robots.txt parsing.

🙏 Special thanks to the community for all the continuous testing and feedback



Source: README.md, updated 2026-04-05