RobotsDisallowed is a public catalog that tracks websites and organizations explicitly blocking AI and web-scraping crawlers in their robots.txt files or related mechanisms. It documents the growing trend of content owners asserting control over how their data is used for model training and automated harvesting. The project aggregates domains, notes the targeted bots or user agents, and surfaces patterns for researchers, policymakers, and tool builders.

It serves both as a transparency effort and as a resource for people designing allow/deny strategies for automated access. The dataset invites community contributions to keep the picture current as new bots emerge and policies shift. It also highlights the intersection of web standards, ethics, and AI governance by showing how site owners operationalize consent and restriction at scale.

Features

  • Curated list of domains that disallow AI or scraping bots
  • Identification of targeted user agents and blocking patterns
  • Community-updated dataset reflecting policy changes
  • Reference for researchers and builders of crawl-aware tools
  • Snapshot of evolving norms around data usage and consent
  • Lightweight format for analysis and reuse
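As a sketch of how a crawl-aware tool might use this kind of data, the snippet below checks which user agents a site's robots.txt disallows, using only the Python standard library. The robots.txt rules and user-agent names here are illustrative assumptions, not entries from the RobotsDisallowed dataset itself.

```python
# Minimal sketch: detect which AI/scraper user agents are blocked from the
# site root by a robots.txt policy. Rules and agent names are hypothetical.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

AGENTS = ["GPTBot", "CCBot", "Googlebot"]

def blocked_agents(robots_txt: str, agents: list[str]) -> list[str]:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # An agent counts as "blocked" if it may not fetch the site root.
    return [a for a in agents if not parser.can_fetch(a, "/")]

print(blocked_agents(ROBOTS_TXT, AGENTS))  # → ['GPTBot']
```

Running this over many domains' robots.txt files would yield the kind of per-domain, per-agent blocking patterns the project aims to surface.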


Categories

Libraries



Additional Project Details

Registered: 2025-10-28