RobotsTxt

RobotsTxt is a high-performance, production-tested C++ library for parsing robots.txt files and evaluating their rules against crawler user agents. It implements the core semantics of the Robots Exclusion Protocol: user-agent sections, Allow/Disallow directives, wildcard handling, and precedence rules. The code is optimized for speed and low memory use, so large crawls can evaluate millions of URLs quickly. It also focuses on correctness: edge cases such as overlapping patterns and longest-match resolution are handled consistently. Consumers integrate it to decide whether a particular bot name may fetch a specific URL and, where applicable, to respect crawl-delay or sitemap hints. The library serves both search-scale crawlers and smaller tools that need a reliable decision engine for polite crawling.
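To make the idea of user-agent sections concrete, here is a minimal, self-contained sketch of how a robots.txt file might be split into groups and how the group for a given bot name might be chosen, falling back to the `*` group. It is not the library's API: `Group`, `RuleList`, `ParseGroups`, and `RulesFor` are hypothetical names invented for this illustration, and the matching of bot names is simplified to a case-insensitive exact comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; the real library may model rules differently.
using RuleList = std::vector<std::pair<bool, std::string>>;  // (is Allow?, path pattern)
struct Group {
  std::vector<std::string> agents;  // lower-cased user-agent tokens
  RuleList rules;
};

static std::string Trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const auto b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

static std::string Lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Splits robots.txt text into groups. Consecutive User-agent lines share one group.
std::vector<Group> ParseGroups(const std::string& text) {
  std::vector<Group> groups;
  bool last_was_agent = false;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    line = line.substr(0, line.find('#'));  // drop trailing comments
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;  // blank or malformed line
    const std::string key = Lower(Trim(line.substr(0, colon)));
    const std::string value = Trim(line.substr(colon + 1));
    if (key == "user-agent") {
      if (!last_was_agent) groups.emplace_back();
      groups.back().agents.push_back(Lower(value));
      last_was_agent = true;
    } else if (key == "allow" || key == "disallow") {
      if (!groups.empty()) groups.back().rules.emplace_back(key == "allow", value);
      last_was_agent = false;
    }  // other keys (for example Sitemap) are ignored in this sketch
  }
  return groups;
}

// Returns the rules from groups naming `bot`; if none name it, the "*" groups apply.
RuleList RulesFor(const std::vector<Group>& groups, const std::string& bot) {
  const std::string name = Lower(bot);
  RuleList specific, fallback;
  bool found = false;
  for (const Group& g : groups) {
    const bool names_bot = std::find(g.agents.begin(), g.agents.end(), name) != g.agents.end();
    const bool names_star = std::find(g.agents.begin(), g.agents.end(), "*") != g.agents.end();
    const RuleList& r = g.rules;
    if (names_bot) {
      found = true;
      specific.insert(specific.end(), r.begin(), r.end());
    } else if (names_star) {
      fallback.insert(fallback.end(), r.begin(), r.end());
    }
  }
  return found ? specific : fallback;
}

// Usage example: FooBot gets its own group, any other bot gets the "*" group.
void ExampleGroups() {
  const std::string text =
      "User-agent: *\nDisallow: /tmp/\n\n"
      "User-agent: FooBot\nAllow: /tmp/public/\nDisallow: /tmp/\n";
  const std::vector<Group> groups = ParseGroups(text);
  assert(RulesFor(groups, "foobot").size() == 2);
  assert(RulesFor(groups, "OtherBot").size() == 1);
}
```

Keeping group selection separate from path matching means the per-URL check only has to scan the rules of one bot, which is one plausible way to keep evaluation cheap in a large crawl.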

Features

Fast parser and matcher for Allow/Disallow rules
Correct handling of wildcards and longest-match precedence
User-agent specific rule sections with sensible fallbacks
Low-overhead evaluation for high-throughput crawlers
Support for common extensions like Sitemap hints
Clear API to check URL fetch permissions per bot name
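The wildcard and longest-match behavior listed above can be sketched in a few lines. The following is an illustrative C++17 example, not the library's implementation: `Rule`, `PatternMatches`, and `IsAllowed` are hypothetical names. It assumes the common conventions that `*` matches any run of characters, a trailing `$` anchors the pattern to the end of the path, the longest matching pattern wins, and Allow wins a tie.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical rule type for this sketch.
struct Rule {
  bool allow;
  std::string pattern;
};

// True if `pattern` matches `path` from its first character.
// Without a trailing '$' the pattern only needs to match a prefix of the path.
bool PatternMatches(std::string_view pattern, std::string_view path) {
  const bool anchored = !pattern.empty() && pattern.back() == '$';
  if (anchored) pattern.remove_suffix(1);
  constexpr std::size_t kNone = std::string_view::npos;
  std::size_t p = 0, s = 0;         // positions in pattern and path
  std::size_t star = kNone, mark = 0;  // last '*' seen and the path position it covers up to
  while (true) {
    if (p == pattern.size()) {
      if (!anchored || s == path.size()) return true;
      // Anchored with path left over: fall through and let the last '*' absorb more.
    } else if (pattern[p] == '*') {
      star = p++;
      mark = s;
      continue;
    } else if (s < path.size() && pattern[p] == path[s]) {
      ++p;
      ++s;
      continue;
    }
    // Mismatch: backtrack to the last '*' and let it consume one more character.
    if (star == kNone || mark >= path.size()) return false;
    p = star + 1;
    s = ++mark;
  }
}

// Applies longest-match precedence: the matching rule with the longest pattern
// decides; on equal length Allow wins; with no matching rule the path is allowed.
bool IsAllowed(const std::vector<Rule>& rules, std::string_view path) {
  bool allowed = true;
  std::size_t best = 0;
  for (const Rule& r : rules) {
    if (r.pattern.empty() || !PatternMatches(r.pattern, path)) continue;
    const std::size_t len = r.pattern.size();
    if (len > best || (len == best && r.allow)) {
      best = len;
      allowed = r.allow;
    }
  }
  return allowed;
}

// Usage example with overlapping patterns.
void ExampleMatching() {
  const std::vector<Rule> rules = {
      {false, "/private/"}, {true, "/private/public/"}, {false, "/*.pdf$"}};
  assert(IsAllowed(rules, "/private/public/a.html"));  // longer Allow overrides Disallow
  assert(!IsAllowed(rules, "/private/secret"));
  assert(!IsAllowed(rules, "/docs/a.pdf"));
  assert(IsAllowed(rules, "/docs/a.pdf?x=1"));  // '$' requires the path to end in .pdf
}
```

Using pattern length as the specificity measure keeps the decision independent of the order in which rules appear in the file.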

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

C++

Related Categories

C++ Robotics Software

Registered

2025-10-09
