#101 Disabled Robots doesn't work

v1.2.3
open
None
5
2012-09-13
2011-06-23
Richard SINELLE
No

If the "Enable" checkbox for robots.txt is unticked and no robots.txt file can be found, the crawler doesn't work.

Suggested fix: change the checkRobotTxtAllow method in Crawl.java to the following:

public boolean checkRobotTxtAllow(HttpDownloader httpDownloader)
        throws MalformedURLException, SearchLibException {
    RobotsTxtStatus robotsTxtStatus;
    if (this.robotsTxtEnabled) {
        // Only look up robots.txt when the feature is enabled.
        RobotsTxt robotsTxt = this.config.getRobotsTxtCache().getRobotsTxt(
                httpDownloader, this.config, this.urlItem.getURL(), false);
        robotsTxtStatus = robotsTxt.getStatus(this.userAgent, this.urlItem);
    } else {
        // Feature disabled: record that and skip the lookup entirely.
        robotsTxtStatus = RobotsTxtStatus.DISABLED;
    }
    this.urlItem.setRobotsTxtStatus(robotsTxtStatus);
    // Reject the URL only when robots.txt is enabled and the status is
    // neither ALLOW nor NO_ROBOTSTXT (a missing file does not block crawling).
    if (this.robotsTxtEnabled && robotsTxtStatus != RobotsTxtStatus.ALLOW
            && robotsTxtStatus != RobotsTxtStatus.NO_ROBOTSTXT) {
        this.urlItem.setFetchStatus(FetchStatus.NOT_ALLOWED);
        return false;
    }
    return true;
}
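
The decision made by the method above can be checked in isolation with a small standalone sketch. This is not code from the project: the class RobotsCheckSketch, the enum Status, and the method isFetchAllowed are hypothetical names, and the DISALLOW value is an assumed stand-in for any status other than ALLOW and NO_ROBOTSTXT.

```java
// Standalone sketch of the allow/deny decision only.
// All names here are hypothetical and are not part of Crawl.java.
final class RobotsCheckSketch {

    // Simplified stand-in for RobotsTxtStatus. DISALLOW is an assumption
    // representing any status that is neither ALLOW nor NO_ROBOTSTXT.
    public enum Status { ALLOW, DISALLOW, NO_ROBOTSTXT, DISABLED }

    // Mirrors the final condition of the proposed checkRobotTxtAllow:
    // a URL is rejected only when robots.txt handling is enabled and the
    // status is neither ALLOW nor NO_ROBOTSTXT.
    public static boolean isFetchAllowed(boolean robotsTxtEnabled, Status status) {
        if (!robotsTxtEnabled) {
            return true; // feature disabled: never block the crawl
        }
        return status == Status.ALLOW || status == Status.NO_ROBOTSTXT;
    }

    public static void main(String[] args) {
        // The case from this ticket: feature disabled, no robots.txt consulted.
        System.out.println(isFetchAllowed(false, Status.DISABLED));    // true
        // Feature enabled but the site has no robots.txt file.
        System.out.println(isFetchAllowed(true, Status.NO_ROBOTSTXT)); // true
        // Feature enabled and the URL is explicitly allowed.
        System.out.println(isFetchAllowed(true, Status.ALLOW));        // true
        // Feature enabled and the URL is not allowed.
        System.out.println(isFetchAllowed(true, Status.DISALLOW));     // false
    }
}
```

The first two calls cover the situation described in this ticket: with the feature unticked, or with no robots.txt file present, the URL should still be fetched.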

Discussion

  • Thanks for your feedback, Richard.

    It has been fixed in SVN revision 1159.