
The rule obeyRobotsTxt(true) does not recognize all directives in robots.txt

CFlorin
2014-01-16
  • CFlorin

    CFlorin - 2014-01-16


    My robots.txt contains:

    Disallow: *?obj=*

    Yet URLs like "/index.php?obj=front& ..." are still being crawled.
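For reference, a rule like `Disallow: *?obj=*` relies on the `*` (any sequence of characters) and trailing `$` (end of URL) wildcards described in RFC 9309 and supported by Google. If a crawler's robots.txt parser only does plain prefix matching, these characters are compared literally and the rule never matches, which would be consistent with the behaviour reported here (an assumption, not something confirmed in this thread). The PHP sketch below shows one way such patterns can be matched; `robotsPatternToRegex()` and `isDisallowed()` are hypothetical helpers, not part of PHPCrawl, and the sample paths are made up.

```php
<?php
// Hypothetical helpers, not part of PHPCrawl: match a robots.txt path
// pattern that uses "*" (any sequence of characters) and a trailing "$"
// (end of URL), as described in RFC 9309.

// Translate a robots.txt pattern into a PCRE regex anchored at the start
// of the path (path plus query string).
function robotsPatternToRegex(string $pattern): string
{
    // A trailing "$" means the URL must end right there.
    $anchoredEnd = substr($pattern, -1) === '$';
    if ($anchoredEnd) {
        $pattern = substr($pattern, 0, -1);
    }

    // Quote the literal pieces between the "*" wildcards, then rejoin them
    // with ".*" so each "*" matches any sequence of characters.
    $pieces = array_map(
        function (string $piece): string {
            return preg_quote($piece, '#');
        },
        explode('*', $pattern)
    );

    return '#^' . implode('.*', $pieces) . ($anchoredEnd ? '$' : '') . '#';
}

// True if the given path is blocked by a single Disallow pattern.
function isDisallowed(string $pattern, string $path): bool
{
    // An empty "Disallow:" value blocks nothing.
    if ($pattern === '') {
        return false;
    }
    return preg_match(robotsPatternToRegex($pattern), $path) === 1;
}

var_dump(isDisallowed('*?obj=*', '/index.php?obj=front'));  // bool(true)
var_dump(isDisallowed('*?obj=*', '/index.php?page=2'));     // bool(false)
var_dump(isDisallowed('*-cat-1$', '/recipes-cat-1'));       // bool(true)
var_dump(isDisallowed('*-cat-1$', '/recipes-cat-12'));      // bool(false)
```

Anchoring at `^` keeps ordinary rules such as `/shopping/` as prefix matches, while `*` and `$` add the wildcard behaviour on top.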

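    My crawler rules:

    // Only receive documents whose content type is text/html
    $crawler->addContentTypeReceiveRule('#text/html#');
    // Skip images, scripts, Flash, XML files and icons
    $crawler->addURLFilterRule('#\.(jpg|jpeg|gif|png|js|swf|xml|ico)$#i');
    // Skip everything under /upload_data/
    $crawler->addURLFilterRule('#\/upload_data\/#i');
    // Don't follow nofollow links, and respect the site's robots.txt
    $crawler->obeyNoFollowTags(true);
    $crawler->obeyRobotsTxt(true);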
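Until the robots.txt handling covers these wildcards, one possible workaround is to mirror the rule with an explicit URL filter, since `addURLFilterRule()` in the rules above already takes a PCRE pattern. This is a minimal sketch, assuming the filter regex is tested against the full URL; `$objFilter` is a hypothetical variable name and the example.com URLs are placeholders. The `addURLFilterRule()` call is left commented out so the snippet runs without PHPCrawl.

```php
<?php
// Assumed workaround, not verified against PHPCrawl internals: exclude the
// URLs that "Disallow: *?obj=*" is meant to block with an explicit filter.
// Like robots.txt path matching, the pattern is case-sensitive.
$objFilter = '#\?obj=#';

// With PHPCrawl this pattern would be registered next to the other rules:
// $crawler->addURLFilterRule($objFilter);

// Quick check of which (placeholder) URLs the pattern would filter out.
$urls = [
    'http://www.example.com/index.php?obj=front&id=1',
    'http://www.example.com/recipes/tarte-tatin.html',
];
foreach ($urls as $url) {
    echo $url, ' => ', preg_match($objFilter, $url) ? 'filtered' : 'crawled', PHP_EOL;
}
```

The filter only reproduces this one rule; other wildcard rules in the file (the `-cat-N$` ones, for instance) would each need their own pattern.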
    
     
  • Anonymous

    Anonymous - 2014-01-16

    Hi!

    Could you please post the URL of the site where this occurs?

    Thanks!

     
  • CFlorin

    CFlorin - 2014-01-16

    This is the robots.txt file:

    User-agent: MSNbot
    Crawl-delay: 2

    User-agent:
    Disallow: /index.php?obj=feed&action=print&

    Disallow: /shopping/
    Disallow: -page-page-
    Disallow: ?obj=
    Disallow: ?ft_term=*

    Disallow: -uid-

    Disallow: -rid-1-page-
    Disallow: -rid-2-page-
    Disallow: -rid-3-page-
    Disallow: -rid-4-page-

    Disallow: -cat-1$
    Disallow: -cat-2$
    Disallow: -cat-3$
    Disallow: -cat-4$
    Disallow: -cat-5$
    Disallow: -cat-6$
    Disallow: -cat-7$
    Disallow: -cat-8$
    Disallow: -cat-9$
    Disallow: -cat-10$
    Disallow: -cat-11$
    Disallow: -cat-12$
    Disallow: -cat-13$
    Disallow: -cat-14$
    Disallow: -cat-15$
    Disallow: -cat-16$
    Disallow: -cat-18$
    Disallow: -cat-19$
    Disallow: -cat-20$
    Disallow: -cat-21$
    Disallow: -cat-22$
    Disallow: -cat-23$
    Disallow: -cat-24$
    Disallow: -cat-25$
    Disallow: *-cat-26$

    Disallow: /directory
    Disallow: /director
    Disallow: /annuaire

    Disallow: /2008/
    Disallow: /2009/

    Disallow: /2010/
    Disallow: /2011/

    Disallow: /rate-recipes

    Disallow: /tags/recettes/-page-

    User-agent: Mediapartners-Google
    Allow: /
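The file mixes several groups (MSNbot, a catch-all group, Mediapartners-Google), so a crawler first has to pick the group that applies to its user agent before any Disallow pattern is checked. Below is a simplified PHP sketch of that selection step. `disallowRulesFor()` is a hypothetical helper, not PHPCrawl's parser; it ignores Allow and Crawl-delay, and "PHPCrawl" is used only as a placeholder agent name. The sample input assumes the empty `User-agent:` line above was originally `User-agent: *` (asterisks in the post may have been eaten by the forum's formatting), which is a guess.

```php
<?php
// Hypothetical helper, not PHPCrawl's actual parser. Simplifications:
// only Disallow lines are collected (Allow, Crawl-delay etc. are ignored),
// agent names are compared exactly (case-insensitive), and the first group
// naming the agent wins instead of merging duplicate groups.
function disallowRulesFor(string $robotsTxt, string $agent): array
{
    $groups = [];        // each: ['agents' => [...], 'disallow' => [...]]
    $current = null;     // index of the group being filled
    $lastWasAgent = false;

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop comments and surrounding whitespace; skip lines without a field.
        $line = trim(preg_replace('/#.*/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            if (!$lastWasAgent) {
                $groups[] = ['agents' => [], 'disallow' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = strtolower($value);
            $lastWasAgent = true;
            continue;
        }

        $lastWasAgent = false;
        // An empty Disallow value blocks nothing, so it is not stored.
        if ($field === 'disallow' && $current !== null && $value !== '') {
            $groups[$current]['disallow'][] = $value;
        }
    }

    // Prefer a group that names the agent; otherwise use the "*" group(s).
    $agent = strtolower($agent);
    $fallback = [];
    foreach ($groups as $group) {
        if (in_array($agent, $group['agents'], true)) {
            return $group['disallow'];
        }
        if (in_array('*', $group['agents'], true)) {
            $fallback = array_merge($fallback, $group['disallow']);
        }
    }
    return $fallback;
}

// Reduced sample modeled on the file above; "User-agent: *" is an assumption.
$robots = "User-agent: MSNbot\nCrawl-delay: 2\n\n"
    . "User-agent: *\nDisallow: *?obj=*\nDisallow: /2008/\n\n"
    . "User-agent: Mediapartners-Google\nAllow: /\n";

print_r(disallowRulesFor($robots, 'PHPCrawl'));
```

Each pattern returned here would then still need wildcard-aware matching against the URL path; selecting the group and matching the pattern are separate steps.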

     
