obeyRobotsTxt(true) doesn't seem to recognize every directive in robots.txt.
My robots.txt contains:
Disallow: *?obj=*
But URLs like "/index.php?obj=front& ..." are still crawled.
My rules:
$crawler->addContentTypeReceiveRule('#text/html#');
$crawler->addURLFilterRule('#\.(jpg|jpeg|gif|png|js|swf|xml|ico)$#i');
$crawler->addURLFilterRule('#\/upload_data\/#i');
$crawler->obeyNoFollowTags(true);
$crawler->obeyRobotsTxt(true);
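If the wildcard directive really is being ignored, one possible stopgap is to express the same exclusion as a URL filter rule next to the ones above. This is only a sketch: the regex and the example.com URLs are my own assumptions, and it presumes addURLFilterRule() skips URLs that match the regex, as the existing filter rules suggest.

```php
<?php
// Hypothetical stopgap: mirror "Disallow: *?obj=*" as a URL filter regex.
$objFilter = '#\?obj=#';

// Self-check of the regex against sample URLs (example.com is a placeholder).
var_dump(preg_match($objFilter, 'http://example.com/index.php?obj=front&x=1')); // int(1)
var_dump(preg_match($objFilter, 'http://example.com/recipes-cat-3'));           // int(0)

// It would then be registered like the other filters in the rules above:
// $crawler->addURLFilterRule($objFilter);
```

This duplicates the robots.txt rule by hand, so it has to be kept in sync if the file changes.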
Hi!
Could you please post the URL of the site where this occurs?
Thanks!
This is the robots.txt file:
User-agent: MSNbot
Crawl-delay: 2
User-agent: *
Disallow: /index.php?obj=feed&action=print&
Disallow: /shopping/
Disallow: -page-page-
Disallow: *?obj=*
Disallow: ?ft_term=*
Disallow: -uid-
Disallow: -rid-1-page-
Disallow: -rid-2-page-
Disallow: -rid-3-page-
Disallow: -rid-4-page-
Disallow: -cat-1$
Disallow: -cat-2$
Disallow: -cat-3$
Disallow: -cat-4$
Disallow: -cat-5$
Disallow: -cat-6$
Disallow: -cat-7$
Disallow: -cat-8$
Disallow: -cat-9$
Disallow: -cat-10$
Disallow: -cat-11$
Disallow: -cat-12$
Disallow: -cat-13$
Disallow: -cat-14$
Disallow: -cat-15$
Disallow: -cat-16$
Disallow: -cat-18$
Disallow: -cat-19$
Disallow: -cat-20$
Disallow: -cat-21$
Disallow: -cat-22$
Disallow: -cat-23$
Disallow: -cat-24$
Disallow: -cat-25$
Disallow: *-cat-26$
Disallow: /directory
Disallow: /director
Disallow: /annuaire
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/
Disallow: /2011/
Disallow: /rate-recipes
Disallow: /tags/recettes/-page-
User-agent: Mediapartners-Google
Allow: /
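For reference, the original robots.txt convention only does plain prefix matching; "*" (any run of characters) and a trailing "$" (end of URL) are extensions, later written up in RFC 9309. I can't tell from this thread whether obeyRobotsTxt() implements them. Below is a rough sketch of how a matcher for those two extensions could look. The function names robotsPatternToRegex() and robotsRuleMatches() are made up for illustration and are not part of PHPCrawl.

```php
<?php
/**
 * Convert a robots.txt path pattern into a PCRE regex.
 * Assumes the wildcard extension: "*" = any run of characters,
 * a trailing "$" = end of URL; everything else is taken literally.
 */
function robotsPatternToRegex(string $pattern): string
{
    $anchorEnd = substr($pattern, -1) === '$';
    if ($anchorEnd) {
        $pattern = substr($pattern, 0, -1);
    }
    // Quote each literal piece, then join the pieces with ".*".
    $pieces = array_map(
        fn(string $piece): string => preg_quote($piece, '#'),
        explode('*', $pattern)
    );
    // Rules are matched from the start of the path, hence the "^".
    return '#^' . implode('.*', $pieces) . ($anchorEnd ? '$' : '') . '#';
}

/** True if the rule applies to the given path plus query string. */
function robotsRuleMatches(string $pattern, string $pathAndQuery): bool
{
    return preg_match(robotsPatternToRegex($pattern), $pathAndQuery) === 1;
}

var_dump(robotsRuleMatches('*?obj=*', '/index.php?obj=front&page=2')); // bool(true)
var_dump(robotsRuleMatches('*-cat-26$', '/recipes-cat-26'));           // bool(true)
var_dump(robotsRuleMatches('*-cat-26$', '/recipes-cat-261'));          // bool(false)
var_dump(robotsRuleMatches('/shopping/', '/shopping/cart'));           // bool(true)
var_dump(robotsRuleMatches('-uid-', '/user-uid-7'));                   // bool(false)
```

Note the last case: under this start-anchored reading, a pattern that begins with neither "/" nor "*" (such as "-uid-") never matches a real path, so rules written that way would have no effect with a parser that follows these semantics.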