I'm not sure if I missed it. Does PHPCrawl support robots.txt files and the NOFOLLOW rules for pages? Please refer to paragraph B.4.1, "Search robots": http://www.w3.org/TR/html401/appendix/notes.html#recs
Hi!
No, I'm sorry, phpcrawl doesn't support robots.txt files.
But it shouldn't be too difficult to implement a little parser
yourself, I think.
Check whether a robots.txt file exists, look for the "Disallow" lines in
there (robots.txt's counterpart to nofollow) and just pass the disallowed
path(s) you find to the setup method "addNonFollowMatch()" of the crawler.
Alright, thanks. I see you release a new version every year. Will there be a new release this year?
Yes, I think so.
It's almost done (v0.7).
If you want to, you can have a look at it:
svn://88.198.0.9/phpcrawl