ROBOTS.TXT, nofollow, etc.

Help
pgolovko
2006-10-22
2013-04-09
  • pgolovko

    pgolovko - 2006-10-22

    I'm not sure if I missed it. Does PHPCrawl support ROBOTS.TXT files and the NOFOLLOW rules for pages? Please refer to paragraph B.4.1, "Search robots": http://www.w3.org/TR/html401/appendix/notes.html#recs

     
    • Uwe Hunfeld

      Uwe Hunfeld - 2006-10-24

      Hi!

      No, I'm sorry, phpcrawl doesn't support robots.txt files.
      But it shouldn't be too difficult to implement a little parser
      yourself, I think.
      Check if a robots.txt file exists, look for "nofollow" lines in
      there, and just pass the nofollow path(s) you find to the setup method
      "addNonFollowMatch()" of the crawler.

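A minimal sketch of the "little parser" idea above, written in Python only for illustration (PHPCrawl itself is PHP). It assumes the excluded paths appear on "Disallow" lines, which is how robots.txt normally expresses them. The function name `disallowed_paths` and its parameters are hypothetical, and the group matching is deliberately simplified.

```python
def disallowed_paths(robots_txt, user_agent="*"):
    """Return the Disallow paths that apply to user_agent.

    Simplified on purpose: a group applies if it names "*" or the exact
    agent; Allow lines and wildcard patterns are not interpreted.
    """
    paths = []
    group_agents = []   # user-agents named by the current group
    in_rules = False    # True once the current group has listed a rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:   # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            applies = "*" in group_agents or user_agent.lower() in group_agents
            # An empty Disallow value means nothing is excluded.
            if field == "disallow" and applies and value:
                paths.append(value)
    return paths


example = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/   # scratch area

User-agent: BadBot
Disallow: /
"""

print(disallowed_paths(example))
```

Each returned path would then be turned into a match expression and handed to addNonFollowMatch(). The exact format that method expects is not stated in this thread, so check it against the PHPCrawl documentation.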
       
    • pgolovko

      pgolovko - 2006-10-24

      Alright, thanks. I see you release a new version every year. Will there be a new release this year?

       
    • Uwe Hunfeld

      Uwe Hunfeld - 2006-10-25

      Yes, I think so.
      It's almost done (v0.7).
      If you want to, you can have a look at it:

      svn://88.198.0.9/phpcrawl

       
