PHPCrawl / Forum / Help: ROBOTS.TXT, nofollow, etc.

pgolovko - 2006-10-22

I'm not sure if I missed it. Does PHPCrawl support robots.txt files and the NOFOLLOW rules for pages? Please refer to section B.4.1, "Search robots": http://www.w3.org/TR/html401/appendix/notes.html#recs

- Uwe Hunfeld - 2006-10-24
  
  Hi!
  
  No, I'm sorry, phpcrawl doesn't support robots.txt files.
  But it shouldn't be too difficult to implement a little parser
  yourself, I think.
  Check if a robots.txt file exists, look for the "nofollow" lines in
  it (the "Disallow:" entries) and just pass the path(s) you find to the
  setup method "addNonFollowMatch()" of the crawler.
  
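  The approach described above could be sketched roughly as follows. This is a minimal sketch, not part of PHPCrawl: the helper names `robots_disallowed_paths()` and `robots_path_to_regex()` are hypothetical, only the original plain-prefix form of `Disallow:` is handled (no wildcards, no `Allow:` lines), and it assumes that `addNonFollowMatch()` accepts a PCRE pattern, which should be checked against the PHPCrawl documentation for your version.

  ```php
  <?php
  // Hypothetical helper (not part of PHPCrawl): returns the Disallow paths
  // from the text of a robots.txt file that apply to the given user agent.
  function robots_disallowed_paths($robotsTxt, $userAgent = '*')
  {
      $paths = array();
      $applies = false;      // does the current record apply to our agent?
      $inAgentLines = false; // are we inside a run of User-agent lines?

      foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
          // Strip comments and surrounding whitespace.
          $line = trim(preg_replace('/#.*$/', '', $line));
          if ($line === '' || strpos($line, ':') === false) {
              continue;
          }
          list($field, $value) = array_map('trim', explode(':', $line, 2));
          $field = strtolower($field);

          if ($field === 'user-agent') {
              // A User-agent line that follows a rule line starts a new record.
              if (!$inAgentLines) {
                  $applies = false;
              }
              $inAgentLines = true;
              if ($value === '*' || ($value !== '' && stripos($userAgent, $value) !== false)) {
                  $applies = true;
              }
          } else {
              $inAgentLines = false;
              // An empty Disallow value means "nothing is disallowed".
              if ($field === 'disallow' && $applies && $value !== '') {
                  $paths[] = $value;
              }
          }
      }
      return array_values(array_unique($paths));
  }

  // Hypothetical helper: turns a Disallow path into a PCRE pattern that
  // matches any http(s) URL whose path starts with it.
  function robots_path_to_regex($path)
  {
      return '#^https?://[^/]+' . preg_quote($path, '#') . '#';
  }

  // Usage sketch (assumes $crawler is an already configured PHPCrawl instance
  // and that addNonFollowMatch() takes a PCRE pattern):
  // $robotsTxt = @file_get_contents('http://www.example.com/robots.txt');
  // if ($robotsTxt !== false) {
  //     foreach (robots_disallowed_paths($robotsTxt) as $path) {
  //         $crawler->addNonFollowMatch(robots_path_to_regex($path));
  //     }
  // }
  ```

  Keeping the parsing separate from the download makes it easy to try the parser on a robots.txt string before wiring it into the crawler setup. Note that this covers robots.txt only; the NOFOLLOW rules in page meta tags are a separate mechanism.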
- pgolovko - 2006-10-24
  
  Alright, thanks. I see you release a new version every year. Will there be a new release this year?
  
- Uwe Hunfeld - 2006-10-25
  
  Yes, I think so.
  It's almost done (v0.7).
  If you want to, you can have a look at it:
  
  svn://88.198.0.9/phpcrawl
  