#40 TODO: check whether the manual details robots.txt / -limits usage

Group: CVS
Status: open
Category: Crawler (17)
Priority: 5
Updated: 2008-10-26
Created: 2008-10-26
Creator: Ger Hobbelt
Private: No

See also the robots.txt bug

-------------
robots.txt will NOT be used to filter manually entered URLs, i.e. it does NOT apply to any URLs you pass to pavuk via the command-line options (or scenario files) or the GUI.

BTW, the same applies to the -limits settings (levels 2 and 3) and the -dont_parse_... command-line options: they only apply to 'collected' URLs, just like robots.txt.
------------
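The distinction above can be sketched as follows. This is only an illustrative model, not pavuk's actual code (pavuk is written in C): the `allowed` helper, the example.com URLs, and the robots.txt rules are all hypothetical, chosen to show that filtering is consulted only for collected URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being crawled.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /secret/",
])

def allowed(url: str, manually_entered: bool) -> bool:
    """Model of the behaviour described above: robots.txt (and,
    analogously, the -limits level 2/3 and -dont_parse_... filters)
    is checked only for URLs collected while crawling, never for
    URLs the user entered manually."""
    if manually_entered:
        return True  # seed URLs bypass robots.txt entirely
    return rp.can_fetch("*", url)

# A manually entered URL is fetched even though robots.txt forbids it:
print(allowed("http://example.com/secret/page.html", manually_entered=True))   # True
# The same URL, if merely collected from a page, would be filtered out:
print(allowed("http://example.com/secret/page.html", manually_entered=False))  # False
```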

We need to check whether the manual documents the behaviour described above in detail. I fear it's not all that detailed...

Discussion

  • Ger Hobbelt

    Ger Hobbelt - 2008-10-26
    • assigned_to: stoecker --> i_a
     
  • Ger Hobbelt

    Ger Hobbelt - 2008-10-30

    It wasn't in the manual: the options were listed, but this particular behaviour -- which may surprise others as it did me when I was a little less sharp -- is not in there. It is being added now.

    ... maybe I should introduce a separate section for all the CL0/CL1/CL2/CL3 limit/filter level details, as there are quite a few wicked little details in there (all sensible, but still wicked).
