From: Gabriele B. <g.b...@co...> - 2003-04-18 13:15:04
Ciao Neal!

> 1) The function reparses 'bad_extensions' & 'valid_extensions' each time
> through. This seems wasteful. Any good reason to do this?

As Geoff pointed out, we need to check these for the block feature. However, an optimized version of this structure would be good. But ... any hints on how to do this?

> 2) Toward the end of the function, just before we test the URL against
> 'limits' & 'limit_normalized', we check the server's robots.txt file.
> Wouldn't it make sense to do the robots.txt check AFTER the limits
> check, so as not to waste network connections on servers that will get
> rejected by the next two tests?

If I am not wrong, the robots.txt file is not retrieved at this stage, but only after the URL is considered valid according to our 'limits'. Indeed, the robots.txt file is retrieved in the server class's constructor. Please correct me if I am wrong.

Ciao,
-Gabriele

P.S.: Happy Easter to everyone.

-- 
Gabriele Bartolini - Web Programmer
Comune di Prato - Prato - Tuscany - Italy
g.b...@co... | http://www.comune.prato.it
> find bin/laden -name osama -exec rm {} ;
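[Editor's note] For point (1), one common way to avoid reparsing the extension lists on every call is to cache the parsed set and rebuild it only when the underlying configuration string changes. The sketch below is only an illustration under assumed names: `ExtensionCache` and `has_extension` are hypothetical helpers, not the project's actual classes or API.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical cache: parses a whitespace-separated extension list once
// and reparses only when the raw config string differs from the last one.
class ExtensionCache {
    std::string raw_;              // config string last parsed
    std::set<std::string> exts_;   // parsed extensions
public:
    const std::set<std::string>& get(const std::string& raw) {
        if (raw != raw_) {         // reparse only on change
            exts_.clear();
            std::istringstream in(raw);
            std::string ext;
            while (in >> ext)
                exts_.insert(ext);
            raw_ = raw;
        }
        return exts_;
    }
};

// Returns true if the URL ends with one of the cached extensions.
bool has_extension(const std::set<std::string>& exts, const std::string& url) {
    for (const auto& e : exts)
        if (url.size() >= e.size() &&
            url.compare(url.size() - e.size(), e.size(), e) == 0)
            return true;
    return false;
}
```

The trade-off against the block feature mentioned above is that a changed configuration string is still honored, since the cache key is the string itself rather than a one-time parse.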