From: Geoff H. <ghu...@ws...> - 2003-04-17 18:15:20
> 1) The function reparses 'bad_extensions' & 'valid_extensions' each time
> through. This seems wasteful. Any good reason to do this?

Depends. Once upon a time, we thought that these should be configurable on a
per-URL basis. (Which is why they're "reparsed.") Now maybe it's better to
re-think this in terms of improved performance?

> 2) Toward the end of the function, just before we test the URL against
> 'limits' & 'limit_normalized', we check the server's robots.txt file.
> Wouldn't it make sense to do the robots.txt check AFTER the limits
> check, so as not to waste network connections on servers that will get
> rejected by the next two tests?

Good point.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
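
A minimal sketch of both ideas discussed above: parse 'valid_extensions' and
'bad_extensions' once up front instead of on every URL, and run all the cheap
string checks (extensions, limits) before the robots.txt lookup, so a rejected
URL never costs a network connection. The helper names (parseExtensions,
robotsAllow, IsValidURL's signature) are hypothetical placeholders for this
sketch, not the project's actual API.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a space-separated config value such as 'valid_extensions' or
// 'bad_extensions' into a set once, instead of re-parsing the raw string
// on every call.
static std::set<std::string> parseExtensions(const std::string &configValue) {
    std::set<std::string> result;
    std::istringstream in(configValue);
    std::string ext;
    while (in >> ext)
        result.insert(ext);
    return result;
}

// Hypothetical stand-in for the robots.txt check; the real thing may have to
// open a connection and fetch the file, which is why it should run last.
static bool robotsAllow(const std::string &url) {
    (void)url;
    return true;
}

// Cheap, local tests first; the potentially network-bound test last.
bool IsValidURL(const std::string &url,
                const std::set<std::string> &validExt,   // parsed once at startup
                const std::set<std::string> &badExt,     // parsed once at startup
                const std::vector<std::string> &limits)  // 'limits'/'limit_normalized' prefixes
{
    // 1. Extension checks: pure string lookups, no I/O.
    std::string::size_type dot = url.rfind('.');
    if (dot != std::string::npos) {
        std::string ext = url.substr(dot);
        if (badExt.count(ext))
            return false;
        if (!validExt.empty() && !validExt.count(ext))
            return false;
    }

    // 2. Limit checks: simple prefix matches, no I/O.
    bool withinLimits = limits.empty();
    for (const std::string &prefix : limits) {
        if (url.compare(0, prefix.size(), prefix) == 0) {
            withinLimits = true;
            break;
        }
    }
    if (!withinLimits)
        return false;

    // 3. Only now pay for the robots.txt lookup.
    return robotsAllow(url);
}
```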