iNSiPiD - 2007-11-08

Hi guys,

I've just been through the painful process of creating a urlalias.[site].txt file from an existing history log and, although this no doubt saved me a lot of time, it was also fraught with problems.

Firstly, there was the usual problem of having hundreds of URLs in the original log file that weren't actually pages. I would have thought the .pdf extension on these was a clear enough indicator that they should be skipped.
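
For what it's worth, here's the sort of pre-filter I have in mind for that first problem: a small script that drops log lines whose request path ends in a non-page extension before the log ever reaches the alias builder. The extension list, and the assumption that the request path sits in the 7th field of a combined-format log line, are mine rather than anything AWStats dictates:

    # Rough pre-filter (my own sketch, not part of AWStats): drop access-log
    # lines whose request path ends in a non-page extension. The extension
    # list and the combined-log field layout (request path in the 7th
    # whitespace-separated field) are assumptions.
    import sys

    SKIP_EXTENSIONS = ('.pdf', '.jpg', '.gif', '.png', '.css', '.js', '.zip')

    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 7:
            continue  # malformed line, ignore it
        path = fields[6].split('?', 1)[0].lower()
        if path.endswith(SKIP_EXTENSIONS):
            continue  # not a page, leave it out
        sys.stdout.write(line)

Running something like "python filterpages.py < access.log > pages_only.log" would at least give me a log I can build the alias list from, but a built-in filter would obviously be far nicer.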

Secondly, the list is incomplete. I know I could just use the -site= param instead, but I tried that in the first instance, and that's when I noticed all the invalid page URLs getting hit.

Thirdly, pages requiring a login were not indexed, despite my already being logged in on the same machine.

So, some questions:

  1. Is there a way to filter this list with multiple arguments as it's getting built?
  2. Given that this still leaves me with a bunch of untitled pages, is there a way to exclude certain URL paths from the Page hits stats only, again using multiple arguments?
  3. Could a user/pass param be added to urlalias.pl so it can access protected pages? (See the sketch after this list for the sort of thing I mean.)
  4. Why, oh why, can't AWStats (given that you've set the option to 1 in your config file) update this list for you with each new log update, or by some other method? It would certainly put less strain on the server than rebuilding it from the whole site or log file every time.
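
On question 3, to illustrate the sort of thing I'm asking for: the tool would need to do something roughly like the following when it fetches a protected page to read its title. The URL, credentials and the assumption of plain HTTP basic auth are just example values of mine, not options urlalias.pl actually has as far as I know:

    # Illustration only: fetch a page behind HTTP basic auth and pull out its
    # <title>, which is roughly what a user/pass option on urlalias.pl would
    # have to do. The URL and credentials below are made-up examples.
    import re
    import urllib.request

    def fetch_title(url, username, password):
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, username, password)
        opener = urllib.request.build_opener(
            urllib.request.HTTPBasicAuthHandler(password_mgr))
        html = opener.open(url).read().decode('utf-8', 'replace')
        match = re.search(r'<title>(.*?)</title>', html,
                          re.IGNORECASE | re.DOTALL)
        return match.group(1).strip() if match else url

    print(fetch_title('http://www.example.com/members/index.html',
                      'someuser', 'somepass'))

Even just being able to pass a username/password (or a session cookie) on the command line would save a lot of manual fixing of untitled entries.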

All replies are greatly appreciated.

Thanks in advance.