From: Joe R. J. <jj...@cl...> - 2001-10-14 08:26:39
On Wed, 3 Oct 2001, Gilles Detillieux wrote:

> Date: Wed, 3 Oct 2001 09:51:03 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] Re: URL Rewrite patch for 3.1.6 snapshots
>
> > > > > If you get a chance to run old and new snapshots of htdig with -vvv and
> > > > > compare the outputs, you may be able to track down the source of the
> > > > > different URLs that are parsed in both cases. To do this in a meaningful
> > > > > way, though, you'll need to try a static site, or perhaps a snapshot of
> > > > > your site, so you don't get thrown off in your comparisons by updates
> > > > > to the site between digs.
> > > >
> > > > Yes, I have kept that snapshot for a happy occasion like that;)
> > >
> > > Keep me posted if you get a chance to run this test with both snapshots.
> > > I can't think of any changes to 3.1.6 that would cause it to lose valid
> > > URLs, but it would be good to confirm without a doubt that the lost URLs
> > > on your system are all indeed URLs that should not have been indexed.
> >
> > In the happy hour;)))
>
> It might be best if you're sober when you do this test. ;-)

The happy hour turned into a couple of unhappy weeks:(

-r--r--r--  1 jjah  www  24621528 Oct  2 13:20 rundig_vvv.082901
-r--r--r--  1 jjah  www  20266702 Oct  2 14:15 rundig_vvv.093001

I found 82 links from one document with META ROBOT: Noindex tag;)  I could
not find an efficient way of hunting down the other 138 links that were
unaccounted for in two 20 meg+ files; however, I must assume that they are
some sort of duplicates;-/

Regards,

Joe
--
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        jj...@cl...
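
One possible way to hunt down links that appear in one 20 meg+ verbose log but not in the other is to reduce each file to a set of URLs and take the difference. The sketch below is only an illustration under stated assumptions: it does not rely on the actual layout of htdig -vvv output, it simply treats anything that looks like an http or https URL as a candidate, and the names urls_in_log and missing_urls are made up for this example.

```python
import re

# Assumption: every URL of interest shows up somewhere in the verbose output
# as a plain http:// or https:// string. The real htdig -vvv line format is
# not parsed here, so trailing punctuation may need trimming by hand.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def urls_in_log(path):
    """Collect the set of URL-like strings in a log file, one line at a time."""
    found = set()
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, "r", errors="replace") as log:
        for line in log:
            found.update(URL_RE.findall(line))
    return found


def missing_urls(old_log, new_log):
    """Return, sorted, the URLs seen in old_log that never appear in new_log."""
    return sorted(urls_in_log(old_log) - urls_in_log(new_log))
```

Calling missing_urls("rundig_vvv.082901", "rundig_vvv.093001") would then list the candidates seen only in the older run. Because each file is read line by line and only the distinct URLs are kept in memory, the size of the logs should not be a problem. Whether a URL on that list is a duplicate or a genuinely lost document still has to be checked by hand.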