From: Joe R. J. <jj...@cl...> - 2001-10-14 08:26:39
On Wed, 3 Oct 2001, Gilles Detillieux wrote:
> Date: Wed, 3 Oct 2001 09:51:03 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: Joe R. Jah <jj...@cl...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] Re: URL Rewrite patch for 3.1.6 snapshots
>
> > > > > If you get a chance to run old and new snapshots of htdig with -vvv and
> > > > > compare the outputs, you may be able to track down the source of the
> > > > > different URLs that are parsed in both cases. To do this in a meaningful
> > > > > way, though, you'll need to try a static site, or perhaps a snapshot of
> > > > > your site, so you don't get thrown off in your comparisons by updates
> > > > > to the site between digs.
> > > >
> > > > Yes, I have kept that snapshot for a happy occasion like that;)
> > >
> > > Keep me posted if you get a chance to run this test with both snapshots.
> > > I can't think of any changes to 3.1.6 that would cause it to lose valid
> > > URLs, but it would be good to confirm without a doubt that the lost URLs
> > > on your system are all indeed URLs that should not have been indexed.
> >
> > In the happy hour;)))
>
> It might be best if you're sober when you do this test. ;-)

The happy hour turned into a couple of unhappy weeks:(

-r--r--r-- 1 jjah www 24621528 Oct 2 13:20 rundig_vvv.082901
-r--r--r-- 1 jjah www 20266702 Oct 2 14:15 rundig_vvv.093001

I found 82 links from one document with a META robots noindex tag;) I could
not find an efficient way of hunting down the other 138 links that were
unaccounted for in the two 20 MB+ files; however, I must assume that they are
some sort of duplicates;-/

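
One possible way to hunt down the unaccounted links is to reduce each log to its set of distinct URLs and take the difference. The sketch below is only an illustration, not part of htdig: it assumes nothing about the -vvv output format beyond URLs appearing as plain http:// or https:// tokens, and the function names urls_in_log and missing_urls are made up for this example.

```python
import re

# Assumption: URLs show up in the logs as plain http:// or https:// tokens.
# This pattern is a rough heuristic, not a description of htdig's log format.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def urls_in_log(path):
    """Collect the set of distinct URL-like tokens found in a log file."""
    found = set()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:  # read line by line, so 20 MB+ files are no problem
            found.update(URL_RE.findall(line))
    return found


def missing_urls(old_path, new_path):
    """Return URLs seen in the old log but absent from the new one, sorted."""
    return sorted(urls_in_log(old_path) - urls_in_log(new_path))
```

Calling missing_urls("rundig_vvv.082901", "rundig_vvv.093001") would then list each URL that appears somewhere in the older log but nowhere in the newer one. Punctuation that trails a URL in the log would be kept as part of the match, so the pattern may need tightening for the real output.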
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jj...@cl...