From: Gabriele B. <bar...@in...> - 2003-10-05 09:29:13
Ciao guys,
> Nope, if head_before_get=TRUE we use the HEAD request and the HTTP
> server is kind enough to give us the timestamp on the document in the header.
> If the timestamps are the same we don't bother to download it.
Yep, you are right. I remember that was one of the reasons why I wrote the
code for the 'HEAD' method (also to avoid downloading an entire document
whose content-type is not parsable).
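Just to illustrate what I mean (this is not code from the actual sources:
head_last_modified() is a made-up name, and I am assuming plain POSIX sockets
and HTTP/1.0):

    // Rough sketch: issue a HEAD request and pull out the Last-Modified
    // header, so the caller can compare it with the stored timestamp and
    // skip the GET when they match. Illustrative only.
    #include <netdb.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstring>
    #include <string>

    std::string head_last_modified(const char *host, const char *path)
    {
        addrinfo hints, *res;
        std::memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, "80", &hints, &res) != 0)
            return "";

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
            if (fd >= 0) close(fd);
            freeaddrinfo(res);
            return "";
        }
        freeaddrinfo(res);

        // HTTP/1.0 keeps it simple: the server closes the connection for us.
        std::string req = std::string("HEAD ") + path + " HTTP/1.0\r\n"
                          + "Host: " + host + "\r\n\r\n";
        write(fd, req.c_str(), req.size());

        // Only the headers come back, no body: that is the whole point.
        std::string reply;
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            reply.append(buf, n);
        close(fd);

        std::string::size_type pos = reply.find("Last-Modified: ");
        if (pos == std::string::npos)
            return "";
        std::string::size_type end = reply.find("\r\n", pos);
        return reply.substr(pos + 15, end - pos - 15);
    }

The same reply also carries the Content-Type header, so the "not parsable"
case can be caught before the download as well.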
> > I think you misinterpreted what Lachlan suggested, i.e. the case where Y
> > does NOT change. If Y is the only document with a link to X, and Y does
> > not change, it will still have the link to X, so X is still "valid".
> > However, if Y didn't change, and htdig (without -i) doesn't reindex Y,
> > then how will it find the link to X to validate X's presence in the db?
I must admit I am not very comfortable with the incremental indexing code.
Anyway, when I was thinking about the same procedure for ht://Check (not yet
done, as I said), I came up with this (I will try to stay at a logical level):
1) As you said, mark all the documents as, let's say, 'Reference_obsolete'
2) Read the start URL and mark all the URLs in the start URL as to be
retrieved (adding them to the index of documents if necessary)
3) Loop until there are no URLs to be retrieved
4) For every URL, find out through a pre-emptive HEAD call whether it has
changed:
        a) not changed: get all the URLs it links to and mark them
"to be retrieved" or something like that
        b) changed: download it again and mark all the new links as "to
be retrieved"
5) Purge all the obsolete URLs
This approach would solve your second "flaw" too, Neal (I guess so). A rough
sketch of the whole loop is below.
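Just to make the logic concrete, here is steps 1-5 in C++ (again, made-up
names: DocRecord, head_unchanged(), fetch_links() are not taken from ht://Dig
or ht://Check, and the two stubs stand in for the real HTTP code):

    #include <cstddef>
    #include <map>
    #include <queue>
    #include <string>
    #include <vector>

    enum State { Reference_obsolete, To_be_retrieved, Valid };

    struct DocRecord {
        State state;
        std::vector<std::string> links;  // outgoing links from the last crawl
        DocRecord() : state(Reference_obsolete) {}
    };

    // Stubs standing in for the real HTTP code.
    bool head_unchanged(const std::string &) { return true; }
    std::vector<std::string> fetch_links(const std::string &)
        { return std::vector<std::string>(); }

    void incremental_pass(std::map<std::string, DocRecord> &db,
                          const std::string &start_url)
    {
        // 1) mark every known document 'Reference_obsolete'
        for (std::map<std::string, DocRecord>::iterator it = db.begin();
             it != db.end(); ++it)
            it->second.state = Reference_obsolete;

        // 2) seed the queue with the start URL (inserting it if it is new)
        std::queue<std::string> pending;
        pending.push(start_url);
        db[start_url].state = To_be_retrieved;

        // 3) loop until there are no URLs left to retrieve
        while (!pending.empty()) {
            std::string url = pending.front();
            pending.pop();
            DocRecord &rec = db[url];

            // 4) pre-emptive HEAD call to see whether the document changed
            if (!head_unchanged(url))
                rec.links = fetch_links(url);  // 4b) changed: download again
            // 4a) unchanged: the stored links are still valid references
            rec.state = Valid;

            for (size_t i = 0; i < rec.links.size(); ++i) {
                DocRecord &target = db[rec.links[i]];  // inserts new URLs
                if (target.state == Reference_obsolete) {
                    target.state = To_be_retrieved;
                    pending.push(rec.links[i]);
                }
            }
        }

        // 5) purge everything still marked obsolete: no longer reachable
        for (std::map<std::string, DocRecord>::iterator it = db.begin();
             it != db.end(); ) {
            if (it->second.state == Reference_obsolete)
                db.erase(it++);
            else
                ++it;
        }
    }

The point is 4a: even when the HEAD says "not changed" we still walk the
links stored for Y, and that revalidates X even though Y was never
downloaded again.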
Ciao ciao
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Melbourne, Victoria, Australia
bar...@in... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno