From: Gilles D. <gr...@sc...> - 2002-09-17 20:55:31
According to Stefan Seiz:
> On 17.9.2002 20:21 Uhr, Geoff Hutchison <ghu...@ws...> wrote:
> > 2) Where do you determine that ref->DocTime() is returning 0?
> > I ask, in part because htdump is going to access this as well:
> > fprintf(fl, "\tm:%d", (int) ref->DocTime());
> I added a trace print to Retriever.cc inside the Retriever::parse_url(URLRef
> &urlRef) routine like so:
>
> --- snip ---
> if (ref)
> {
>     //
>     // We already have an entry for this document in our database.
>     // This means we can get the document ID and last modification
>     // time from there.
>     //
>     current_id = ref->DocID();
>     date = ref->DocTime();
>     if (debug > 2)
>     {
>         cout << "\nDOC MATCHED DB!!! \n" << endl;
>         cout << "DocTime Date is: " << date << endl;
>     }
> --- snap ---
I think you might need a cast up there, i.e.:
    cout << "DocTime Date is: " << (int) date << endl;
I don't know if there's an (ostream) << (time_t) operator defined, and it
might not be automatically casting date to (int) on its own. See if that
makes a difference.
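
As a rough sketch of the cast suggested above (not code from htdig itself;
format_doc_time is a hypothetical helper name), the value can be printed
through an explicit integer cast so the output no longer depends on how the
platform defines time_t. On most systems time_t is a plain integer type and
prints fine without the cast, so this mainly rules out one possible cause:

```cpp
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical helper (not part of htdig): build the debug line with an
// explicit integer cast, so the printed digits do not depend on the
// platform's definition of time_t.
std::string format_doc_time(std::time_t date)
{
    std::ostringstream out;
    out << "DocTime Date is: " << static_cast<long long>(date);
    return out.str();
}
```

If this prints 0 as well, the cast was not the issue and the zero is really
what ref->DocTime() returned.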
> > 3) Are you sure the server is returning a Last-Modified header for files?
> Yes, I snooped the wire ;-)
>
> > 4) Does the server properly handle the If-Modified-Since header?
> > (To see that this header is sent, check in Document.cc line 525 or so for
> > the output sent by htdig.)
> It's Apache 1.3.26, so I guess it should. But I think htdig only sends the
> If-Modified-Since header if it finds a date for a URL in the current
> database, and as I assume that doesn't happen, the If-Modified-Since
> header never makes its way out.
>
> Here's an example url from my htdump file to prove a date is in there:
>
> 0 u:http://www.CENSORED.com/YADDA.html t:CENSORED a:0
> m:873819058 s:280 H: CENSORED h: l:1031854616
> L:0 b:1 c:0 g:0 e: n: S: d: A:
Well, that "m:" value is definitely non-zero, and definitely not as large
as the current time, so it does seem to be getting, parsing, and storing
Last-Modified header dates. But in your reply to my message on [htdig],
you said...
According to Stefan Seiz:
> On 17.9.2002 20:23 Uhr, Gilles Detillieux <gr...@sc...> wrote:
> > If that's the part you suspect is failing, then you should be able
> > to confirm that by running htdig -vvv. Look for the messages where
> > it outputs the Last-Modified header, and then says something like
> > "Converted ... to ...", which shows the original and regenerated date
> > string after parsing. If the second one is wrong, then you are right in
> > that the problem is somewhere in the parsing. In that case, try adding
> > trace prints in the parsedate() function in htdig/Document.cc (minimal
> > programming skills required, just look at how other debug output is done).
>
> I already tested this but unfortunately I don't get any Dates output when
> running with -vvv
That doesn't add up. If you are indeed running htdig version 3.1.6
with -vvv, then it MUST be showing the Last-Modified headers if your
server is returning them. Are you sure you're running a vanilla 3.1.6
installation, and not some severely modified variant of this, or another
version altogether? Can you show us a complete excerpt of htdig -vvv
output for one file, from one "Retrieval command for ..." message to
the next?
If you're not seeing those either, is it possible you're retrieving
via local_urls? In this case, there's no date parsing involved, as
htdig gets the modtime as a time_t already from the local filesystem.
But you did say you snooped the wire, so I'd guess this isn't the case.
> > If the second date string is fine, it could be a problem related to
> > refetching this info from the database, or some memory leak somewhere.
> > I thought you mentioned that an htdump showed correct, non-zero modtimes.
> > Such a problem would be harder to track down.
>
> Yes, datestamps (seconds since epoch) are in the dumped file.
>
> I'll add debug prints (already did some and always got a date of 0) to the
> files you mentioned and report back.
>
> Could you tell me which subroutine is responsible for parsing the timestamp
> from the local database (I guess reading and parsing that one is the
> problem)?
It's already parsed by the time it gets into the database. It starts out
as a date string from the server, in RFC850 or RFC1123 format, and the
parsedate() function in htdig/Document.cc converts that to a time_t, i.e.
a 32-bit integer representing seconds since Jan 1, 1970, 00:00:00 GMT.
It goes through some encoding and decoding as it gets stored in the
database (see DocumentRef::Serialize() and DocumentRef::Deserialize() in
htcommon/DocumentRef.cc), but then it goes through those same routines
when you get the number via htdump. It doesn't get converted back into
a date string until htsearch processes it, using strftime().
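
To illustrate that round trip, here is a minimal sketch, assuming an
RFC1123-style date string. It is not htdig's parsedate(); the names
parse_rfc1123, format_http_date and days_from_civil are made up for the
example. The sample date used in the note below is simply the "m:" value from
the dump above rendered as a string, not a header captured from the server.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <string>

// Days from 1970-01-01 to the given Gregorian date (month 1..12).
static long days_from_civil(int y, int m, int d)
{
    y -= (m <= 2);
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;                               // [0, 399]
    const long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Parse "Tue, 09 Sep 1997 15:30:58 GMT" into seconds since the epoch.
// Returns false if the string does not match that layout.
bool parse_rfc1123(const char *s, std::time_t &out)
{
    int day, year, hour, min, sec;
    char mon[4] = "";
    if (std::sscanf(s, "%*3s, %d %3s %d %d:%d:%d",
                    &day, mon, &year, &hour, &min, &sec) != 6)
        return false;

    // Look the month abbreviation up in a packed table of 3-letter names.
    static const char names[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char *p = std::strlen(mon) == 3 ? std::strstr(names, mon) : 0;
    if (!p || (p - names) % 3 != 0)
        return false;
    const int month = static_cast<int>((p - names) / 3) + 1;

    out = static_cast<std::time_t>(
        days_from_civil(year, month, day) * 86400L
        + hour * 3600L + min * 60L + sec);
    return true;
}

// Turn the stored number back into a date string with strftime().
std::string format_http_date(std::time_t t)
{
    const std::tm *g = std::gmtime(&t);
    if (!g)
        return std::string();
    char buf[64];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", g);
    assert(n > 0);  // 64 bytes is enough for this fixed-width format
    return std::string(buf, n);
}
```

With these helpers, parsing "Tue, 09 Sep 1997 15:30:58 GMT" gives 873819058,
the same number as the "m:" field in the dump, and format_http_date() turns
that number back into the same string (in the default C locale).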
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)