#186 htsearch 3.2 rejects records with modtime of 0

htsearch (60)

From Gilles:

Hey, guys. I ran into something weird when I was testing out the
allow_numbers changes last week, which I haven't been quite able to
explain or track down in the code. Of the pages on my site that I was
indexing, about a dozen of them were from a CGI script that puts out a
Last-Modified header to set the date appropriately in search results.
Because of a recent bug in the script, which I just fixed last week,
it turns out that the Last-Modified headers were coming out with no
date on them, so htdig was giving them a modtime of 0 (i.e. the epoch).
This is different from the behaviour of htdig 3.1.6, which gave them
the current time instead. It may be that the 3.2 code should be fixed
to do likewise, as it seems the more sensible behaviour.
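
The 3.1.6-style fallback described above could be sketched as follows.
This is only an illustration: the function name effective_modtime and
the convention that an unparsable header yields 0 are assumptions, not
code taken from the ht://Dig source.

```cpp
#include <ctime>

// Hypothetical helper, not from the ht://Dig source.
// parsed_modtime: the value obtained from the Last-Modified header,
//                 assumed to be 0 (or negative) when it could not be parsed.
// now:            the current time, passed in so the function is testable.
// Returns the parsed time when it is usable, otherwise the current time.
std::time_t effective_modtime(std::time_t parsed_modtime, std::time_t now)
{
    return parsed_modtime > 0 ? parsed_modtime : now;
}
```

A caller would pass std::time(nullptr) as the second argument, so a
document with a broken header is stored with the time it was indexed
rather than with the epoch.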

However, that's not the weird thing. What was odd is that even though
these dozen or so web pages were definitely in the database, and came
out into db.docs after an htdump (with an m:0 field), htsearch would
not show these in search results. I looked at the code, and the only
thing that I can see that would cause this is if the startyear,
startmonth or startday input parameters were set, causing the
timet_startdate value in Display.cc to be greater than 0. But I didn't
set these! I ran htsearch from the command line, so I know I wasn't
passing it these values as input parameters, and the config file I used
didn't define these as attributes either.
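
The kind of date-range check described above might look roughly like
the following sketch. The function name passes_date_filter and the
convention that a bound of 0 means "not set" are assumptions made for
illustration; this is not the actual code in Display.cc.

```cpp
#include <ctime>

// Hypothetical stand-in for a start/end date filter on search results.
// A bound of 0 is treated as "not set", so it never rejects anything.
bool passes_date_filter(std::time_t modtime,
                        std::time_t start,
                        std::time_t end)
{
    if (start > 0 && modtime < start)
        return false;   // older than the requested start date
    if (end > 0 && modtime > end)
        return false;   // newer than the requested end date
    return true;
}
```

Under these assumptions, a record with a modtime of 0 passes only while
the start bound is unset. Any start bound above 0, however it came to be
set, silently drops that record from the results.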

I know the problem was the 0 modtime, because when I fixed the CGI
script to return a proper Last-Modified header, the pages showed up in
htsearch, with no other changes being made.

Does anyone know of anything else that might explain this behaviour?
I'd start putting trace prints in htsearch to track this down, but I
have too many high-priority things right now to spend much time on
ht://Dig right away. htsearch -vvvv didn't give any indication of what
might be going on - the URLs in question never even showed up at all in
the output.

I don't think I'd consider this a showstopper, but it does seem odd
that htsearch rejects any modtime value at all when none of those
parameters have been specified. This, coupled with the fact that htdig
will assign a 0 modtime if it can't parse the Last-Modified header (as
opposed to a missing Last-Modified header, which should be taken as the
current time if I'm not mistaken), could lead to others having similar
problems.


  • Gilles Detillieux

    Lachlan pointed out that the problem was that the default for the
    startyear attribute was set to 1970. After he changed it to be
    empty, as it was in 3.1.6, the problem did indeed go away, as my
    testing confirmed.

  • Gilles Detillieux

    • status: open --> closed-fixed
