Previous invocations of rundig have gone just fine. It
runs on a separate machine on the same subnet as the
webserver. Recently, the webserver was uhh... down for
maintenance, yeah, that's the ticket. And when rundig
ran, it couldn't connect to fetch the top-level page
and exited since there was nothing further to do.
All the "empty" *.work files then got copied to the
production files, and from that point on every search
returned 0 items. Groan.
In summary, there are 3 problems:
1) htdig does not error out when it fails to fetch the
start_url. It should abort with an error message and a
non-zero exit code. Actually, ANY connection-type
error (vs. an HTTP error) should be fatal, since it
means that the database could be incomplete. It's
better to have day-old data than little or no data.
2) htdig does not return a zero at the end of the
main() routine, so the exit code on normal completion
is undefined and can't be checked in rundig.
3) rundig should check for errors (non-zero exit code)
from htdig and htpurge and *not* rename the *.work
files if errors occur.
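
For problem 3, something along these lines might do. This is only a sketch, not the shipped rundig: the function names run_steps and promote_work_files and the DBDIR variable are made up for illustration, and it assumes problems 1 and 2 are fixed so that htdig and htpurge really do exit non-zero on failure and zero on success.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real rundig script.
# run_steps, promote_work_files and DBDIR are illustrative names only.

# Run each step in order; stop at the first one that exits non-zero.
run_steps() {
    for step in "$@"; do
        # $step is deliberately unquoted so "cmd arg" splits into words.
        if ! $step; then
            echo "rundig: '$step' failed; keeping existing databases" >&2
            return 1
        fi
    done
    return 0
}

# Rename every *.work file in directory $1 to its production name
# by stripping the ".work" suffix.
promote_work_files() {
    for f in "$1"/*.work; do
        [ -e "$f" ] || continue   # glob matched nothing
        mv "$f" "${f%.work}"
    done
}

# Assumed usage (the real rundig passes its own options to each tool):
#   run_steps "htdig" "htpurge" && promote_work_files "$DBDIR"
```

With this shape, a failed dig leaves yesterday's production files in place and the *.work files sitting next to them for inspection, which matches the "day-old data beats no data" argument in problem 1.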