

#277 htdig 3.1.6 AND 3.2.0b6 both crash when run in cron

htdig (103)

My e-mail address is iss@kineticsusa.com

I cannot seem to figure out why.

We're running Fedora Core 4. I have tried both htdig
3.1.6 and also htdig 3.2.0b6 and both will dig our
entire site correctly when run from the command line.

The site they're digging has around 11 GB of files (pdf,
doc, rtf, html, txt, and xls). Everything works fine
when run from the command line, but when I run from
cron (simply running /usr/bin/rundig or /usr/bin/rundig
-v in a cron job), I get bad behavior, which differs
depending on whether I'm running 3.2.0b6 or 3.1.6.
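
For anyone trying to reproduce this, a wrapper along
these lines might help capture what rundig prints and
how it exits under cron. This is only a sketch: the
function name run_logged and the log path in the
comment are made up for illustration and are not part
of htdig.

```shell
#!/bin/sh
# Hypothetical helper: run a command, append its output and
# exit status to a log file, and pass the status back.
run_logged() {
    log=$1    # $1 = log file to append to
    shift     # the remaining arguments are the command to run
    {
        echo "start: $(date)"
        "$@"
        status=$?
        echo "exit status: $status"
    } >> "$log" 2>&1
    return "$status"
}

# Example call from a cron job (the log path is an assumption):
#   run_logged /tmp/rundig.log /usr/bin/rundig -v
```

An exit status above 128 in the log usually means the
process was terminated by a signal (128 plus the signal
number), which would be worth knowing for the 3.2.0b6
case.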

On 3.1.6 the htdig process just stops about 5% into
digging the server and eats up 100% CPU forever, never
doing anything else. If I use -v and then kill htdig
and rundig, I can see that it stops at the same file
every single time, a .doc file. If I then delete that
file, it stops on the next file, and so on and so
forth; it always stops after the same amount of data
has been processed. If I run from the command line it
never has any such problems and just keeps running
until it's done.

With 3.2.0b6, when running from cron it gets to the
exact same spot, only instead of going to 100% CPU
usage forever, the htdig process actually gets "killed",
and then htpurge runs, along with everything else, and
I end up with a database that's only 5% of the size it
should be.

Again, running from the command line works just fine.

I've also tried doing an echo $PATH from the command
line and then hard-coding that path into the rundig
script to eliminate any path issues, which didn't help.
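
PATH is only one of the things that can differ between
a login shell and cron. A rough way to compare the rest
(other environment variables and the ulimit resource
limits) is sketched below. The function name
snapshot_env and the /tmp file names are assumptions
chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the current environment and
# resource limits to a file so two runs can be compared.
snapshot_env() {
    # $1 = output file (any writable location will do)
    {
        echo "== env =="
        env | sort
        echo "== ulimit -a =="
        ulimit -a
    } > "$1" 2>&1
}

# Run once from the login shell:  snapshot_env /tmp/env.shell
# Run once from a cron job:       snapshot_env /tmp/env.cron
# Then compare the two files:     diff /tmp/env.shell /tmp/env.cron
```

Any line that shows up in the diff is a candidate
difference between the two ways of running rundig.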

Just as an FYI, our system has plenty of everything -
700 GB of free HD space, 2 GB of memory with 1 GB of swap,
and a Hyper-Threaded 3.2 GHz processor. We're running
RAID 5 for our hard drives (software RAID under Linux).

I can't figure out why this is happening. Any ideas?