On Linux Mandrake 2006, kernel 2.6.12-12mdk,
with htdig 3.2.0b6:
since htdig 3.1.6 has a words.db file size limitation
(4 GB max), I tried htdig 3.2.0b6.
My files (about 300000, for about 30 GB) consist of:
- doc, pdf, rtf, html and txt files
- a lot of zip files (more than 280000): a script
unzips them and parses the contents, which are files in
the formats listed above.
Once the zip files are unzipped there are about 700000
files, for a size of about 100 GB.
htdig quickly (after 12000 files) eats all the memory
(768 MB of RAM and 800 MB of swap) until Linux
shows these messages:
p1011761 kernel: Bad page state at prep_new_page (in
process 'htdig', page c11aca80)
Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: flags:0x2000000c mapping:00000000
mapcount:1 count:1
Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: Backtrace:
Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: Trying to fix it up, but a reboot is
needed
---
The htdig process stays in memory but does nothing
more. top shows this:
root 16 0 1383m 577m 1852 S 0.0 77.0 17:55.31 htdig
So it looks like neither 3.1.6 nor 3.2.0b6 can index
my files...
Logged In: YES
user_id=799288
Oops... I thought it was 3.2.0b6, but it is
3.2.0b4. Sorry, forget my last message.
I will install 3.2.0b6.
Logged In: YES
user_id=799288
Well, now I have really used 3.2.0b6.
And there are still these memory leaks, even if htdig
eats memory more slowly...
Swap: 2562800k total, 159528k used, 2403272k free, 36628k cached
Here is what the top command says for the htdig process:
VIRT RES SHR S %CPU %MEM TIME+ SWAP nFLT nDRT COMMAND
1380m 642m 1620 S 0.0 85.7 12:47.98 738m 1419 0 /opt/www/htdig-3.2.0b6/bin/htdig
nFLT is the "Page Fault count".
First, at launch htdig takes 1.2 GB of memory (76 MB
resident in RAM, about 1.1 GB of swap): this is what the
process line in top shows. It is surprising, because I had
erased all the database files, so it has no big files to
open. What is stranger still is that the total swap use of
all processes, shown in the summary at the top of the top
display, is then... 0. Is this a kernel bug?
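One thing the numbers themselves show: in the top row quoted earlier for the 3.2.0b6 run, the SWAP value equals VIRT minus RES. That would fit top deriving its per-process SWAP column from the other two rather than measuring pages actually swapped out, but this is only a reading of the figures in this report, not something confirmed here. A small check (the function name is hypothetical):

```python
def derived_swap_mb(virt_mb, res_mb):
    """VIRT minus RES, in MB: the part of the address space not resident in RAM."""
    return virt_mb - res_mb

# Figures from the top row quoted in this report (3.2.0b6 run).
print(derived_swap_mb(1380, 642))  # 738, the same value top prints under SWAP
```

The launch figures fit the same pattern: 1.2 GB virtual minus 76 MB resident is about 1.1 GB.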
But although the resident part of the memory is only 76 MB
at the beginning, it grows progressively until it uses all
the RAM, and then the total swap shown by top grows too.
I am obliged to periodically kill htdig before too much
swap is used, so that the rundig script can call the other
processes (htpurge, merge, etc.) and the database is not
corrupted, and then to relaunch the rundig process...
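The periodic kill could be automated with a small watchdog. Below is a minimal sketch, not something I have run against htdig: it polls VmRSS in /proc/<pid>/status and sends SIGTERM once a threshold is passed. The 600 MB limit, the 30 second interval and all function names are assumptions, and so is the idea that htdig exits cleanly enough on SIGTERM for rundig to go on to the next step.

```python
import os
import signal
import time

RSS_LIMIT_KB = 600 * 1024  # hypothetical threshold, below the 768 MB of RAM


def parse_status_kb(status_text):
    """Return the 'Vm*' fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields


def over_limit(status_text, limit_kb=RSS_LIMIT_KB):
    """True when the resident set size (VmRSS) exceeds limit_kb."""
    return parse_status_kb(status_text).get("VmRSS", 0) > limit_kb


def watch(pid, interval_s=30):
    """Poll the process; send SIGTERM once VmRSS passes the limit.

    Returns True if the signal was sent, False if the process went away first.
    """
    while True:
        try:
            with open("/proc/%d/status" % pid) as f:
                text = f.read()
            if over_limit(text):
                os.kill(pid, signal.SIGTERM)
                return True
        except OSError:  # process already gone
            return False
        time.sleep(interval_s)
```

Usage would be something like `watch(htdig_pid)` started alongside rundig, with the pid of the running htdig process found by whatever means is convenient.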
Any ideas?