#271 htdig 3.2.0b6 memory leak

open
nobody
None
5
2006-01-20
2006-01-20
No

On Linux Mandrake 2006 - kernel 2.6.12-12mdk
with htdig 3.2.0b6:

Since htdig 3.1.6 has a words.db file size limit
(4 GB max), I tried htdig 3.2.0b6.

My files (about 300,000, totalling about 30 GB)
consist of:
- doc, pdf, rtf, html, and txt files
- a lot of zip files (more than 280,000): a script
unzips them and parses their contents, which are files
in the formats listed above.

Once the zip files are unzipped, there are about 700,000
files totalling about 100 GB.

htdig quickly (after 12,000 files) eats all the memory
(768 MB of RAM and 800 MB of swap) until Linux
shows these messages:

p1011761 kernel: Bad page state at prep_new_page (in
process 'htdig', page c11aca80)

Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: flags:0x2000000c mapping:00000000
mapcount:1 count:1

Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: Backtrace:

Message from syslogd@p1011761 at Fri Jan 20 10:01:30
2006 ...
p1011761 kernel: Trying to fix it up, but a reboot is
needed

---

The htdig process stays in memory but does nothing
more. top shows this:

root 16 0 1383m 577m 1852 S 0.0 77.0 17:55.31 htdig

So it looks like neither 3.1.6 nor 3.2.0b6 can index
my files...

Discussion

  • Gaetan QUENTIN

    Gaetan QUENTIN - 2006-01-20

    Logged In: YES
    user_id=799288

    Oops... I thought it was 3.2.0b6, but it is
    3.2.0b4. Sorry, disregard my last message.

    I will install 3.2.0b6.

     
  • Gaetan QUENTIN

    Gaetan QUENTIN - 2006-01-20

    Logged In: YES
    user_id=799288

    Well, now I really have used 3.2.0b6.
    And the memory leak is still there, even if htdig
    eats memory more slowly.

    Swap: 2562800k total, 159528k used, 2403272k free,
    36628k cached

    Here is what the top command says for the htdig process:

    VIRT RES SHR S %CPU %MEM TIME+ SWAP nFLT nDRT COMMAND
    1380m 642m 1620 S 0.0 85.7 12:47.98 738m 1419 0 << /opt/www/htdig-3.2.0b6/bin/htdig

    nFLT is the "Page Fault count".

    First, at launch htdig takes 1.2 GB of memory (76 MB of RAM
    resident, about 1.1 GB of swap), as shown on the process
    line in top. This is surprising, because I have
    erased all the database files, so it has no big files to
    open. Stranger still, the total swap usage of
    all processes, also shown by top in its summary area, is
    then... 0. Is this a kernel bug?
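
One possible reading of the per-process SWAP figure, offered as an assumption and not confirmed anywhere in this report: some older procps versions of top derived the SWAP column as VIRT minus RES instead of counting pages actually swapped out. The top row quoted earlier in this comment is consistent with that. The helper `parse_top_size_mb` below is a hypothetical name used only for this check.

```python
def parse_top_size_mb(field):
    """Convert a top size field such as '1380m' or '1620' to megabytes.

    Assumes top's usual convention: a bare number is in kilobytes,
    an 'm' suffix means megabytes and a 'g' suffix means gigabytes.
    """
    if field.endswith("m"):
        return float(field[:-1])
    if field.endswith("g"):
        return float(field[:-1]) * 1024
    return float(field) / 1024


# Values copied from the htdig row reported by top in this comment.
virt = parse_top_size_mb("1380m")
res = parse_top_size_mb("642m")
swap = parse_top_size_mb("738m")

# If SWAP were simply VIRT - RES, the difference would match the column.
print(virt - res, swap)
```

If that reading is right, a large SWAP value next to a total swap usage of 0 would reflect address space that was reserved but never touched, not a kernel bug.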

    But although the resident part of the memory is
    only 76 MB at the beginning, it grows progressively until it uses all the
    RAM, and then the total swap shown by top grows too.

    I have to kill htdig periodically, before too much
    swap is used, so that the rundig script can call the other
    processes (htpurge, merge, etc.) without corrupting the
    database, and then relaunch the rundig process.
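
The manual workaround described above could be automated along these lines. This is only a minimal sketch under stated assumptions: it reads the `VmRSS` field from `/proc/<pid>/status` on Linux, the function names (`parse_vm_rss_kb`, `over_limit`, `watch`) and the limit are hypothetical, and whether htdig leaves its databases in a usable state after `SIGTERM` is not established in this report.

```python
import os
import signal
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    657408 kB".
            return int(line.split()[1])
    return None


def over_limit(rss_kb, limit_kb):
    """True when the resident set size is known and exceeds the limit."""
    return rss_kb is not None and rss_kb > limit_kb


def watch(pid, limit_kb, interval_s=30):
    """Poll a process and send SIGTERM once its RSS passes limit_kb.

    Returns True if the signal was sent, False if the process exited first.
    """
    status_path = "/proc/%d/status" % pid
    while True:
        try:
            with open(status_path) as handle:
                rss_kb = parse_vm_rss_kb(handle.read())
        except OSError:
            return False  # the process has already gone away
        if over_limit(rss_kb, limit_kb):
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                return False  # it exited between the read and the kill
            return True
        time.sleep(interval_s)
```

A wrapper could call `watch(htdig_pid, 600 * 1024)` (600 MB is an arbitrary example limit, and `htdig_pid` is a placeholder) and then let the remaining rundig steps run before starting the next pass.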

    Any ideas?

     
