#40 Nget v0.27.1 core dump after applying Reduced Memory patch

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-04-28
Created: 2006-04-28
Creator: tom
Private: No

Hello,
I am getting the following core dump (see attached file).

Please advise.
Thanks
Tom Crane

Discussion

  • tom

    tom - 2006-04-28

    core dump info

     
  • Matthew Mueller

    Matthew Mueller - 2006-04-29

    Logged In: YES
    user_id=65253

    Which version of the reduced memory patch did you apply?
    Can you try it with the CVS version of nget? It has the
    newest reduced memory patch merged, and will also just be
    easier for me to help with.

    Also, if you could in gdb run "bt full" that will provide
    some more info that may be helpful.
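
The backtrace request above can also be scripted. This is a minimal sketch, assuming a gdb recent enough to accept the `-batch` and `-ex` options (older releases may not); the helper names and the paths are hypothetical and not part of nget.

```python
import shutil
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that prints a full backtrace and exits."""
    # -batch: run non-interactively and exit once the commands finish
    # -ex "bt full": backtrace including the local variables of each frame
    return ["gdb", "-batch", "-ex", "bt full", binary, core]


def run_backtrace(binary, core):
    """Run gdb if it is installed; return its output, or None if it is missing."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(gdb_backtrace_cmd(binary, core),
                            capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical paths; substitute the real executable and core file.
    print(gdb_backtrace_cmd("/usr/local/bin/nget", "core"))
```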

     
  • tom

    tom - 2006-05-01

    Logged In: YES
    user_id=1222528

    I'm a little confused about patch versions. So I have
    appended the patch I used. I have now downloaded the CVS
    version and will try it shortly. Here is the gdb session
    with 'bt full' -- not much extra info I'm afraid.

    Thanks
    Tom

    gdb nget -c core
    GNU gdb 5.3
    Copyright 2002 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public
    License, and you are
    welcome to change it and/or distribute copies of it under
    certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show
    warranty" for details.
    This GDB was configured as "i386-slackware-linux"...
    Core was generated by `/usr/local/bin/nget -t 1000 -s 60
    --save-binary-info yes -g alt.binaries.mp3,al'.
    Program terminated with signal 11, Segmentation fault.
    Cannot access memory at address 0x4001471c
    #0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*,
    c_nrange, meta_mid_info*) (this=Cannot access memory at
    address 0xbffff2f0
    ) at cache.cc:421
    421 if ((sa->serverid ==
    servinfo->serverid) && flushrange.check(sa->articlenum)){
    (gdb) bt
    #0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*,
    c_nrange, meta_mid_info*) (this=Cannot access memory at
    address 0xbffff2f0
    ) at cache.cc:421
    Cannot access memory at address 0xbffff2e8
    (gdb) bt full
    #0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*,
    c_nrange, meta_mid_info*) (this=Cannot access memory at
    address 0xbffff2f0
    ) at cache.cc:421
    countf = Cannot access memory at address 0xbffff2d4
    (gdb) tom@mklab:~/.nget5$ exit

     
  • tom

    tom - 2006-05-01

    reduced memory patch file

     
  • tom

    tom - 2006-05-08

    Logged In: YES
    user_id=1222528

    I have now reproduced the core dump with the CVS version, but
    I forgot to remove -O2 from the build. I'll try again
    without -O2.

    Tom.

     
  • tom

    tom - 2006-05-10

    Logged In: YES
    user_id=1222528

    Here is the dump from the build without -O2. It is not really
    much help, and I am mystified by the lack of information. I
    wonder whether the executable is trashing its call frame when
    the sigsegv takes place. I upgraded gdb to a very recent
    version (as shown below) but the output is the same. The only
    thing I can think of is that the lack of information is
    related to the 'ulimit -c 20000' that I have in force. Any
    suggestions on how to proceed?

    Thanks
    Tom.

    gdb /tmp/nget/nget -c core
    GNU gdb 6.3
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public
    License, and you are
    welcome to change it and/or distribute copies of it under
    certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show
    warranty" for details.
    This GDB was configured as "i486-slackware-linux"...Using
    host libthread_db library "/lib/libthread_db.so.1".

    Core was generated by `/tmp/nget/nget -t 1000 -s 60
    --save-binary-info yes -G alt.binaries.mp3,alt.bin'.
    Program terminated with signal 11, Segmentation fault.
    Cannot access memory at address 0x4001471c
    #0 0x40237be4 in ?? ()
    (gdb) bt full
    #0 0x40237be4 in ?? ()
    No symbol table info available.
    Cannot access memory at address 0xbfffeed0
    (gdb) list
    978 print_help();
    979 return 1;
    980 }
    981 }
    982 }
    983 }
    984
    985 int main(int argc, const char ** argv){
    986 #ifdef HAVE_SETLINEBUF
    987 setlinebuf(stdout); //force stdout to be line buffered, useful if redirecting both stdout and err to a file, to keep them from getting out of sync.
    (gdb) q

     
  • tom

    tom - 2006-05-12

    gdb session on valid core dump

     
  • tom

    tom - 2006-05-12

    Logged In: YES
    user_id=1222528

    I have removed the corefile size limit and now have a valid
    dump. Please see the attached file for details.

    Thanks
    Tom Crane
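
The core-file size limit mentioned above is a per-process resource limit that child processes inherit. The sketch below inspects the limit and raises the soft value to the hard value, assuming a Unix system and Python's standard `resource` module. The function names are hypothetical, and the change only affects this process and anything it launches afterwards, not an already running nget.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit; return both."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value (in bytes) for display."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    print("core soft limit:", describe(soft))
    print("core hard limit:", describe(hard))
```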

     
  • Matthew Mueller

    Matthew Mueller - 2006-05-14

    Logged In: YES
    user_id=65253

    What is the command line you are running nget with? Is the
    crash always the same or is it intermittent? Does it vary
    if you use a different group, etc.?

     
  • tom

    tom - 2006-05-15

    command line

     
  • tom

    tom - 2006-05-15

    Logged In: YES
    user_id=1222528

    I always use the same command. It uses a fairly long regexp,
    please see the attached file. However, having only recently
    got a valid dump, I can't say whether the sigsegv is always
    in the same place.

    Thanks
    Tom Crane

     
  • tom

    tom - 2006-05-19

    gdb session on valid core dump

     
  • tom

    tom - 2006-05-19

    Logged In: YES
    user_id=1222528

    Here is another core dump (see attachment). Please let me
    know if there are any specific tests etc. I can do to help
    diagnose this problem.

    Thanks
    Tom Crane

     
  • Matthew Mueller

    Matthew Mueller - 2006-05-20

    Logged In: YES
    user_id=65253

    Hmm. Could you try the following things and see if any of
    them also cause it to crash?

    1) add -T to the start of the options
    2) replace all the -r options with -r antoebuntbk (just
    something that won't match anything)
    3) replace all the -r options with -T -r .
    4) replace all the -r options with -T -r "^(.*)$"
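
To make the difference between the patterns in suggestions 2) to 4) concrete, here is a small sketch of what each one matches. It uses Python's `re` module rather than the regex library nget is built against, so it only illustrates matching behaviour and says nothing about the crash itself; the subject line is made up.

```python
import re

# Patterns from suggestions 2) to 4); the subject lines are hypothetical.
PATTERNS = ["antoebuntbk", ".", "^(.*)$"]
SUBJECTS = ["some album - track 01.mp3 (1/12)", ""]


def matches(pattern, subject):
    """Return True if the pattern is found anywhere in the subject."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    for pattern in PATTERNS:
        for subject in SUBJECTS:
            print(repr(pattern), repr(subject), matches(pattern, subject))
```

The nonsense pattern matches nothing, `.` matches any non-empty subject, and `^(.*)$` matches every subject (including an empty one) while also capturing it.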

     
  • tom

    tom - 2006-05-29

    Logged In: YES
    user_id=1222528

    Here is a new dump analysis with -T as requested in 1).

    Tom Crane

     
  • tom

    tom - 2006-05-29

    gdb session on valid core dump (ran with -T)

     
  • tom

    tom - 2006-07-17

    Logged In: YES
    user_id=1222528

    Update: I have been running with '-T ... -r antoebuntbk' as
    requested in 2) for some weeks now but have not encountered
    any crashes.

    Tom Crane

     
  • tom

    tom - 2006-07-21

    Logged In: YES
    user_id=1222528

    Update: I am getting sigsegvs almost every time with
    "-T ... -r ." as in suggestion 3) above. Please see the
    attached file for the crash dump stack information via gdb.

    Thanks
    Tom Crane

     
  • tom

    tom - 2006-07-21

    gdb session on core dump (ran with "-T ... -r .")

     
  • tom

    tom - 2006-07-24

    Logged In: YES
    user_id=1222528

    Update: I am also getting sigsegvs with '-T ... -r "^(.*)$"'
    as in the final suggestion, 4) above. Please see
    the attached file for the crash dump stack information via gdb.

    Thanks
    Tom Crane

     
  • tom

    tom - 2006-07-24

    gdb session on core dump (ran with '-T ... -r "^(.*)$"' )

     
  • Frederick Bruckman

    Logged In: YES
    user_id=803104

    This could be a bug in the vendor's "libc". It doesn't seem
    to have anything to do with the reduced memory change. Tom,
    does the problem go away if you install "libpcre" and
    rebuild "nget"?

     
  • tom

    tom - 2006-11-15

    Logged In: YES
    user_id=1222528
    Originator: YES

    In reply to f2bruckman:
    Hi,
    I tried building nget with and without --with-pcre, with and without --enable-debug, and with and without --enable-maintainer-mode (pretty much all combinations) and still got sigsegvs. I was running nget using:

    /usr/bin/time -v /tmp/nget/nget

    However, for the past few weeks I've been running nget directly from the shell (with pcre) and have had no crashes. Previously the behaviour was peculiar: nget often died with signal 11 without producing a core dump file, even though starting nget and then 'manually' sending it a sigsegv (i.e. kill -11 <pid of nget job>) under the same conditions always produced one. This made debugging difficult. When a core dump file was produced, the sigsegv was always inside a regexec call, called from myregex.h. The problem does appear to be due to some strange interaction between libc and /usr/bin/time.

    Tom.
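
The 'manual' sigsegv test described above can be reproduced in a few lines. This is a minimal sketch, assuming a Unix system; it signals a harmless sleeping child instead of nget, and the function name is hypothetical. It shows only how death by signal is reported, not whether a core file is written.

```python
import signal
import subprocess
import sys


def segv_child():
    """Start a sleeping child, send it SIGSEGV, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    # Equivalent to `kill -11 <pid>` on systems where SIGSEGV is signal 11.
    child.send_signal(signal.SIGSEGV)
    # subprocess reports death by signal N as a return code of -N.
    return child.wait()


if __name__ == "__main__":
    print("child exit code:", segv_child())
```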

     
  • Frederick Bruckman

    Logged In: YES
    user_id=803104
    Originator: NO

    You should know that random SIGSEGVs usually indicate bad RAM. They are sometimes consistent enough to mislead you into blaming a program, usually the compiler or some other program that uses a lot of memory, and nget is certainly that. Your problem really fits the profile. You might try running a stand-alone memory tester for a few hours to see what it finds (e.g. memtest86).

     
  • tom

    tom - 2006-11-15

    Logged In: YES
    user_id=1222528
    Originator: YES

    The sigsegvs were never random; they always occurred in the same place in nget. Moreover, the host machine is a busy one, running many of the standard Unix daemons, plus many cronjobs and desktop activities too. I've never had random sigsegvs in any of these, nor any machine crashes. I am very skeptical that this was ever a DRAM problem.

    Regards
    Tom

     
