nget / Bugs / #40 Nget v0.27.1 core dump after applying Reduced Memory patch

tom - 2006-04-28

core dump info

dump.log

Matthew Mueller - 2006-04-29

Logged In: YES
user_id=65253

Which version of the reduced memory patch did you apply?
Can you try it with the CVS version of nget? It has the
newest reduced memory patch merged, and will also just be
easier for me to help with.

Also, if you could in gdb run "bt full" that will provide
some more info that may be helpful.
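
A non-interactive way to capture the same output, shown only as a sketch: the helper name bt_full is hypothetical, and it assumes a gdb recent enough to accept the -batch and -ex options (older releases may need a command file passed with -x instead).

```shell
# Hypothetical helper: print a full backtrace from a core file and exit,
# without starting an interactive gdb session.
#   $1 = path to the executable that crashed
#   $2 = path to the core file it produced
bt_full() {
    gdb -batch -ex "bt full" "$1" "$2"
}
```

For example, `bt_full /usr/local/bin/nget core > dump.log 2>&1` would save the backtrace to a file that can be attached to the report.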

tom - 2006-05-01

Logged In: YES
user_id=1222528

I'm a little confused about patch versions. So I have
appended the patch I used. I have now downloaded the CVS
version and will try it shortly. Here is the gdb session
with 'bt full' -- not much extra info I'm afraid.

Thanks
Tom

gdb nget -c core
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-slackware-linux"...
Core was generated by `/usr/local/bin/nget -t 1000 -s 60 --save-binary-info yes -g alt.binaries.mp3,al'.
Program terminated with signal 11, Segmentation fault.
Cannot access memory at address 0x4001471c
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
421 if ((sa->serverid == servinfo->serverid) && flushrange.check(sa->articlenum)){
(gdb) bt
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
Cannot access memory at address 0xbffff2e8
(gdb) bt full
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
countf = Cannot access memory at address 0xbffff2d4
(gdb) tom@mklab:~/.nget5$ exit

tom - 2006-05-01

reduced memory patch file

nget-memcon-patch.diff.gz

tom - 2006-05-08

Logged In: YES
user_id=1222528

I have now reproduced the core dump with the CVS version but
omitted to remove the -O2 from the build. I'll try again
w/o -O2.

Tom.

tom - 2006-05-10

Logged In: YES
user_id=1222528

Here is the dump from the build w/o -O2. It is not really
much help. I am mystified by the lack of information. I
wonder whether the executable is trashing its call frame
when the sigsegv takes place. I upgraded gdb to a very
recent version (as shown below) but the output is the same.
The only thing I can think of is that the lack of
information is related to the 'ulimit -c 20000' that I have
in force. Any suggestions on how to proceed?

Thanks
Tom.

gdb /tmp/nget/nget -c core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `/tmp/nget/nget -t 1000 -s 60 --save-binary-info yes -G alt.binaries.mp3,alt.bin'.
Program terminated with signal 11, Segmentation fault.
Cannot access memory at address 0x4001471c
#0 0x40237be4 in ?? ()
(gdb) bt full
#0 0x40237be4 in ?? ()
No symbol table info available.
Cannot access memory at address 0xbfffeed0
(gdb) list
978 print_help();
979 return 1;
980 }
981 }
982 }
983 }
984
985 int main(int argc, const char ** argv){
986 #ifdef HAVE_SETLINEBUF
987 setlinebuf(stdout); //force stdout to be line buffered, useful if redirecting both stdout and err to a file, to keep them from getting out of sync.
(gdb) q

tom - 2006-05-12

gdb session on valid core dump

dump4.log

tom - 2006-05-12

Logged In: YES
user_id=1222528

I have removed the corefile size limit and now have a valid
dump. Please see the attached file for details.
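
For anyone hitting the same symptom, a minimal sketch of checking and raising the limit, assuming a POSIX-style shell such as bash or dash. In bash, 'ulimit -c 20000' given with neither -S nor -H sets both the soft and the hard limit, so within that shell the soft limit can afterwards only be raised as far as that hard limit.

```shell
# Show the current soft and hard limits on core file size
# (0 disables core files; "unlimited" means no cap).
ulimit -S -c
ulimit -H -c

# Raise the soft limit as far as the hard limit allows.
ulimit -S -c "$(ulimit -H -c)"
```

If the hard limit itself is too low, the change has to be made in a fresh login shell or wherever the limit was originally set.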

Thanks
Tom Crane

Matthew Mueller - 2006-05-14

Logged In: YES
user_id=65253

What is the command line you are running nget with? Is the
crash always the same or is it intermittent? Does it vary
if you use a different group, etc.?

tom - 2006-05-15

command line

tmpcmd.tmp

tom - 2006-05-15

Logged In: YES
user_id=1222528

I always use the same command. It uses a fairly long regexp,
please see the attached file. Having only recently got a
valid dump I can't say if the sigsegv is always in the same
place however.

Thanks
Tom Crane

tom - 2006-05-19

gdb session on valid core dump

dump5.log

tom - 2006-05-19

Logged In: YES
user_id=1222528

Here is another core dump (see attachment). Please let me
know if there are any specific tests etc. I can do to help
diagnose this problem.

Thanks
Tom Crane

Matthew Mueller - 2006-05-20

Logged In: YES
user_id=65253

Hmm. Could you try the following things and see if any of
them also cause it to crash?

1) add -T to the start of the options
2) replace all the -r options with -r antoebuntbk (just
something that won't match anything)
3) replace all the -r options with -T -r .
4) replace all the -r options with -T -r "^(.*)$"

tom - 2006-05-29

Logged In: YES
user_id=1222528

Here is a new dump analysis with -T as requested in 1).

Tom Crane

tom - 2006-05-29

gdb session on valid core dump (ran with -T)

dump6.log

tom - 2006-07-17

Logged In: YES
user_id=1222528

Update: I have been running with '-T ... -r antoebuntbk' as
requested in 2) for some weeks now but have not encountered
any crashes.

Tom Crane

tom - 2006-07-21

Logged In: YES
user_id=1222528

Update: I am getting frequent (almost always) sigsegvs with
"-T ... -r ." as in suggestion 3) above. Please see the
attached file for the crash dump stack information via gdb.

Thanks
Tom Crane

tom - 2006-07-21

gdb session on core dump (ran with "-T ... -r .")

dump7.log

tom - 2006-07-24

Logged In: YES
user_id=1222528

Update: I am also getting sigsegvs with '-T ... -r "^(.*)$"'
as in the final suggestion, suggestion 4) above. Please see
the attached file for the crash dump stack information via gdb.

Thanks
Tom Crane

tom - 2006-07-24

gdb session on core dump (ran with '-T ... -r "^(.*)$"' )

dump8.log

Frederick Bruckman - 2006-11-13

Logged In: YES
user_id=803104

This could be a bug in the vendor's "libc". It doesn't seem
to have anything to do with the reduced memory change. Tom,
does the problem go away if you install "libpcre" and
rebuild "nget"?

tom - 2006-11-15

Logged In: YES
user_id=1222528
Originator: YES

In reply to f2bruckman;
Hi,
I tried building nget w/ & w/o --with-pcre, w/ & w/o --enable-debug, and w/ & w/o --enable-maintainer-mode (pretty much all combinations) and got sigsegvs. I was running nget using:

/usr/bin/time -v /tmp/nget/nget

However, for the past few weeks I've been running nget directly from the shell (with pcre) and have had no crashes. Previously the behaviour was peculiar: nget often died with signal 11 yet produced no core dump file, even though starting nget and then 'manually' sending it a sigsegv (i.e. kill -11 <pid of nget job>) under the same conditions always produced one. This made debugging difficult. When a core dump file was produced, the sigsegv was always inside a regexec call, called from myregex.h. It does appear the problem is due to some strange interaction between libc and /usr/bin/time.
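
One way to tell from a wrapper script whether a run ended in signal 11, offered as a sketch rather than a diagnosis of this bug: bash and dash report a child killed by signal N as exit status 128+N, so 139 corresponds to SIGSEGV. The helper name run_and_check is hypothetical.

```shell
# Hypothetical helper: run a command, then report whether it was killed
# by SIGSEGV (exit status 139 = 128 + 11 in bash and dash), along with
# the core file size limit that was in force for the run.
run_and_check() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 139 ]; then
        echo "killed by SIGSEGV (status $status); core limit: $(ulimit -c)"
    else
        echo "exit status $status"
    fi
    return "$status"
}
```

Running `run_and_check /tmp/nget/nget ...` with the usual options, both with and without a wrapper such as /usr/bin/time in front, would show whether the signal and the core limit differ between the two cases.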

Tom.

Frederick Bruckman - 2006-11-15

Logged In: YES
user_id=803104
Originator: NO

You should know that random SIGSEGVs usually indicate bad RAM. Sometimes they're consistent enough to mislead you into blaming a program, usually the compiler or some other program that uses a lot of memory, and nget is certainly that. Your problem really fits the profile. You might try running a stand-alone memory tester for a few hours to see what it finds (e.g. memtest86).

tom - 2006-11-15

Logged In: YES
user_id=1222528
Originator: YES

The sigsegvs were never random -- they always occurred in the same place in nget. Moreover, the host machine is a busy one, running many of the standard Unix daemons, plus many cronjobs and desktop activities too. I've never had random sigsegvs in any of these, or machine crashes. I am very skeptical that this ever was a DRAM problem.

Regards
Tom
