core dump info
user_id=65253
Which version of the reduced memory patch did you apply?
Can you try it with the CVS version of nget? It has the
newest reduced memory patch merged, and will also just be
easier for me to help with.
Also, if you could run "bt full" in gdb, that will provide
some more info that may be helpful.
user_id=1222528
I'm a little confused about patch versions. So I have
appended the patch I used. I have now downloaded the CVS
version and will try it shortly. Here is the gdb session
with 'bt full' -- not much extra info I'm afraid.
Thanks
Tom
gdb nget -c core
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-slackware-linux"...
Core was generated by `/usr/local/bin/nget -t 1000 -s 60 --save-binary-info yes -g alt.binaries.mp3,al'.
Program terminated with signal 11, Segmentation fault.
Cannot access memory at address 0x4001471c
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
421 if ((sa->serverid == servinfo->serverid) && flushrange.check(sa->articlenum)){
(gdb) bt
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
Cannot access memory at address 0xbffff2e8
(gdb) bt full
#0 0x080756a2 in c_nntp_cache::flush(c_nntp_server_info*, c_nrange, meta_mid_info*) (this=Cannot access memory at address 0xbffff2f0
) at cache.cc:421
countf = Cannot access memory at address 0xbffff2d4
(gdb) tom@mklab:~/.nget5$ exit
reduced memory patch file
user_id=1222528
I have now reproduced the core dump with the CVS version but
omitted to remove the -O2 from the build. I'll try again
w/o -O2.
Tom.
user_id=1222528
Here is the dump from the build w/o -O2. It is not really
much help. I am mystified by the lack of information. I
wonder whether the executable is trashing its call frame when
the sigsegv takes place. I upgraded gdb to a very recent
version (as shown below) but the output is the same. The only
thing I can think of is that the lack of information is
related to the 'ulimit -c 20000' that I have in force. Any
suggestions on how to proceed?
Thanks
Tom.
gdb /tmp/nget/nget -c core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/tmp/nget/nget -t 1000 -s 60 --save-binary-info yes -G alt.binaries.mp3,alt.bin'.
Program terminated with signal 11, Segmentation fault.
Cannot access memory at address 0x4001471c
#0 0x40237be4 in ?? ()
(gdb) bt full
#0 0x40237be4 in ?? ()
No symbol table info available.
Cannot access memory at address 0xbfffeed0
(gdb) list
978 print_help();
979 return 1;
980 }
981 }
982 }
983 }
984
985 int main(int argc, const char ** argv){
986 #ifdef HAVE_SETLINEBUF
987 setlinebuf(stdout); //force stdout to be line buffered, useful if redirecting both stdout and err to a file, to keep them from getting out of sync.
(gdb) q
gdb session on valid core dump
user_id=1222528
I have removed the corefile size limit and now have a valid
dump. Please see the attached file for details.
Thanks
Tom Crane
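Removing the limit is consistent with how core files are written: Linux stops writing a core once it reaches the RLIMIT_CORE soft limit, so a limit smaller than the process image leaves a truncated core, and the stack (at high addresses such as 0xbffff2f0 in the traces above) is plausibly among the parts that go missing. Note that bash's `ulimit -c` normally counts 1024-byte blocks, while the underlying setrlimit() call takes bytes. As a minimal C++ sketch, assuming a Linux/glibc system, a launcher could raise the soft limit to the hard limit before starting the program under test; the helper names `raise_core_limit_to_hard` and `print_core_limit` are hypothetical and not part of nget.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of nget): raise the soft core-file size limit
// to the hard limit. setrlimit() takes bytes; an unprivileged process may raise
// its soft limit up to, but not beyond, the hard limit. Children created later
// with fork()/exec() inherit the new limit. Returns true on success.
bool raise_core_limit_to_hard() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        std::fprintf(stderr, "getrlimit: %s\n", std::strerror(errno));
        return false;
    }
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        std::fprintf(stderr, "setrlimit: %s\n", std::strerror(errno));
        return false;
    }
    return true;
}

// Hypothetical helper: print the current soft limit. RLIM_INFINITY is what
// "ulimit -c unlimited" sets.
void print_core_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        std::fprintf(stderr, "getrlimit: %s\n", std::strerror(errno));
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        std::printf("core file size: unlimited\n");
    else
        std::printf("core file size: %llu bytes\n",
                    static_cast<unsigned long long>(rl.rlim_cur));
}
```

Calling raise_core_limit_to_hard() at the start of a wrapper and then exec'ing nget would let nget inherit the raised limit; running `ulimit -c unlimited` in the shell beforehand has the same effect when the hard limit allows it.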
user_id=65253
What is the command line you are running nget with? Is the
crash always the same or is it intermittent? Does it vary
if you use a different group, etc.?
command line
user_id=1222528
I always use the same command. It uses a fairly long regexp;
please see the attached file. Having only recently got a
valid dump, I can't yet say whether the sigsegv always occurs
in the same place.
Thanks
Tom Crane
gdb session on valid core dump
user_id=1222528
Here is another core dump (see attachment). Please let me
know if there are any specific tests etc. I can do to help
diagnose this problem.
Thanks
Tom Crane
user_id=65253
Hmm. Could you try the following things and see if any of
them also cause it to crash?
1) add -T to the start of the options
2) replace all the -r options with -r antoebuntbk (just
something that won't match anything)
3) replace all the -r options with -T -r .
4) replace all the -r options with -T -r "^(.*)$"
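The reasoning behind tests 2 to 4 can be illustrated with a small experiment: "antoebuntbk" should match no subject line, whereas "." matches any non-empty line and "^(.*)$" matches every line, so the runs differ mainly in whether the regex ever reports a match. The minimal C++ sketch that follows counts matches with the POSIX regcomp()/regexec() functions (regexec is where the later core dumps in this thread point); the flags REG_EXTENDED | REG_NOSUB, the helper name `count_matches` and the sample subjects are assumptions for illustration, not nget's actual myregex.h code.

```cpp
#include <regex.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not nget's myregex.h): count how many subject lines a
// POSIX extended regular expression matches. REG_NOSUB is used because only a
// yes/no answer is needed. Returns -1 if the pattern does not compile.
int count_matches(const std::string &pattern,
                  const std::vector<std::string> &subjects) {
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = 0;
    for (const std::string &subject : subjects) {
        // regexec() returns 0 on a match and REG_NOMATCH otherwise.
        if (regexec(&re, subject.c_str(), 0, nullptr, 0) == 0)
            ++matched;
    }
    regfree(&re);  // release the compiled pattern
    return matched;
}
```

With two non-empty subjects and one empty one, "antoebuntbk", "." and "^(.*)$" report 0, 2 and 3 matches respectively, while an invalid pattern such as "[" fails to compile.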
user_id=1222528
Here is a new dump analysis with -T as requested in 1).
Tom Crane
gdb session on valid core dump (ran with -T)
user_id=1222528
Update: I have been running with '-T ... -r antoebuntbk' as
requested in 2) for some weeks now but have not encountered
any crashes.
Tom Crane
user_id=1222528
Update: I am getting sigsegvs almost every time with
"-T ... -r ." as in suggestion 3). Please see the
attached file for the gdb stack trace of the crash dump.
Thanks
Tom Crane
gdb session on core dump (ran with "-T ... -r .")
user_id=1222528
Update: I am also getting sigsegvs with '-T ... -r "^(.*)$"'
as in the final suggestion, 4). Please see the attached file
for the gdb stack trace of the crash dump.
Thanks
Tom Crane
gdb session on core dump (ran with '-T ... -r "^(.*)$"' )
user_id=803104
This could be a bug in the vendor's "libc". It doesn't seem
to have anything to do with the reduced memory change. Tom,
does the problem go away if you install "libpcre" and
rebuild "nget"?
user_id=1222528
Originator: YES
In reply to f2bruckman:
Hi,
I tried building nget with and without --with-pcre, with and without --enable-debug, and with and without --enable-maintainer-mode (pretty much all combinations) and still got sigsegvs. I was running nget using:
/usr/bin/time -v /tmp/nget/nget
However, for the past few weeks I've been running nget directly from the shell (with pcre) and have had no crashes. Previously the behaviour was peculiar: although nget often died with signal 11, no core dump file was produced, even though starting nget and then sending it a SIGSEGV manually (i.e. kill -11 <pid of nget job>) under the same conditions always produced a core dump. This made debugging difficult. When a core dump file was produced, the sigsegv was always inside a regexec call made from myregex.h. It appears that the problem is due to some strange interaction between libc and /usr/bin/time.
Tom.
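A parent process such as a shell or /usr/bin/time learns how its child ended, and whether the kernel reports a core dump, from the status word returned by the wait() family of calls. The minimal C++ sketch that follows forks a child that deliberately raises SIGSEGV (a stand-in for the crashing nget, not nget itself) and decodes that status; the helper names are hypothetical, and WCOREDUMP is a glibc/BSD extension rather than strict POSIX. Whether a core is actually written also depends on the inherited RLIMIT_CORE and, on Linux, /proc/sys/kernel/core_pattern.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <csignal>
#include <cstdio>

// Decode a wait status the way a parent process sees it.
// WCOREDUMP is a glibc/BSD extension, not strict POSIX.
void describe_status(int status) {
    if (WIFEXITED(status)) {
        std::printf("exited with status %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        std::printf("terminated by signal %d%s\n", WTERMSIG(status),
                    WCOREDUMP(status) ? " (core dumped)" : " (no core dump)");
    }
}

// Hypothetical stand-in for a crashing program: fork a child that raises
// SIGSEGV with the default disposition, wait for it and store its status.
// Returns false if fork() or waitpid() fails.
bool run_crashing_child(int *status_out) {
    std::fflush(stdout);  // flush first so buffered output is not copied into the child
    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0) {
        std::signal(SIGSEGV, SIG_DFL);  // default action: terminate, possibly dumping core
        std::raise(SIGSEGV);
        _exit(1);  // only reached if the signal did not terminate the child
    }
    return waitpid(pid, status_out, 0) == pid;
}
```

Because the child inherits the parent's limits and environment, running this sketch once from an interactive shell and once under /usr/bin/time -v is one way to compare core-dump behaviour in the two settings.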
user_id=803104
Originator: NO
You should know that random SIGSEGVs usually indicate bad RAM. Sometimes they're consistent enough to mislead you into blaming a program; the program blamed is usually the compiler or some other program that uses a lot of memory, which nget certainly is. Your problem really fits the profile. You might try running a stand-alone memory tester such as memtest86 for a few hours to see what it finds.
user_id=1222528
Originator: YES
The sigsegvs were never random; they always occurred in the same place in nget. Moreover, the host machine is a busy one, running many of the standard Unix daemons plus many cron jobs and desktop activities. I've never had random sigsegvs in any of these, nor any machine crashes. I am very skeptical that this ever was a DRAM problem.
Regards
Tom