|
From: Graham M. <ga...@la...> - 2009-04-14 20:31:37
|
I'm trying to track down the cause of a seg fault in my MPI program.
Here's what happens with valgrind.
% mpirun -n 27 ../../valgrind-3.4.1/bin/valgrind --kernel-variant=bproc mc3_op
ERROR: ld.so: object '/usr/projects/dark/gam/valgrind/amd64-linux/libmpiwrap.so' from LD_PRELOAD cannot be preloaded: ignored.
[many other errors reported: conditional jump or move...;
uninitialized data; all trace to library routines]
[program nearly completes and then:]
--30441-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
(SIGSEGV) - exiting
--30441-- si_code=80; Faulting address: 0x0; sp: 0x403087d60
valgrind: the 'impossible' happened:
Killed by fatal signal
==30441== at 0x380323E0: vgPlain_arena_free (m_mallocfree.c:218)
==30441== by 0x38001B17: die_and_free_mem (mc_malloc_wrappers.c:123)
==30441== by 0x3804AEB4: vgPlain_scheduler (scheduler.c:1303)
==30441== by 0x3804D44E: run_a_thread_NORETURN (syswrap-linux.c:89)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==30441== at 0x4A1A69C: operator delete[](void*)
(vg_replace_malloc.c:364)
==30441== by 0x4153B5: Particles::~Particles() (Particles.cxx:60)
==30441== by 0x40A7B8: main (mc3.cxx:167)
Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using. Thanks.
% ../../valgrind/bin/valgrind --version
valgrind-3.4.1
% uname -a
Linux cy-1.lanl.gov 2.6.14-8.BProcPerfctr_FC3smp #1 SMP Mon Oct 1
15:20:42 MDT 2007 x86_64 x86_64 x86_64 GNU/Linux
==========================
Graham Mark
CCS-3
Information Sciences
Los Alamos National Laboratory
505-667-8147
|
|
From: Nicholas N. <n.n...@gm...> - 2009-04-14 20:41:39
|
On Wed, Apr 15, 2009 at 6:31 AM, Graham Mark <ga...@la...> wrote:
>
> I'm trying to track down the cause of a seg fault in my MPI program.
> Here's what happens with valgrind.
>
> % mpirun -n 27 ../../valgrind-3.4.1/bin/valgrind --kernel-variant=bproc mc3_op
> ERROR: ld.so: object '/usr/projects/dark/gam/valgrind/amd64-linux/libmpiwrap.so' from LD_PRELOAD cannot be preloaded: ignored.
> [many other errors reported: conditional jump or move...;
> uninitialized data; all trace to library routines]
> [program nearly completes and then:]
>
> --30441-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
> (SIGSEGV) - exiting
> --30441-- si_code=80; Faulting address: 0x0; sp: 0x403087d60
>
> valgrind: the 'impossible' happened:
> Killed by fatal signal
> ==30441== at 0x380323E0: vgPlain_arena_free (m_mallocfree.c:218)
> ==30441== by 0x38001B17: die_and_free_mem (mc_malloc_wrappers.c:123)
> ==30441== by 0x3804AEB4: vgPlain_scheduler (scheduler.c:1303)
> ==30441== by 0x3804D44E: run_a_thread_NORETURN (syswrap-linux.c:89)

It's almost certainly caused by heap corruption due to the errors
Valgrind reports about your program. If you fix those, this crash
will almost certainly go away.

Nick
|
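For readers who land on this thread with the same trace: the client stack ends in operator delete[] inside Particles::~Particles(), which is a typical place for earlier heap corruption to surface. One common cause, though not necessarily the one here, is a write past the end of an array allocated with new[]. The sketch below is not Graham's code (Particles.cxx is not shown in the thread). ParticleBuffer and write_is_rejected are hypothetical names used only to illustrate how a std::vector with checked access turns such a write into an exception instead of silent corruption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a particle container; the real Particles
// class from the thread is not shown, so this layout is an assumption.
class ParticleBuffer {
public:
    explicit ParticleBuffer(std::size_t n) : pos_(n, 0.0) {}

    // Checked write: at() throws std::out_of_range rather than writing
    // past the end of the allocation.
    void set(std::size_t i, double x) { pos_.at(i) = x; }

    // Checked read, same bounds rule as set().
    double get(std::size_t i) const { return pos_.at(i); }

    std::size_t size() const { return pos_.size(); }

private:
    std::vector<double> pos_;  // releases its own storage; no manual delete[]
};

// Returns true if a write at index i was rejected as out of range,
// false if the write succeeded.
bool write_is_rejected(ParticleBuffer& p, std::size_t i) {
    try {
        p.set(i, 1.0);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a buffer of size 4, a write at index 3 succeeds and a write at index 4 is rejected. With a raw new[] array, that same write at index 4 could silently damage allocator metadata and only fail later, at delete[].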
|
From: jody <jod...@gm...> - 2009-04-15 06:00:16
|
Hi Graham,

Are you sure that all libraries needed by valgrind are accessible on
all your nodes? If it is Open MPI you're using, make sure that every
node has an LD_LIBRARY_PATH pointing to the appropriate directories.

Jody

On Tue, Apr 14, 2009 at 10:41 PM, Nicholas Nethercote <n.n...@gm...> wrote:
> It's almost certainly caused by heap corruption due to the errors
> Valgrind reports about your program. If you fix those, this crash
> will almost certainly go away.
>
> Nick
|
|
From: Nicholas N. <n.n...@gm...> - 2009-04-16 00:36:21
|
On Wed, Apr 15, 2009 at 6:41 AM, Nicholas Nethercote <n.n...@gm...> wrote:
>
> It's almost certainly caused by heap corruption due to the errors
> Valgrind reports about your program. If you fix those, this crash
> will almost certainly go away.

I just clarified the abort/crash message, making this possibility
clearer. Hopefully we'll get fewer complaints about it in the future.

Nick
|