From: Nikolaus R. <nr...@tr...> - 2015-12-09 18:32:13
On 12/09/2015 10:00 AM, Nikolaus Rath wrote:
> Hi Philippe,
>
> I found that I can work around the problem of gdb failing to produce backtraces by compiling with -O0. Switching to -O1 or higher is enough to cause issues. I also experimented with dwarf-2, dwarf-3, and dwarf-4 debug information, but that did not seem to matter.
>
> I tried to narrow down the problem with -O1, -gdwarf2, newer valgrind, and newer gdb:
>
> $ valgrind --tool=massif --vgdb-error=0 ../../Q2D/LamyRidge/src/model/LR_model
> ==4881== Massif, a heap profiler
> ==4881== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
> ==4881== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
> ==4881== Command: ../../Q2D/LamyRidge/src/model/LR_model
> [...]
>
> $ gdb ../../Q2D/LamyRidge/src/model/LR_model
> GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> [...]
> (gdb) target remote | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> Remote debugging using | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> [...]
> (gdb) b taehdf5.f90:1936
> (gdb) c
> (gdb) c
> (gdb) b H5FL_reg_calloc
> (gdb) c
> Continuing.
>
> [...]
>
> So as far as I can tell, valgrind is getting the backtrace right. Is this correct?
>
> If so, I guess the only explanation is that I am not setting the breakpoint at the time where massif takes the snapshot?
Ok, I fell into a trap. I assumed that whatever causes gdb to hang when trying to print a backtrace also causes valgrind to produce wrong stacktraces. But that is not the case.
So, when compiling with -O1 -gdwarf-2, the valgrind and gdb backtraces agree. However, when compiling with -O3 -gdwarf-2, there is a difference:
Valgrind thinks:
(gdb) monitor v.info scheduler
[...]
Thread 1: status = VgTs_Runnable
==5489== at 0x1010750: H5FL_reg_calloc (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489== by 0xFA64E9: H5A_create (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489== by 0xFA0610: H5Acreate2 (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489== by 0xF8F3BD: h5acreate_c_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489== by 0xF897B6: h5a_mp_h5acreate_f_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489== by 0xB99FC6: taehdf5_mp_h5append_data_double_0d_ (taehdf5.f90:1936)
==5489== by 0xB248E6: plot_m_mp_plots_ (plot_hdf5.f:144)
==5489== by 0xB3B722: lr_mod_m_mp_check_dt_ (LR_model.F:487)
==5489== by 0xB272E3: lr_mod_m_mp_lr_step_ (LR_model.F:252)
==5489== by 0xB261DD: MAIN__ (LR_model.F:544)
==5489== by 0x406E3D: main (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
client stack range: [0xFFEBFE000 0xFFF000FFF] client SP: 0xFFEC2CFC8
valgrind stack top usage: 12424 of 1048576
But gdb says:
(gdb) bt
#0 0x0000000001010750 in H5FL_reg_calloc ()
#1 0x0000000000fa64ea in H5A_create ()
#2 0x0000000000fa0611 in H5Acreate2 ()
#3 0x0000000000f8f3be in h5acreate_c_ ()
#4 0x0000000000f897b7 in h5a_mp_h5acreate_f_ ()
#5 0x0000000000b99fc7 in h5dump_attr_int (loc_id=<optimized out>, f=<optimized out>, name=...,
.tmp.NAME.len_V$1086=<optimized out>) at /home/nrath/Q2D/utils/src/taehdf5.f90:1936
#6 h5append_data_double_0d (group_id=1,
f=<error reading variable: Cannot access memory at address 0xa000008>, name=...,
.tmp.NAME.len_V$1cd8=272) at /home/nrath/Q2D/utils/src/taehdf5.f90:4193
#7 0x0000000000b248e7 in plot_m::plots (idt=1) at /home/nrath/Q2D/LamyRidge/src/model/plot_hdf5.f:144
#8 0x0000000000b3b723 in lr_mod_m::check_dt (idt=1)
at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:487
#9 0x0000000000b272e4 in lr_mod_m::lr_step (idt=1,
dt_r=<error reading variable: Cannot access memory at address 0xa000008>, t_r=0)
at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:252
#10 0x0000000000b261de in lr_model () at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:544
Interestingly enough, both stacktraces are incorrect: gdb is missing the call to taehdf5_mp_h5append_data_double_0d_, and valgrind is missing the call to h5dump_attr_int.
This is with valgrind 3.10.0 and gdb 7.7.1 (as above).
I also tried compiling with just "-O3" (which should use dwarf-3), with "-O3 -gdwarf-4", and with just "-O2", but the stacktraces differed in every case.
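In case it helps anyone line the two traces up frame by frame, here is a rough Python sketch that pairs gdb frames with valgrind frames by address. It is only an illustration: the function names (parse_valgrind, parse_gdb, compare) are made up for this example, and the regular expressions only cover the line shapes shown in the output above. In those traces gdb's caller addresses are one byte higher than valgrind's (for example 0xfa64ea vs 0xFA64E9), so the sketch accepts an offset of 0 or 1. Frames that gdb prints without an address are reported as-is, without any attempt to interpret them.

```python
import re

# Matches e.g. "==5489== by 0xFA64E9: H5A_create (in ...)"
VG_FRAME = re.compile(r"(?:at|by)\s+0x([0-9A-Fa-f]+):\s+(\S+)")
# Matches e.g. "#1 0x0000000000fa64ea in H5A_create ()" and also
# address-less frames such as "#6 h5append_data_double_0d (group_id=1,"
GDB_FRAME = re.compile(r"^#(\d+)\s+(?:0x([0-9A-Fa-f]+)\s+in\s+)?(\S+)\s*\(")


def parse_valgrind(text):
    """Return a list of (address, symbol) tuples from valgrind frame lines."""
    frames = []
    for line in text.splitlines():
        m = VG_FRAME.search(line)
        if m:
            frames.append((int(m.group(1), 16), m.group(2)))
    return frames


def parse_gdb(text):
    """Return a list of (address or None, function) tuples from 'bt' output."""
    frames = []
    for line in text.splitlines():
        m = GDB_FRAME.match(line.strip())
        if m:
            addr = int(m.group(2), 16) if m.group(2) else None
            frames.append((addr, m.group(3)))
    return frames


def compare(vg_frames, gdb_frames):
    """Pair each gdb frame with the valgrind frame at the same address.

    Assumption taken from the traces in this mail: a gdb address may be
    equal to the valgrind address or exactly one byte higher.
    Returns (gdb_address, valgrind_symbol, gdb_function) tuples; the
    valgrind symbol is None when no frame matches.
    """
    vg_by_addr = {addr: name for addr, name in vg_frames}
    report = []
    for addr, gdb_name in gdb_frames:
        if addr is None:
            # gdb printed no address for this frame; nothing to match on.
            report.append((None, None, gdb_name))
            continue
        vg_name = vg_by_addr.get(addr) or vg_by_addr.get(addr - 1)
        report.append((addr, vg_name, gdb_name))
    return report


if __name__ == "__main__":
    vg_text = """\
==5489== by 0xF897B6: h5a_mp_h5acreate_f_ (in LR_model)
==5489== by 0xB99FC6: taehdf5_mp_h5append_data_double_0d_ (taehdf5.f90:1936)
==5489== by 0xB248E6: plot_m_mp_plots_ (plot_hdf5.f:144)
"""
    gdb_text = """\
#4 0x0000000000f897b7 in h5a_mp_h5acreate_f_ ()
#5 0x0000000000b99fc7 in h5dump_attr_int (loc_id=<optimized out>,
#6 h5append_data_double_0d (group_id=1,
#7 0x0000000000b248e7 in plot_m::plots (idt=1) at plot_hdf5.f:144
"""
    for addr, vg_name, gdb_name in compare(parse_valgrind(vg_text),
                                           parse_gdb(gdb_text)):
        shown = hex(addr) if addr is not None else "(no address)"
        print(f"{shown:>14}  valgrind={vg_name}  gdb={gdb_name}")
```

Feeding it the "monitor v.info scheduler" output and the "bt" output saved to text shows which address carries a different name in each tool and which gdb frames have no valgrind counterpart.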
Short of only using -O1 and -O0, is there a way to fix this?
Best,
-Nikolaus