|
From: James B. <ja...@ha...> - 2005-03-09 16:28:55
|
Hi,

I downloaded, compiled and ran the 2.4.0-rc1 release through the regtest
suite on this FC3 box - the following 2 tests failed (which don't seem to
be a problem):

== 200 tests, 2 stderr failures, 0 stdout failures =================
memcheck/tests/scalar (stderr)
memcheck/tests/scalar_supp (stderr)
====

Using the new version of memcheck on my application works a treat. No
problems there.

However, when I use the new version of massif, I get the following:

====================================================================
==19751==
==19751== Stack overflow in thread 0: can't grow stack to 0xE4E4E4F4
--19751-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--19751-- si_code=1 Fault EIP: 0xB7D74051; Faulting address: 0xE4E4E4F4
--19751-- esp=0xB0755F38

valgrind: the `impossible' happened:
   Killed by fatal signal
Basic block ctr is approximately 503255235
==19751== at 0xB7D74051: calc_exact_ST_dbld2 (ms_main.c:1269)

sched status:
running_tid=0
====================================================================
=========================================

Note that the version of massif that came with the 2.2.0 release doesn't
have a problem with my application. What is slightly confusing (at least
to fools like me) is that my application finishes fine before generating
the error - the last thing it does is print a 'finished' message.

Any idea as to what is going on here?

Cheers,
James.

James Begley -- Telephone: +354-575-2039.
Marine Research Institute, Skulagata 4,
P.O. Box 1390, 121 Reykjavik, Iceland.
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-11 01:34:59
|
On Thu, 10 Mar 2005, Jeremy Fitzhardinge wrote:
>> Some printfs show that its xpt_snapshot->xpt which has the bad value;
>> running massif under memcheck shows that it is uninitialized:
>>
>> ==29793== Conditional jump or move depends on uninitialised value(s)
>> ==29793== at 0x5125B038: calc_exact_ST_dbld2 (ms_main.c:1266)
>> ==29793==
>> ==29793== Use of uninitialised value of size 4
>> ==29793== at 0x5125B029: calc_exact_ST_dbld2 (ms_main.c:1268)
Ok, I've fixed this problem.
Turns out the problem was not in Massif. Rather, VG_(calloc)() was not
zeroing the entire block it allocated. (Massif was assuming the block was
zeroed.) The offending check-in was into vg_malloc2.c, 1.35:
@@ -1282,7 +1292,9 @@ void* VG_(arena_calloc) ( ArenaId aid, S
else
p = VG_(arena_malloc_aligned) ( aid, alignB, size );
- for (i = 0; i < size; i++) p[i] = 0;
+ VG_(memset)(p, 0, nbytes);
+
+ VALGRIND_MALLOCLIKE_BLOCK(p, nbytes, 0, True);
VGP_POPCC(VgpMalloc);
As you can see, "nbytes" and "size" have been confused, and so my commit
actually fixes two bugs, since both the added lines were wrong.
In my fix, I renamed "nbytes" (here and elsewhere) as "bytes_per_memb" to
reduce the likelihood of such screw-ups in the future.
Jeremy, roll rc2 whenever you like.
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-11 01:55:44
|
Nicholas Nethercote wrote:
> As you can see, "nbytes" and "size" have been confused, and so my
> commit actually fixes two bugs, since both the added lines were wrong.
Oops. I'm surprised so little broke as a result.
> In my fix, I renamed "nbytes" (here and elsewhere) as "bytes_per_memb"
> to reduce the likelihood of such screw-ups in the future.
>
> Jeremy, roll rc2 whenever you like.
Will do shortly.
J
|
|
From: James B. <ja...@ha...> - 2005-03-11 09:32:10
|
On Thu, 2005-03-10 at 19:34 -0600, Nicholas Nethercote wrote:
> On Thu, 10 Mar 2005, Jeremy Fitzhardinge wrote:
>
> >> Some printfs show that its xpt_snapshot->xpt which has the bad value;
> >> running massif under memcheck shows that it is uninitialized:
> >>
> >> ==29793== Conditional jump or move depends on uninitialised value(s)
> >> ==29793== at 0x5125B038: calc_exact_ST_dbld2 (ms_main.c:1266)
> >> ==29793==
> >> ==29793== Use of uninitialised value of size 4
> >> ==29793== at 0x5125B029: calc_exact_ST_dbld2 (ms_main.c:1268)
>
> Ok, I've fixed this problem.
>
> Turns out the problem was not in Massif. Rather, VG_(calloc)() was not
> zeroing the entire block it allocated. (Massif was assuming the block was
> zeroed.) The offending check-in was into vg_malloc2.c, 1.35:

Thanks for fixing this - and thanks to Jeremy for finding an example to
save me trying to create a simple test case. Massif from valgrind
2.4.0.rc2 works fine for me.

Being able to run valgrind tools on other valgrind tools is fantastic.

Cheers,
James.

James Begley -- Telephone: +354-575-2039.
Marine Research Institute, Skulagata 4,
P.O. Box 1390, 121 Reykjavik, Iceland.
|
|
From: Jeremy F. <je...@go...> - 2005-03-09 17:09:35
|
James Begley wrote:
>== 200 tests, 2 stderr failures, 0 stdout failures =================
>memcheck/tests/scalar (stderr)
>memcheck/tests/scalar_supp (stderr)
>
>
These are expected. FC3's libc/gcc doesn't generate quite as long a
stack trace as is expected by the test.
>However, when I use the new version of massif, I get the following:
>
>====================================================================
>==19751==
>==19751== Stack overflow in thread 0: can't grow stack to 0xE4E4E4F4
>--19751-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) -
>exiting
>--19751-- si_code=1 Fault EIP: 0xB7D74051; Faulting address: 0xE4E4E4F4
>--19751-- esp=0xB0755F38
>
>
>valgrind: the `impossible' happened:
> Killed by fatal signal
>Basic block ctr is approximately 503255235
>==19751== at 0xB7D74051: calc_exact_ST_dbld2 (ms_main.c:1269)
>
>
It looks like the crash happened when massif is assembling its final
report.
It seems that one of the pointers in the line
xpt_snapshot->xpt->exact_ST_dbld += d_t1_t2 * xpt_snapshot->space
is equal to 0xe4e4e4e4 - a very clearly bogus pointer.
(The "stack overflow" message is meaningless in this context; it failed
to grow the stack to that address, so it decided that it was an overflow.)
Nick?
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 17:46:59
|
On Wed, 9 Mar 2005, Jeremy Fitzhardinge wrote:
> It looks like the crash happened when massif is assembling its final report.
> It seems that one of the pointers in the line
>
> xpt_snapshot->xpt->exact_ST_dbld += d_t1_t2 * xpt_snapshot->space
>
> is equal to 0xe4e4e4e4 - a very clearly bogus pointer.
>
> (The "stack overflow" message is meaningless in this context; it failed to
> grow the stack to that address, so it decided that it was an overflow.)
>
> Nick?

I'll look at it tonight.

N
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-10 02:47:11
|
On Wed, 9 Mar 2005, Jeremy Fitzhardinge wrote:
>> However, when I use the new version of massif, I get the following:
>>
>> ====================================================================
>> ==19751==
>> ==19751== Stack overflow in thread 0: can't grow stack to 0xE4E4E4F4
>> --19751-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) -
>> exiting
>> --19751-- si_code=1 Fault EIP: 0xB7D74051; Faulting address: 0xE4E4E4F4
>> --19751-- esp=0xB0755F38
>>
>>
>> valgrind: the `impossible' happened:
>> Killed by fatal signal
>> Basic block ctr is approximately 503255235
>> ==19751== at 0xB7D74051: calc_exact_ST_dbld2 (ms_main.c:1269)
>>
> It looks like the crash happened when massif is assembling its final report.
> It seems that one of the pointers in the line
>
> xpt_snapshot->xpt->exact_ST_dbld += d_t1_t2 * xpt_snapshot->space
>
> is equal to 0xe4e4e4e4 - a very clearly bogus pointer.
>
> (The "stack overflow" message is meaningless in this context; it failed to
> grow the stack to that address, so it decided that it was an overflow.)

I can't reproduce this problem. Can you provide a test program that
demonstrates the behaviour?

N
|
|
From: Jeremy F. <je...@go...> - 2005-03-10 08:52:45
|
Nicholas Nethercote wrote:
>> It looks like the crash happened when massif is assembling its final
>> report.
>> It seems that one of the pointers in the line
>>
>> xpt_snapshot->xpt->exact_ST_dbld += d_t1_t2 * xpt_snapshot->space
>>
>> is equal to 0xe4e4e4e4 - a very clearly bogus pointer.
>>
>> (The "stack overflow" message is meaningless in this context; it
>> failed to grow the stack to that address, so it decided that it was
>> an overflow.)
>
> I can't reproduce this problem. Can you provide a test program that
> demonstrates the behaviour?

I get it with memcheck/tests/leakotron:

: abulafia:pts/2; coregrind/valgrind '--tool=massif' memcheck/tests/leakotron
==29291== Massif, a space profiler for x86-linux.
==29291== Copyright (C) 2003, Nicholas Nethercote
==29291== Using valgrind-2.4.0.rc1, a program supervision framework for x86-linux.
==29291== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==29291== For more details, rerun with: -v
==29291==
FAILED: I count 399408 bytes, leakcheck says 0
==29291==
==29291== Stack overflow in thread 0: can't grow stack to 0x919191A1
--29291-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--29291-- si_code=1 Fault EIP: 0xB7D46029; Faulting address: 0x919191A1
--29291-- esp=0xB050AF38

valgrind: the `impossible' happened:
   Killed by fatal signal
Basic block ctr is approximately 16779353
==29291== at 0xB7D46029: calc_exact_ST_dbld2 (ms_main.c:1268)

sched status:
running_tid=0

Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.

If that doesn't help, please report this bug to: valgrind.kde.org

In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using. Thanks.

Some printfs show that its xpt_snapshot->xpt which has the bad value;
running massif under memcheck shows that it is uninitialized:

==29793== Conditional jump or move depends on uninitialised value(s)
==29793== at 0x5125B038: calc_exact_ST_dbld2 (ms_main.c:1266)
==29793==
==29793== Use of uninitialised value of size 4
==29793== at 0x5125B029: calc_exact_ST_dbld2 (ms_main.c:1268)

J
|
|
From: Jeremy F. <je...@go...> - 2005-03-10 08:58:41
|
Jeremy Fitzhardinge wrote:
>Nicholas Nethercote wrote:
>
>>>It looks like the crash happened when massif is assembling its final
>>>report.
>>>It seems that one of the pointers in the line
>>>
>>>xpt_snapshot->xpt->exact_ST_dbld += d_t1_t2 * xpt_snapshot->space
>>>
>>>is equal to 0xe4e4e4e4 - a very clearly bogus pointer.
>>>
>>>(The "stack overflow" message is meaningless in this context; it
>>>failed to grow the stack to that address, so it decided that it was
>>>an overflow.)
>>>
>>I can't reproduce this problem. Can you provide a test program that
>>demonstrates the behaviour?
>>
>
>I get it with memcheck/tests/leakotron:
>
>: abulafia:pts/2; coregrind/valgrind '--tool=massif' memcheck/tests/leakotron
>==29291== Massif, a space profiler for x86-linux.
>==29291== Copyright (C) 2003, Nicholas Nethercote
>==29291== Using valgrind-2.4.0.rc1, a program supervision framework for x86-linux.
>==29291== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
>==29291== For more details, rerun with: -v
>==29291==
>FAILED: I count 399408 bytes, leakcheck says 0
>==29291==
>==29291== Stack overflow in thread 0: can't grow stack to 0x919191A1
>--29291-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
>--29291-- si_code=1 Fault EIP: 0xB7D46029; Faulting address: 0x919191A1
>--29291-- esp=0xB050AF38
>
>valgrind: the `impossible' happened:
>   Killed by fatal signal
>Basic block ctr is approximately 16779353
>==29291== at 0xB7D46029: calc_exact_ST_dbld2 (ms_main.c:1268)
>
>sched status:
> running_tid=0
>
>Note: see also the FAQ.txt in the source distribution.
>It contains workarounds to several common problems.
>
>If that doesn't help, please report this bug to: valgrind.kde.org
>
>In the bug report, send all the above text, the valgrind
>version, and what Linux distro you are using. Thanks.
>
>Some printfs show that its xpt_snapshot->xpt which has the bad value;
>running massif under memcheck shows that it is uninitialized:
>
>==29793== Conditional jump or move depends on uninitialised value(s)
>==29793== at 0x5125B038: calc_exact_ST_dbld2 (ms_main.c:1266)
>==29793==
>==29793== Use of uninitialised value of size 4
>==29793== at 0x5125B029: calc_exact_ST_dbld2 (ms_main.c:1268)
>

With some more backtrace:

==32004== Stack overflow in thread 0: can't grow stack to 0x51515161
--32004-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--32004-- si_code=1 Fault EIP: 0xB7DBF042; Faulting address: 0x51515161
--32004-- esp=0xB050AF38

valgrind: the `impossible' happened:
   Killed by fatal signal
Basic block ctr is approximately 16779353
==32004== at 0xB7DBF042: calc_exact_ST_dbld2 (ms_main.c:1268)
==32004== by 0xB7DBF17B: calc_exact_ST_dbld (ms_main.c:1305)
==32004== by 0xB7DC0159: vgSkin_fini (ms_main.c:1805)
==32004== by 0xB006DACE: vgSkinInternal_fini (vg_toolint.c:36)
==32004== by 0xB0030C60: vgPlain_shutdown_actions (vg_main.c:2683)
==32004== by 0xB0087CB4: vgArch_thread_wrapper (core_os.c:48)

sched status:
running_tid=0

J
|