From: Norman G. <no...@te...> - 2013-07-09 22:42:56
On 07/09/2013 01:16 PM, John Reiser wrote:
>> What I did do is binary search on:
>> ----- tsvd3.cpp
>> // Eliminate spurious valgrind uninitialized errors
>> #if 1
>> for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
>> #endif
>> -----
>> I see no complaints when starting the loop at iii=1,2,4,8,16,32;
>> then errors at 64,48,40; no complaint at 36; errors at 38;
>> no complaint at 37. Hmmm...
> First, whenever you are faced with murky malloc, then you should enlist help.
> The glibc library provides some debugging aids which are quite inexpensive.
> They are so cheap that I use them all the time, for all processes.
> Put this in $HOME/.bash_profile, or feed it directly to your shell, etc.:
>     # http://udrepper.livejournal.com/11429.html
>     export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
>     echo 1>&2 MALLOC_PERTURB_=$MALLOC_PERTURB_ " # $HOME/.bash_profile"
> This will cause all bytes in newly malloc()ed areas to be set to the
> same random byte. [Or, specify a constant such as
>     export MALLOC_PERTURB_=0xF5
> When running under valgrind, then the low-level interception of malloc()
> and the careful watching by memcheck will supersede MALLOC_PERTURB_.]
>
> Continuing after the binary search, I tried:
> ----- tsvd3.cpp
> // Eliminate spurious valgrind uninitialized errors
> #if 1
> for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
> for( int iii= 1; iii<= 36; ++iii ) work[iii]=123.456;
> #endif
> -----
> which leaves only work[37] uninit. Running this under valgrind
> generates complaints from memcheck; the first is:
> -----
> lwork_q= 108
> lwork= 108
> ==23901== Conditional jump or move depends on uninitialised value(s)
> ==23901==    at 0x5498486: dnrm2_ (/usr/src/debug/lapack-3.4.2/BLAS/SRC/dnrm2.f:94)
> ==23901==    by 0x4E27E27: dlarfg_ (in /usr/lib64/atlas/liblapack.so.3.0)
> ==23901==    by 0x4DACD89: dgelq2_ (in /usr/lib64/atlas/liblapack.so.3.0)
> ==23901==    by 0x4DAD457: dgelqf_ (in /usr/lib64/atlas/liblapack.so.3.0)
> ==23901==    by 0x4DBA96B: dgesdd_ (in /usr/lib64/atlas/liblapack.so.3.0)
> ==23901==    by 0x4018F7: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:62)
> ==23901==  Uninitialised value was created by a heap allocation
> ==23901==    at 0x4A07C84: operator new[](unsigned long) (/builddir/build/BUILD/valgrind-3.8.1/coregrind/m_replacemalloc/vg_replace_malloc.c:363)
> ==23901==    by 0x40180F: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:52)
> -----
> Now we know that exactly one 8-byte 'double' uninit at work[37] will trigger the complaints.
> This aligned 8-byte region is small enough that we can take advantage of debugging hardware
> in x86 chips.
>
> So now I run directly under gdb (without valgrind), put a breakpoint just after
> the code which leaves work[37] uninit, and plant a hardware 'read' watchpoint on &work[37]:
> (gdb) b tsvd3.cpp:58
> (gdb) run
> Breakpoint 2, main () at tsvd3.cpp:62
> (gdb) p &work[37]
> $1 = (double *) 0x606178
> (gdb) rwatch *(double *)0x606178
> Hardware read watchpoint 3: *(double *)0x606178
> (gdb) continue
>
> Lo and behold, work[37] is fetched and used. That is, there is a real error:
> Hardware read watchpoint 3: *(double *)0x606178
>
> Value = -1.6882786079646144e+260
> 0x000000000040154a in scal_generic<int, double, double> (n=0x3,
>     alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
> 17          y[iY] *= alpha;
> (gdb) x/12i $pc-0x18
>    0x401532 <scal_generic<int, double, double>(int, double const&, double*, int)+54>: mov -0x8(%rbp),%eax
>    0x401535 <scal_generic<int, double, double>(int, double const&, double*, int)+57>: cltq
>    0x401537 <scal_generic<int, double, double>(int, double const&, double*, int)+59>: lea 0x0(,%rax,8),%rcx
>    0x40153f <scal_generic<int, double, double>(int, double const&, double*, int)+67>: mov -0x28(%rbp),%rax
>    0x401543 <scal_generic<int, double, double>(int, double const&, double*, int)+71>: add %rcx,%rax
>    0x401546 <scal_generic<int, double, double>(int, double const&, double*, int)+74>: movsd (%rax),%xmm1   ### the fetch of uninit
> => 0x40154a <scal_generic<int, double, double>(int, double const&, double*, int)+78>: mov -0x20(%rbp),%rax
>    0x40154e <scal_generic<int, double, double>(int, double const&, double*, int)+82>: movsd (%rax),%xmm0
>    0x401552 <scal_generic<int, double, double>(int, double const&, double*, int)+86>: mulsd %xmm1,%xmm0   ### the use of uninit
>    0x401556 <scal_generic<int, double, double>(int, double const&, double*, int)+90>: movsd %xmm0,(%rdx)
>    0x40155a <scal_generic<int, double, double>(int, double const&, double*, int)+94>: addl $0x1,-0x4(%rbp)
>    0x40155e <scal_generic<int, double, double>(int, double const&, double*, int)+98>: mov -0x18(%rbp),%eax
>
> (gdb) p $rax
> $2 = 0x606178   ### yes, it is &work[37]
> (gdb) x/2xw $rax   ### and those bytes are uninit
> 0x606178: 0xf5f5f5f5 0xf5f5f5f5   ### The pattern for uninit set by MALLOC_PERTURB_
> (gdb) bt
> #0  0x000000000040154a in scal_generic<int, double, double> (n=0x3,
>     alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
> #1  0x00000000004011f8 in gemv_generic<int, double, double, double, double, double> (order=RowMajor, transA=Trans, conjX=NoTrans, m=0x8, n=0x3,
>     alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40,
>     incX=0x4, beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:108
> #2  0x0000000000400e05 in gemv_generic<int, double, double, double, double, double> (order=ColMajor, transA=Trans, conjX=NoTrans, m=0x3, n=0x8,
>     alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40,
>     incX=0x4, beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:58
> #3  0x0000000000400d4e in gemv<int, double, double, double, double, double> (
>     order=ColMajor, trans=NoTrans, m=0x3, n=0x8, alpha=@0x7ffff7ca6bc8: 1,
>     A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40, incX=0x4,
>     beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:156
> #4  0x0000000000400cc8 in dgemv_ (TRANS=0x7ffff7ca6be8 "No transpose",
>     M=0x7fffffffd938, N=0x7fffffffd93c, ALPHA=0x7ffff7ca6bc8,
>     _A=0x7fffffffde48, LDA=0x7fffffffdf58, X=0x7fffffffde40,
>     INCX=0x7fffffffdf58, BETA=0x7ffff7ca6be0, Y=0x606170, INCY=0x7ffff7ca6bdc)
>     at gemv2.cpp:204
> #5  0x00007ffff797f3fb in dlarf_ () from /usr/lib64/atlas/liblapack.so.3
> #6  0x00007ffff7906e1f in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
> #7  0x00007ffff7907458 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
> #8  0x00007ffff791496c in dgesdd_ () from /usr/lib64/atlas/liblapack.so.3
> #9  0x00000000004018f8 in main () at tsvd3.cpp:62
>
> So there is the [a] real error. Apologize to memcheck, and fix your bug.
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users

Thank you for the detailed analysis and explanation of techniques!

This example also shows that the memcheck error descriptions should be
taken with a grain of salt, i.e., the error may be occurring elsewhere,
and not where the error description suggests. Is this the nature of
the beast, or can memcheck be more accurate in locating the errors?

PS: work[0] was also left uninitialized :-)
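
For reference, a minimal sketch of the kind of fix the backtrace above seems to point at. Frame #0 shows scal_generic executing `y[iY] *= alpha` with alpha (the beta passed to dgemv_) equal to 0, so the old contents of y are read even though they are about to be discarded. The reference BLAS documentation for DGEMV states that when BETA is zero, Y need not be set on input, which suggests assigning zero instead of multiplying. The function names below (scal_multiply, scal_blas_style, garbage_survives_multiply_only) are hypothetical and are not the code in gemv2.cpp. NaN stands in for the uninitialized workspace so that the sketch itself has defined behavior.

```cpp
// Hypothetical sketch: these names are illustrative, not taken from gemv2.cpp.
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Scales y by alpha with a plain multiply. Every y[iY] is read, even when
// alpha == 0, so garbage such as NaN or Inf in y survives into the result.
void scal_multiply(int n, double alpha, double* y, int incY) {
    assert(n >= 0 && incY > 0);
    for (int i = 0, iY = 0; i < n; ++i, iY += incY) {
        y[iY] *= alpha;
    }
}

// BLAS-style variant: when alpha == 0 the old contents of y are never read,
// so y may be left unset by the caller.
void scal_blas_style(int n, double alpha, double* y, int incY) {
    assert(n >= 0 && incY > 0);
    for (int i = 0, iY = 0; i < n; ++i, iY += incY) {
        y[iY] = (alpha == 0.0) ? 0.0 : alpha * y[iY];
    }
}

// Returns true if the multiply version leaks NaN while the BLAS-style
// version yields clean zeros. NaN is a stand-in for uninitialized memory.
bool garbage_survives_multiply_only() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> a(3, nan), b(3, nan);
    scal_multiply(3, 0.0, a.data(), 1);
    scal_blas_style(3, 0.0, b.data(), 1);
    return std::isnan(a[0]) && b[0] == 0.0 && b[1] == 0.0 && b[2] == 0.0;
}
```

On the question of where the report appears: memcheck only reports an undefined value when it affects observable behaviour (a conditional jump, an address computation, a system call), not when it is merely loaded or used in arithmetic. That is consistent with the read in scal_generic passing silently and the complaint surfacing later at the branch in dnrm2_.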