From: John R. <jr...@bi...> - 2013-07-09 20:15:50
> What I did do is binary search on:
> ----- tsvd3.cpp
>   // Elliminate spurious valgrind uninitialized errors
> #if 1
>   for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
> #endif
> -----
> I see no complaints when starting the loop at iii=1,2,4,8,16,32;
> then errors at 64,48,40; no complaint at 36; errors at 38;
> no complaint at 37.  Hmmm...

First, whenever you are faced with murky malloc, then you should enlist help.
The glibc library provides some debugging aids which are quite inexpensive.
They are so cheap that I use them all the time, for all processes.
Put this in $HOME/.bash_profile, or feed it directly to your shell, etc.:

    # http://udrepper.livejournal.com/11429.html
    export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
    echo 1>&2  MALLOC_PERTURB_=$MALLOC_PERTURB_  " # $HOME/.bash_profile"

This will cause all bytes in newly malloc()ed areas to be set to the same
random byte.  [Or, specify a constant such as

    export MALLOC_PERTURB_=0xF5

When running under valgrind, then the low-level interception of malloc()
and the careful watching by memcheck will supersede MALLOC_PERTURB_.]

Continuing after the binary search, I tried:
----- tsvd3.cpp
  // Elliminate spurious valgrind uninitialized errors
#if 1
  for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
  for( int iii= 1; iii<=   36; ++iii ) work[iii]=123.456;
#endif
-----
which leaves only work[37] uninit.
Running this under valgrind generates complaints from memcheck; the first is:
-----
lwork_q= 108
lwork= 108
==23901== Conditional jump or move depends on uninitialised value(s)
==23901==    at 0x5498486: dnrm2_ (/usr/src/debug/lapack-3.4.2/BLAS/SRC/dnrm2.f:94)
==23901==    by 0x4E27E27: dlarfg_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DACD89: dgelq2_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DAD457: dgelqf_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DBA96B: dgesdd_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4018F7: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:62)
==23901==  Uninitialised value was created by a heap allocation
==23901==    at 0x4A07C84: operator new[](unsigned long) (/builddir/build/BUILD/valgrind-3.8.1/coregrind/m_replacemalloc/vg_replace_malloc.c:363)
==23901==    by 0x40180F: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:52)
-----
Now we know that exactly one 8-byte 'double' uninit at work[37] will trigger
the complaints.  This aligned 8-byte region is small enough that we can take
advantage of debugging hardware in x86 chips.  So now I run directly under gdb
(without valgrind), put a breakpoint just after the code which leaves work[37]
uninit, and plant a hardware 'read' watchpoint on &work[37]:

    (gdb) b tsvd3.cpp:58
    (gdb) run
    Breakpoint 2, main () at tsvd3.cpp:62
    (gdb) p &work[37]
    $1 = (double *) 0x606178
    (gdb) rwatch *(double *)0x606178
    Hardware read watchpoint 3: *(double *)0x606178
    (gdb) continue

Lo and behold, work[37] is fetched and used.
That is, there is a real error:

Hardware read watchpoint 3: *(double *)0x606178

Value = -1.6882786079646144e+260
0x000000000040154a in scal_generic<int, double, double> (n=0x3, alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
17	            y[iY] *= alpha;
(gdb) x/12i $pc-0x18
   0x401532 <scal_generic<int, double, double>(int, double const&, double*, int)+54>:	mov    -0x8(%rbp),%eax
   0x401535 <scal_generic<int, double, double>(int, double const&, double*, int)+57>:	cltq
   0x401537 <scal_generic<int, double, double>(int, double const&, double*, int)+59>:	lea    0x0(,%rax,8),%rcx
   0x40153f <scal_generic<int, double, double>(int, double const&, double*, int)+67>:	mov    -0x28(%rbp),%rax
   0x401543 <scal_generic<int, double, double>(int, double const&, double*, int)+71>:	add    %rcx,%rax
   0x401546 <scal_generic<int, double, double>(int, double const&, double*, int)+74>:	movsd  (%rax),%xmm1   ### the fetch of uninit
=> 0x40154a <scal_generic<int, double, double>(int, double const&, double*, int)+78>:	mov    -0x20(%rbp),%rax
   0x40154e <scal_generic<int, double, double>(int, double const&, double*, int)+82>:	movsd  (%rax),%xmm0
   0x401552 <scal_generic<int, double, double>(int, double const&, double*, int)+86>:	mulsd  %xmm1,%xmm0   ### the use of uninit
   0x401556 <scal_generic<int, double, double>(int, double const&, double*, int)+90>:	movsd  %xmm0,(%rdx)
   0x40155a <scal_generic<int, double, double>(int, double const&, double*, int)+94>:	addl   $0x1,-0x4(%rbp)
   0x40155e <scal_generic<int, double, double>(int, double const&, double*, int)+98>:	mov    -0x18(%rbp),%eax
(gdb) p $rax
$2 = 0x606178   ### yes, it is &work[37]
(gdb) x/2xw $rax   ### and those bytes are uninit
0x606178:	0xf5f5f5f5	0xf5f5f5f5   ### The pattern for uninit set by MALLOC_PERTURB_
(gdb) bt
#0  0x000000000040154a in scal_generic<int, double, double> (n=0x3, alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
#1  0x00000000004011f8 in gemv_generic<int, double, double, double, double, double> (order=RowMajor, transA=Trans, conjX=NoTrans,
    m=0x8, n=0x3, alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40, incX=0x4, beta=@0x7ffff7ca6be0: 0,
    y=0x606170, incY=0x1) at gemv2.cpp:108
#2  0x0000000000400e05 in gemv_generic<int, double, double, double, double, double> (order=ColMajor, transA=Trans, conjX=NoTrans,
    m=0x3, n=0x8, alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40, incX=0x4, beta=@0x7ffff7ca6be0: 0,
    y=0x606170, incY=0x1) at gemv2.cpp:58
#3  0x0000000000400d4e in gemv<int, double, double, double, double, double> (
    order=ColMajor, trans=NoTrans, m=0x3, n=0x8, alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40, incX=0x4,
    beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:156
#4  0x0000000000400cc8 in dgemv_ (TRANS=0x7ffff7ca6be8 "No transpose", M=0x7fffffffd938, N=0x7fffffffd93c, ALPHA=0x7ffff7ca6bc8,
    _A=0x7fffffffde48, LDA=0x7fffffffdf58, X=0x7fffffffde40, INCX=0x7fffffffdf58, BETA=0x7ffff7ca6be0, Y=0x606170,
    INCY=0x7ffff7ca6bdc) at gemv2.cpp:204
#5  0x00007ffff797f3fb in dlarf_ () from /usr/lib64/atlas/liblapack.so.3
#6  0x00007ffff7906e1f in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#7  0x00007ffff7907458 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#8  0x00007ffff791496c in dgesdd_ () from /usr/lib64/atlas/liblapack.so.3
#9  0x00000000004018f8 in main () at tsvd3.cpp:62

So there is the [a] real error.  Apologize to memcheck, and fix your bug.
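
One possible shape for the fix, as a sketch only: the backtrace shows beta == 0 arriving at "y[iY] *= alpha" in scal_generic, so y is read even though its old contents cannot matter. The reference BLAS documentation for dgemv says that when BETA is supplied as zero then Y need not be set on input, which implies a store rather than a multiply in that case. The name scal_zero_safe below is hypothetical, its parameter list is only modeled on what the backtrace prints (the real gemv2.cpp source is not shown here), and it assumes incY > 0.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for the scaling step at gemv2.cpp:17.
// Assumption: incY > 0 (negative strides are not handled in this sketch).
template <typename IndexType, typename Alpha, typename T>
void scal_zero_safe(IndexType n, const Alpha &alpha, T *y, IndexType incY)
{
    if (alpha == Alpha(0)) {
        // Store zeros without fetching y: old contents may be uninit,
        // and NaN or Inf times zero would not give zero anyway.
        for (IndexType i = 0, iY = 0; i < n; ++i, iY += incY) {
            y[iY] = T(0);
        }
        return;
    }
    // General case: y is a genuine input, so reading it is legitimate.
    for (IndexType i = 0, iY = 0; i < n; ++i, iY += incY) {
        y[iY] *= alpha;
    }
}

// Usage sketch: a NaN stands in for uninitialised garbage in y.
inline void scal_zero_safe_demo()
{
    double y[3] = {1.0, std::numeric_limits<double>::quiet_NaN(), 3.0};
    scal_zero_safe(3, 0.0, y, 1);
    assert(y[0] == 0.0 && y[1] == 0.0 && y[2] == 0.0);
}
```

With a plain "y[iY] *= alpha" the NaN element would stay NaN after scaling by zero, and memcheck would be right to track the fetch; with the store, the uninit double at work[37] is never read on this path.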