Re: [Lapackpp-devel] Interesting crash in QR solve

Hi Michael,

Michael Price wrote:
> The main issue is a strange crash, which I was able to track 
> down in valgrind (since the stack in GDB is messed up):
> 
> ==16581== Invalid write of size 4
> ==16581==    at 0x40635D7: dgels_ (in /usr/local/lib/liblapackpp.so.1.13.0)
> ==16581==    by 0x4031EA1: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:220)
(...)
> ==16581==  Address 0x6DC5C10 is 0 bytes after a block of size 16 alloc'd
> ==16581==    at 0x40057E9: operator new[](unsigned) (vg_replace_malloc.c:195)
> ==16581==    by 0x4052287: VectorDouble::VectorDouble(unsigned) (vd.h:46)
> ==16581==    by 0x403701F: LaGenMatDouble::LaGenMatDouble(int, int) (gmd.cc:62)
> ==16581==    by 0x4042954: LaGenMatDouble::resize(int, int) (mtmpl.h:189)
> ==16581==    by 0x4042C2D: LaGenMatDouble::resize(LaGenMatDouble const&) (gmtmpl.cc:113)
> ==16581==    by 0x4044A1B: LaGenMatDouble::copy(LaGenMatDouble const&) (mtmpl.h:210)
> ==16581==    by 0x4031E0D: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (gmd.h:665)
(...)
> 
> I don't know the workings of Lapack++ too well, but it sounds like the 
> workspace of the Fortran function may not have been allocated correctly 
> in LaQRLinearSolveIP().  Have you seen this problem before, or know how 
> to fix it?

Yes, the workspace dimension is obviously not allocated correctly, as
clearly shown by the valgrind output. Also, sometimes I really don't
understand where these workspace dimensions came from in the first place
-- I took this code from the original lapack++-1.x and I *hoped* the
calculations were correct. In this particular case, one quick look into
"man dgels" (the manual page of LAPACK's dgels) shows a completely
different minimum workspace dimension compared to the one existing in
the source code. Can you apply the patch below (i.e. replace the
calculation of "W" by the lower line) and see whether this fixes your crash?
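
For reference, here is a minimal sketch of the workspace sizes as the
LAPACK documentation for dgels states them in its LWORK description. Note
that "MN" there denotes min(M,N), not the product. The function names
dgels_min_lwork and dgels_blocked_lwork are made up for illustration and
are not part of Lapack++; nb stands for the block size that the library
already determines elsewhere.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helpers (not part of Lapack++): workspace sizes for dgels
// following the LWORK description in the LAPACK documentation,
// where mn = min(M, N).

// Minimum workspace: max(1, mn + max(mn, nrhs))
long dgels_min_lwork(long m, long n, long nrhs) {
    assert(m >= 0 && n >= 0 && nrhs >= 0);
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs));
}

// Workspace recommended for the blocked algorithm with block size nb:
// max(1, mn + max(mn, nrhs) * nb)
long dgels_blocked_lwork(long m, long n, long nrhs, long nb) {
    assert(m >= 0 && n >= 0 && nrhs >= 0 && nb >= 1);
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs) * nb);
}
```

A workspace that is larger than these values is harmless; one that is
smaller than the minimum lets dgels write past the end of the buffer.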

However, when checking this code more closely I discovered that the
whole LaQRLinearSolveIP() is probably broken for many N!=M matrices,
i.e. for non-square A matrix. In particular, the change in line 217,
committed in 2004,
http://lapackpp.cvs.sourceforge.net/lapackpp/lapackpp/src/linslv.cc?view=diff&r1=1.5&r2=1.6
is probably fundamentally wrong. The matrix Xtmp that is passed to
dgels() should have dimension (max(M,N) x Nrhs), as defined in line 213.
But the statement Xtmp=B (line 217), which is just an alias for
Xtmp.copy(B), actually resizes Xtmp to the dimensions of the B matrix,
i.e. (M x Nrhs), which is clearly too small as soon as M < N. Does that
apply to you, or were you lucky enough to only use M > N?
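
To illustrate the dimension requirement, here is a minimal sketch that
copies B into a zero-padded buffer with max(M,N) rows instead of resizing
the target to the shape of B. It uses plain std::vector in column-major
order rather than LaGenMatDouble, and the function name pad_rhs is made up
for this example; it is not how linslv.cc is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Lapack++): copy the M x nrhs matrix b
// (column-major) into a max(M,N) x nrhs buffer, padding extra rows with
// zeros, so that dgels has room to write the N-row solution when M < N.
std::vector<double> pad_rhs(const std::vector<double>& b,
                            int m, int n, int nrhs) {
    assert(m >= 0 && n >= 0 && nrhs >= 0);
    assert(b.size() == static_cast<std::size_t>(m) * nrhs);

    const std::ptrdiff_t rows = m;
    const std::ptrdiff_t ldb = std::max(m, n);  // leading dimension of result
    std::vector<double> x(static_cast<std::size_t>(ldb) * nrhs, 0.0);

    // Copy column by column; the source stride is m, the target stride ldb.
    for (std::ptrdiff_t j = 0; j < nrhs; ++j) {
        std::copy(b.begin() + j * rows,
                  b.begin() + (j + 1) * rows,
                  x.begin() + j * ldb);
    }
    return x;
}
```

With M=2, N=3 and Nrhs=2 the result holds 6 values instead of the 4 that a
plain copy of B would allocate.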

Regards,

Christian

Index: src/linslv.cc
===================================================================
RCS file: /cvsroot/lapackpp/lapackpp/src/linslv.cc,v
retrieving revision 1.13
diff -r1.13 linslv.cc
201c201
<     long int W = std::min(M,N) + nb * MAX3(M, N, nrhs);
---
>     long int W = std::max(1, M * N + nb * std::max(M * N, nrhs) );