Re: [Lapackpp-devel] Interesting crash in QR solve
From: Christian S. <sti...@tu...> - 2006-07-14 09:19:26
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Michael,

Michael Price wrote:
> The main issue is a strange crash, which I was able to track
> down in valgrind (since the stack in GDB is messed up):
>
> ==16581== Invalid write of size 4
> ==16581==    at 0x40635D7: dgels_ (in /usr/local/lib/liblapackpp.so.1.13.0)
> ==16581==    by 0x4031EA1: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:220)
(...)
> ==16581== Address 0x6DC5C10 is 0 bytes after a block of size 16 alloc'd
> ==16581==    at 0x40057E9: operator new[](unsigned) (vg_replace_malloc.c:195)
> ==16581==    by 0x4052287: VectorDouble::VectorDouble(unsigned) (vd.h:46)
> ==16581==    by 0x403701F: LaGenMatDouble::LaGenMatDouble(int, int) (gmd.cc:62)
> ==16581==    by 0x4042954: LaGenMatDouble::resize(int, int) (mtmpl.h:189)
> ==16581==    by 0x4042C2D: LaGenMatDouble::resize(LaGenMatDouble const&) (gmtmpl.cc:113)
> ==16581==    by 0x4044A1B: LaGenMatDouble::copy(LaGenMatDouble const&) (mtmpl.h:210)
> ==16581==    by 0x4031E0D: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (gmd.h:665)
(...)
>
> I don't know the workings of Lapack++ too well, but it sounds like the
> workspace of the Fortran function may not have been allocated correctly
> in LaQRLinearSolveIP(). Have you seen this problem before, or know how
> to fix it?

Yes, the workspace is obviously not sized correctly, as the valgrind output clearly shows. Also, sometimes I really don't understand where these workspace dimensions came from in the first place -- I took this code from the original lapack++-1.x and I *hoped* the calculations were correct. In this particular case, one quick look into "man dgels" (the manual page of LAPACK's dgels) shows a completely different minimum workspace dimension from the one in the source code.

Can you apply the patch below (i.e. replace the calculation of "W" by the lower line) and see whether this fixes your crash?
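For reference, the LAPACK documentation for dgels states the minimum workspace as LWORK >= max(1, MN + max(MN, NRHS)) with MN = min(M, N), and recommends max(1, MN + max(MN, NRHS) * NB) for best performance, where NB is the block size. The calculation could be sketched roughly as follows. The helper names dgels_min_lwork and dgels_blocked_lwork are hypothetical and not part of Lapack++, and nb is assumed to be supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of Lapack++): minimum LWORK for dgels
// as given in the LAPACK documentation, with mn = min(M, N).
long dgels_min_lwork(long m, long n, long nrhs) {
    assert(m >= 0 && n >= 0 && nrhs >= 0);
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs));
}

// Hypothetical helper: the larger LWORK recommended for blocked code.
// nb is the block size; how it is obtained is left to the caller.
long dgels_blocked_lwork(long m, long n, long nrhs, long nb) {
    assert(m >= 0 && n >= 0 && nrhs >= 0 && nb >= 1);
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs) * nb);
}
```

For a 2 x 3 system with one right-hand side, dgels_min_lwork(2, 3, 1) gives 4. The patch in this message uses the product M * N where the documentation uses MN = min(M, N); for non-negative M and N that yields a workspace at least as large as the documented value, so it over-allocates rather than under-allocates.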
However, when checking this code more closely I discovered that the whole LaQRLinearSolveIP() is probably broken for many N != M cases, i.e. for a non-square matrix A. In particular, the change in line 217, committed in 2004,

http://lapackpp.cvs.sourceforge.net/lapackpp/lapackpp/src/linslv.cc?view=diff&r1=1.5&r2=1.6

is probably fundamentally wrong. The matrix Xtmp that is passed to dgels() should have dimension (max(M,N) x Nrhs), as defined in line 213. But the command Xtmp=B (line 217), which is just an alias for Xtmp.copy(B), will actually resize Xtmp to the dimensions of the B matrix, i.e. (M x Nrhs), and that is too small as soon as M < N.

Does that apply to you, or were you lucky enough to only use M > N?

Regards,

Christian

Index: src/linslv.cc
===================================================================
RCS file: /cvsroot/lapackpp/lapackpp/src/linslv.cc,v
retrieving revision 1.13
diff -r1.13 linslv.cc
201c201
< long int W = std::min(M,N) + nb * MAX3(M, N, nrhs);
---
> long int W = std::max(1, M * N + nb * std::max(M * N, nrhs) );
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBRLdhg2XAi+BfhivFAQIADAP/R7Jfr4i/IdR01mc7If8LnOxkbzh3I1/B
qRar4HtSTOKxzGGBgke3yOFa/LoUe6drrvZlJiWJkXgBNOYAu+ZoadpENQCB7xC6
WznExmH7OCyoVaR89jUPfL9Hv1tAebCzrsowrbtd8jxDKFLjaDv6UNRhyI1H0+mB
+wYWtKTeqxU=
=rYwx
-----END PGP SIGNATURE-----
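To illustrate the Xtmp dimension problem described in this message: dgels expects its B argument to have a leading dimension of at least max(1, M, N), because the same buffer holds the M-row right-hand side on entry and the N-row solution on exit. A minimal sketch of copying B into such a buffer without shrinking it is shown below. It uses plain std::vector in column-major order rather than the Lapack++ matrix classes, and pad_rhs_for_dgels is a hypothetical name, not an existing Lapack++ function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Lapack++): copy an M x nrhs right-hand
// side stored column-major with leading dimension M into a zero-filled
// buffer with leading dimension max(1, M, N).
std::vector<double> pad_rhs_for_dgels(const std::vector<double>& b,
                                      int m, int n, int nrhs) {
    assert(m >= 0 && n >= 0 && nrhs >= 0);
    assert(b.size() == static_cast<std::size_t>(m) * nrhs);

    const int ldb = std::max(1, std::max(m, n));
    std::vector<double> x(static_cast<std::size_t>(ldb) * nrhs, 0.0);

    for (int j = 0; j < nrhs; ++j) {
        // Copy column j of b; rows m..ldb-1 of the target column stay zero.
        std::copy(b.begin() + static_cast<std::ptrdiff_t>(j) * m,
                  b.begin() + static_cast<std::ptrdiff_t>(j + 1) * m,
                  x.begin() + static_cast<std::ptrdiff_t>(j) * ldb);
    }
    return x;
}
```

With M = 2, N = 3 and two right-hand sides, the returned buffer has 6 entries (leading dimension 3) instead of the 4 entries of B. That is the case in which a plain copy that resizes the target to B's dimensions leaves the buffer too small. The quoted valgrind trace attributes the overrun block to an allocation made inside LaGenMatDouble::copy, which appears consistent with this explanation.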