Thread: [Lapackpp-devel] Interesting crash in QR solve
From: Michael P. <mr...@cm...> - 2006-07-13 20:00:22

Dear Christian and others,

I am thankful for this library, which lets me avoid calling LAPACK functions directly, but I'm having trouble with memory corruption in two situations: printing out matrices, and solving linear systems. (I don't remember seeing the problem when using LU, only QR and SVD.)

First, sometimes the assertion

    assert(cI.end() >= 0);    (mtmpl.h, line 279)

fails.

The main issue is a strange crash, which I was able to track down in valgrind (since the stack in GDB is messed up):

==16581== Invalid write of size 4
==16581==    at 0x40635D7: dgels_ (in /usr/local/lib/liblapackpp.so.1.13.0)
==16581==    by 0x4031EA1: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:220)
==16581==    by 0x4032141: LaQRLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:160)
==16581==    by 0x4032BD7: LaLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:56)
==16581==    by 0x805385D: Patch::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x8054388: SurfaceFit::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x804AE39: Segmenter::surfacefit() (segmenter.cpp:935)
==16581==    by 0x804A480: main (main.cpp:133)
==16581==  Address 0x6DC5C10 is 0 bytes after a block of size 16 alloc'd
==16581==    at 0x40057E9: operator new[](unsigned) (vg_replace_malloc.c:195)
==16581==    by 0x4052287: VectorDouble::VectorDouble(unsigned) (vd.h:46)
==16581==    by 0x403701F: LaGenMatDouble::LaGenMatDouble(int, int) (gmd.cc:62)
==16581==    by 0x4042954: LaGenMatDouble::resize(int, int) (mtmpl.h:189)
==16581==    by 0x4042C2D: LaGenMatDouble::resize(LaGenMatDouble const&) (gmtmpl.cc:113)
==16581==    by 0x4044A1B: LaGenMatDouble::copy(LaGenMatDouble const&) (mtmpl.h:210)
==16581==    by 0x4031E0D: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (gmd.h:665)
==16581==    by 0x4032141: LaQRLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:160)
==16581==    by 0x4032BD7: LaLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:56)
==16581==    by 0x805385D: Patch::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x8054388: SurfaceFit::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x804AE39: Segmenter::surfacefit() (segmenter.cpp:935)
==16581==
==16581== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
starting debugger
==16581== starting debugger with cmd: /usr/bin/gdb -nw /proc/16618/fd/1014 16618
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Attaching to program: /proc/16618/fd/1014, process 16618
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x37fff000
`shared object read from target memory' has disappeared; keeping its symbols.
0x040635d7 in ?? ()
(gdb) bt
#0  0x040635d7 in ?? ()
#1  0xbef53da4 in ?? ()
#2  0xbef53d6c in ?? ()
#3  0x00000001 in ?? ()
#4  0x00929300 in ?? ()
#5  0x00000000 in ?? ()

Here's the code in question; it's just a variable-order polynomial least squares fit function. I have checked (using valgrind) and found no memory leaks or other issues up to this point.

    void Patch::solve()
    {
        int i = 0;
        int j = 0;
        surfcoord x = 0, y = 0, z = 0;
        int eq_size = PATCH_EQ_LENGTH;

        // Number of polynomial coefficients for the chosen order
        switch (MAX_ORDER) {
        case 1: eq_size = 3;  break;
        case 2: eq_size = 6;  break;
        case 3: eq_size = 10; break;
        case 4: eq_size = 15; break;
        }

        // Every patch makes a 15x15 square in the matrix AtA
        LaGenMatDouble A(num_points, eq_size);
        LaVectorDouble b(num_points);
        LaVectorDouble eq(eq_size);

        // One row of A per sample point; z goes into the right-hand side
        for (j = 0; j < num_points; j++) {
            x = points[j * 3];
            y = points[j * 3 + 1];
            z = points[j * 3 + 2];

            A(j, 0) = 1;
            A(j, 1) = x;
            A(j, 2) = y;
            if (MAX_ORDER >= 2) {
                A(j, 3) = x * x;
                A(j, 4) = x * y;
                A(j, 5) = y * y;
            }
            if (MAX_ORDER >= 3) {
                A(j, 6) = A(j, 3) * x;
                A(j, 7) = A(j, 3) * y;
                A(j, 8) = A(j, 5) * x;
                A(j, 9) = A(j, 5) * y;
            }
            if (MAX_ORDER >= 4) {
                A(j, 10) = A(j, 6) * x;
                A(j, 11) = A(j, 6) * y;
                A(j, 12) = A(j, 3) * A(j, 5);
                A(j, 13) = A(j, 9) * x;
                A(j, 14) = A(j, 9) * y;
            }
            b(j) = z;
        }

        LaLinearSolve(A, eq, b);

        // Copy the fitted coefficients out of the solution vector
        for (j = 0; j < eq_size; j++) {
            equation[j] = eq(j);
        }
    }

I don't know the workings of Lapack++ too well, but it sounds like the workspace of the Fortran function may not have been allocated correctly in LaQRLinearSolveIP(). Have you seen this problem before, or do you know how to fix it?

Thanks
Michael

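
The hard-coded sizes in the `switch` above (3, 6, 10, 15) are the number of monomials x^p y^q with p + q <= order, which is (order + 1)(order + 2) / 2. A minimal sketch of that relationship follows; the helper names `term_count` and `design_row` are made up for illustration and do not appear in the original code. The loop produces the columns in the same order as `Patch::solve()`, grouped by total degree.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Number of monomials x^p * y^q with p + q <= order.
int term_count(int order)
{
    return (order + 1) * (order + 2) / 2;
}

// One row of the least-squares design matrix for the point (x, y):
// 1, x, y, x^2, x*y, y^2, x^3, x^2*y, x*y^2, y^3, ...
std::vector<double> design_row(double x, double y, int order)
{
    std::vector<double> row;
    row.reserve(term_count(order));
    for (int d = 0; d <= order; ++d) {        // total degree of the term
        for (int q = 0; q <= d; ++q) {        // power of y
            const int p = d - q;              // power of x
            row.push_back(std::pow(x, p) * std::pow(y, q));
        }
    }
    return row;
}
```

Computing the size from the order keeps the matrix dimensions and the fill loop in step, so a new order cannot leave columns of A unassigned.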
From: Christian S. <sti...@tu...> - 2006-07-14 08:48:20

Dear Michael,

thank you for your nice feedback. I'd happily try to fix these bugs, although I probably need some assistance to track them down. I'll reply to both bugs in individual posts; first this one:

Michael Price wrote:
> situations: printing out matrices, and linear systems solution. (I
> don't remember seeing the problem when using LU, only QR and SVD.)
> First, sometimes the assertion
>
> assert(cI.end() >= 0); (mtmpl.h, line 279)
>
> fails.

This assertion should only be reached when you use negative increments for a submatrix index; do you really use negative increments? I would certainly agree that several functions have never been tested with negative increments, only with positive ones, so there can very well be a bug in there. The assertion itself is perfectly valid; if it fails, it probably means that some code elsewhere goes wrong when the increments are negative instead of positive.

Could you submit a code snippet that crashes with that assertion? Thanks.

Christian

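
As an illustration of why a lower-bound check on the end of an index only matters for negative increments, here is a minimal sketch. It assumes a hypothetical `IndexRange` type with start, increment and inclusive end; this is not the Lapack++ index class and not the code in mtmpl.h.

```cpp
#include <cassert>

// Hypothetical stand-in for a strided submatrix index (not the Lapack++ class).
struct IndexRange {
    int start;  // first index visited
    int inc;    // stride, may be negative
    int end;    // last index visited (inclusive)
};

// Returns true if every index the range visits lies in [0, dim).
bool range_in_bounds(const IndexRange& r, int dim)
{
    if (r.inc == 0) {
        return false;  // a zero stride never reaches the end
    }
    if (r.inc > 0) {
        // Ascending: the smallest index is start, the largest is end.
        return r.start >= 0 && r.start <= r.end && r.end < dim;
    }
    // Descending: the largest index is start, the smallest is end.
    return r.end >= 0 && r.end <= r.start && r.start < dim;
}
```

With a positive increment, `end >= start >= 0` already holds, so only the upper bound on `end` can fail; with a negative increment it is the lower bound on `end` that needs checking.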
From: Do bi <mrc...@ya...> - 2006-07-28 21:43:23

Hi friends,

I succeeded in compiling the lapack++ project file in VC++. Since there are no hands-on tutorials on lapack++, please just state briefly how the simplest code can be written: say I need a small matrix and a small vector, and I want to multiply them, display them, and solve with LU.

Thanks

From: Christian S. <sti...@tu...> - 2006-07-14 09:19:26

Hi Michael,

Michael Price wrote:
> The main issue is a strange crash, which I was able to track
> down in valgrind (since the stack in GDB is messed up):
>
> ==16581== Invalid write of size 4
> ==16581== at 0x40635D7: dgels_ (in /usr/local/lib/liblapackpp.so.1.13.0)
> ==16581== by 0x4031EA1: LaQRLinearSolveIP(LaGenMatDouble&,
> LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:220)
(...)
> ==16581== Address 0x6DC5C10 is 0 bytes after a block of size 16 alloc'd
> ==16581== at 0x40057E9: operator new[](unsigned)
> (vg_replace_malloc.c:195)
> ==16581== by 0x4052287: VectorDouble::VectorDouble(unsigned) (vd.h:46)
> ==16581== by 0x403701F: LaGenMatDouble::LaGenMatDouble(int, int)
> (gmd.cc:62)
> ==16581== by 0x4042954: LaGenMatDouble::resize(int, int) (mtmpl.h:189)
> ==16581== by 0x4042C2D: LaGenMatDouble::resize(LaGenMatDouble const&)
> (gmtmpl.cc:113)
> ==16581== by 0x4044A1B: LaGenMatDouble::copy(LaGenMatDouble const&)
> (mtmpl.h:210)
> ==16581== by 0x4031E0D: LaQRLinearSolveIP(LaGenMatDouble&,
> LaGenMatDouble&, LaGenMatDouble const&) (gmd.h:665)
(...)
>
> I don't know the workings of Lapack++ too well, but it sounds like the
> workspace of the Fortran function may not have been allocated correctly
> in LaQRLinearSolveIP(). Have you seen this problem before, or know how
> to fix it?

Yes, the workspace dimension is obviously not allocated correctly, as clearly shown by the valgrind output. Also, sometimes I really don't understand where these workspace dimensions came from in the first place -- I took this code from the original lapack++-1.x and I *hoped* the calculations were correct. In this particular case, one quick look into "man dgels" (the manual page of LAPACK's dgels) shows a completely different minimum workspace dimension compared to the one existing in the source code. Can you apply the patch below (i.e. replace the calculation of "W" by the lower line) and see whether this fixes your crash?

However, when checking this code more closely I discovered that the whole LaQRLinearSolveIP() is probably broken for many N!=M matrices, i.e. for a non-square A matrix. In particular, the change in line 217, committed in 2004,
http://lapackpp.cvs.sourceforge.net/lapackpp/lapackpp/src/linslv.cc?view=diff&r1=1.5&r2=1.6
is probably fundamentally wrong -- the matrix Xtmp that is to be passed to dgels() should have dimension (max(M,N) x Nrhs) (as defined in line 213), but the command Xtmp=B (line 217), which is just an alias for Xtmp.copy(B), will actually resize Xtmp to the dimensions of the B matrix. B in turn is (M x Nrhs), which is clearly wrong as soon as M < N. Does that apply to you, or were you lucky enough to only use M > N?

Regards,
Christian

Index: src/linslv.cc
===================================================================
RCS file: /cvsroot/lapackpp/lapackpp/src/linslv.cc,v
retrieving revision 1.13
diff -r1.13 linslv.cc
201c201
<   long int W = std::min(M,N) + nb * MAX3(M, N, nrhs);
---
>   long int W = std::max(1, M * N + nb * std::max(M * N, nrhs) );

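
For reference, the LAPACK documentation for dgels gives the workspace requirement as LWORK >= max(1, MN + max(MN, NRHS)), and LWORK >= max(1, MN + max(MN, NRHS) * NB) for optimal performance, where MN = min(M, N) and NB is the optimum block size. A minimal sketch of that arithmetic follows; the helper names `dgels_min_lwork` and `dgels_blocked_lwork` are invented for illustration and are not part of LAPACK or Lapack++.

```cpp
#include <algorithm>
#include <cassert>

// Minimum workspace length documented for dgels.
long dgels_min_lwork(long m, long n, long nrhs)
{
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs));
}

// Workspace length documented for optimal (blocked) performance,
// given the block size nb.
long dgels_blocked_lwork(long m, long n, long nrhs, long nb)
{
    const long mn = std::min(m, n);
    return std::max(1L, mn + std::max(mn, nrhs) * nb);
}
```

Since the documented MN is min(M, N), an expression that uses the product M * N in its place requests at least as much workspace as required, which costs memory but is safe. dgels also accepts LWORK = -1 as a workspace query, in which case it only reports the optimal size in the first element of WORK.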
From: Michael P. <mr...@cm...> - 2006-07-17 17:57:08

Christian,

Thanks for responding so quickly and thoughtfully. As for negative submatrix increments, I'm not using them, but I also may have reported the wrong failed assertion; it may have been "assert(cI.end() < mat.size(0))" on line 273.

For now I've only been doing overdetermined systems (for example: 50 to 200 equations, 3 unknowns). I think the change of linslv.cc:220 makes sense to correctly initialize Xtmp for either m >= n or m < n. I wonder why that was changed to Xtmp = B?

Anyway, changing the work vector size for the QR solve routine to agree with the manual (as you suggested) eliminated the crash I was experiencing. I will take a closer look at it later, since we will also be using the sparse LU solver and eigenvalue functions (which I understand we may need to debug somewhat). I'll keep you posted.

Thanks
Michael

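
The reason the right-hand-side array must cover both cases is that dgels overwrites B with the solution: it holds the M rows of B on input and the N rows of X on output, so its leading dimension must be at least max(1, M, N). A minimal sketch of preparing such a buffer in column-major storage follows, using plain `std::vector` instead of Lapack++ matrix types; `make_dgels_rhs` is a hypothetical helper and not the fix that went into Lapack++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies an m-by-nrhs column-major matrix b into a zero-initialised
// max(1, m, n)-by-nrhs column-major buffer, so that the same storage can
// later hold the n-row solution.
std::vector<double> make_dgels_rhs(const std::vector<double>& b,
                                   std::size_t m, std::size_t n,
                                   std::size_t nrhs)
{
    assert(b.size() == m * nrhs);
    const std::size_t ldb = std::max<std::size_t>({1, m, n});
    std::vector<double> x(ldb * nrhs, 0.0);
    for (std::size_t col = 0; col < nrhs; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            x[col * ldb + row] = b[col * m + row];
        }
    }
    return x;
}
```

For an overdetermined system (m >= n) the buffer has the same shape as B; for m < n it gains n - m zero rows per column, which is exactly what a plain copy of B would lose.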
From: Christian S. <sti...@tu...> - 2006-07-18 20:31:17

Hi Michael,

On Monday, 17 July 2006 19:55, Michael Price wrote:
> Thanks for responding so quickly and thoughtfully. As for negative
> submatrix increments, I'm not using them, but I also may have reported
> the wrong failed assertion; it may have been "assert(cI.end() <
> mat.size(0))" on line 273.

Yes, that might have happened.

> For now I've only been doing overdetermined systems (for example: 50 to
> 200 equations, 3 unknowns). I think the change of linslv.cc:220 makes
> sense to correctly initialize Xtmp for either m >= n or m < n. I wonder
> why that was changed to Xtmp = B?

That change was probably "not-so-well-thought-out". You can see in the CVS logs that it wasn't me. Obviously at the time I should have had a closer look and/or have insisted on a test case. Well, you never stop learning.

> Anyway, changing the work vector size for the QR solve routine to agree
> with the manual (as you suggested) eliminated the crash I was
> experiencing. I will take a closer look at it later, since we will also
> be using the sparse LU solver and eigenvalue functions (which I
> understand we may need to debug somewhat). I'll keep you posted.

Are you familiar with using CVS? If yes, then I'd be happy to add you as a developer to the lapackpp project on sourceforge. That way, you would have the faster CVS access available to developers, and potentially you could contribute your own patches as well. Surely I'll have a close look at any submitted code, but CVS ensures nothing will get lost. If you would like to use developer CVS, you would have to get an account at sourceforge.net and tell me the account name.

Regards,
Christian

From: Christian S. <sti...@tu...> - 2006-07-14 13:40:18

Hi Michael,

I've had a very close look into the QR solving routines. Unfortunately I discovered they were broken all along :-( Thanks for clearly pointing this out with the valgrind and gdb backtraces.

I've fixed this now in CVS. Is it possible for you to get the current source code from CVS, https://sourceforge.net/cvs/?group_id=99696 ? If so, I'd be happy to hear whether this works again now. If it works, then I'll make a lapackpp-2.4.11 release ASAP.

Thank you very much.

Christian

Michael Price wrote:
> fails. The main issue is a strange crash, which I was able to track
> down in valgrind (since the stack in GDB is messed up):

From: Do bi <mrc...@ya...> - 2006-07-27 15:02:08

Please tell me how to write a simple matrix solve programme in C++ using lapack++, what header files are needed and why.

From: Christian S. <sti...@tu...> - 2006-07-28 08:25:11

Dear Donald,

1. If you have a question, feel free to ask it here on the mailing list, but then please WAIT until we're at our computers again and respond. There is no need to write a question four times.

2. All documentation that exists is mentioned on http://lapackpp.sourceforge.net/ , especially the last sentence of the section "Documentation": There is some old, outdated information about the original LAPACK++-1.1 in the LAPACK++ User's Manual and Class Reference Manual, all available from http://www.netlib.org/ or on http://math.nist.gov/lapack++/ . So please read the "LAPACK++ User's Manual" now.

3. As for your MSVC compiler: using your compiler is better explained by the documentation of the compiler. My primary development environment is a different operating system and compiler anyway, so I'm the wrong person to explain anything about the MSVC compiler to you.

Christian

Do bi wrote:
> Please tell me how to write a simple matrix solve programme in C++ using
> lapack++, what header files are needed and why.
