[Lapackpp-devel] Interesting crash in QR solve

Dear Christian and others,

I am thankful for this library, which saves me from calling LAPACK 
functions directly, but I'm running into memory corruption in two 
situations: printing out matrices, and solving linear systems.  (I 
don't remember seeing the problem when using LU, only QR and SVD.)  
First, sometimes the assertion

    assert(cI.end() >= 0); (mtmpl.h, line 279)

fails.   The main issue is a strange crash, which I was able to track 
down in valgrind (since the stack in GDB is messed up):

==16581== Invalid write of size 4
==16581==    at 0x40635D7: dgels_ (in /usr/local/lib/liblapackpp.so.1.13.0)
==16581==    by 0x4031EA1: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:220)
==16581==    by 0x4032141: LaQRLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:160)
==16581==    by 0x4032BD7: LaLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:56)
==16581==    by 0x805385D: Patch::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x8054388: SurfaceFit::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x804AE39: Segmenter::surfacefit() (segmenter.cpp:935)
==16581==    by 0x804A480: main (main.cpp:133)
==16581==  Address 0x6DC5C10 is 0 bytes after a block of size 16 alloc'd
==16581==    at 0x40057E9: operator new[](unsigned) (vg_replace_malloc.c:195)
==16581==    by 0x4052287: VectorDouble::VectorDouble(unsigned) (vd.h:46)
==16581==    by 0x403701F: LaGenMatDouble::LaGenMatDouble(int, int) (gmd.cc:62)
==16581==    by 0x4042954: LaGenMatDouble::resize(int, int) (mtmpl.h:189)
==16581==    by 0x4042C2D: LaGenMatDouble::resize(LaGenMatDouble const&) (gmtmpl.cc:113)
==16581==    by 0x4044A1B: LaGenMatDouble::copy(LaGenMatDouble const&) (mtmpl.h:210)
==16581==    by 0x4031E0D: LaQRLinearSolveIP(LaGenMatDouble&, LaGenMatDouble&, LaGenMatDouble const&) (gmd.h:665)
==16581==    by 0x4032141: LaQRLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:160)
==16581==    by 0x4032BD7: LaLinearSolve(LaGenMatDouble const&, LaGenMatDouble&, LaGenMatDouble const&) (linslv.cc:56)
==16581==    by 0x805385D: Patch::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x8054388: SurfaceFit::solve() (in /home/mrprice/student_projects/IAV/segmenter/segmenter)
==16581==    by 0x804AE39: Segmenter::surfacefit() (segmenter.cpp:935)
==16581==
==16581== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
starting debugger
==16581== starting debugger with cmd: /usr/bin/gdb -nw /proc/16618/fd/1014 16618
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Attaching to program: /proc/16618/fd/1014, process 16618
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x37fff000
`shared object read from target memory' has disappeared; keeping its symbols.
0x040635d7 in ?? ()
(gdb) bt
#0  0x040635d7 in ?? ()
#1  0xbef53da4 in ?? ()
#2  0xbef53d6c in ?? ()
#3  0x00000001 in ?? ()
#4  0x00929300 in ?? ()
#5  0x00000000 in ?? ()

Here's the code in question; it's just a variable-order polynomial 
least-squares fit function.  I have checked (using valgrind) and found no 
memory leaks or other issues up to this point.

void Patch::solve()
{
    int i = 0;
    int j = 0;
    surfcoord x = 0, y = 0, z = 0;
    int eq_size = PATCH_EQ_LENGTH;
    switch (MAX_ORDER)
    {
    case 1: eq_size = 3;    break;
    case 2: eq_size = 6;    break;
    case 3: eq_size = 10;    break;
    case 4: eq_size = 15;    break;
    }
    //    Every patch makes a 15x15 square in the matrix AtA
    LaGenMatDouble A(num_points, eq_size);
    LaVectorDouble b(num_points);
    LaVectorDouble eq(eq_size);

    for (j = 0; j < num_points; j++)
    {
        x = points[j * 3];
        y = points[j * 3 + 1];
        z = points[j * 3 + 2];

        A(j, 0) = 1;                   
        A(j, 1) = x;           
        A(j, 2) = y;   

        if (MAX_ORDER >= 2)
        {
            A(j, 3) = x * x;
            A(j, 4) = x * y;
            A(j, 5) = y * y;
        }
        if (MAX_ORDER >= 3)
        {
            A(j, 6) = A(j, 3) * x;
            A(j, 7) = A(j, 3) * y;
            A(j, 8) = A(j, 5) * x;
            A(j, 9) = A(j, 5) * y;
        }
        if (MAX_ORDER >= 4)
        {
            A(j, 10) = A(j, 6) * x;
            A(j, 11) = A(j, 6) * y;
            A(j, 12) = A(j, 3) * A(j, 5);
            A(j, 13) = A(j, 9) * x;
            A(j, 14) = A(j, 9) * y;
        }

        b(j) = z;
    }

    LaLinearSolve(A, eq, b);

    for (j = 0; j < eq_size; j++)
    {
        equation[j] = eq(j);
    }

}
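
As a cross-check of the eq_size values in the switch above: a full 
polynomial in x and y of order n has (n + 1)(n + 2) / 2 terms.  A minimal 
sketch follows; the helper name poly_terms is hypothetical and is not part 
of the code above or of Lapack++.

```cpp
#include <cassert>

// Hypothetical helper (not part of Lapack++ or the code above):
// counts the monomials x^p * y^q with p + q <= order.
int poly_terms(int order)
{
    assert(order >= 0);
    return (order + 1) * (order + 2) / 2;
}
```

For orders 1 through 4 this gives 3, 6, 10 and 15, matching the four cases 
in the switch.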

I don't know the inner workings of Lapack++ very well, but it looks like 
the workspace for the Fortran function may not have been allocated 
correctly in LaQRLinearSolveIP().  Have you seen this problem before, or 
do you know how to fix it?
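
For reference, the LAPACK documentation for DGELS states two size 
requirements for an M x N system with NRHS right-hand sides: the array B 
must have leading dimension LDB >= max(1, M, N), because the N-row solution 
is written back into B, and the workspace must satisfy 
LWORK >= max(1, MN + max(MN, NRHS)) with MN = min(M, N).  A minimal sketch 
of those two bounds is below; the helper names are hypothetical, and 
whether LaQRLinearSolveIP() sizes its buffers this way is not confirmed 
here.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helpers; the bounds are the ones given in the DGELS
// documentation, not taken from the Lapack++ sources.

// Minimum leading dimension of B: the solution (N rows) overwrites
// the right-hand side (M rows), so B must hold the larger of the two.
int dgels_min_ldb(int m, int n)
{
    return std::max(1, std::max(m, n));
}

// Minimum workspace length: max(1, MN + max(MN, NRHS)), MN = min(M, N).
int dgels_min_lwork(int m, int n, int nrhs)
{
    const int mn = std::min(m, n);
    return std::max(1, mn + std::max(mn, nrhs));
}
```

For example, with M = 2 points and N = 3 coefficients, B must provide at 
least 3 rows even though the right-hand side has only 2 entries.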

Thanks
Michael