From: Paul L. I. <aea...@gm...> - 2005-08-10 01:16:49
I would like to know if the patch at the bottom of this email solves the geev problem for the other people who have reported it. It solves mine.

I have been having the same issue with geev. Calling it causes cmucl to go into an infinite loop. In slime I can interrupt the program and then call geev again. The second time no infinite loop occurs, but the results are corrupted (i.e., the resulting matrices are populated with quiet NaNs).

I have tracked the source of these problems down to a heisenbug in the subroutines which determine parameters of the machine. Unfortunately, this patch could not possibly affect the real cause of the problem, but if it works for other people besides me, it may help lead to the real problem.

The logical effect of this patch is to cause the LAPACK routines to assume the base of the machine is at least binary.

Index: dlamch.f
===================================================================
RCS file: /cvsroot/matlisp/matlisp/LAPACK/SRC/dlamch.f,v
retrieving revision 1.1
diff -u -w -r1.1 dlamch.f
--- dlamch.f	15 Apr 2000 00:24:53 -0000	1.1
+++ dlamch.f	9 Aug 2005 02:47:52 -0000
@@ -231,7 +231,7 @@
 *
 *           fl( a + b ) .gt. a.
 *
-         B = 1
+         B = 2
          C = DLAMC3( A, B )
 *
 *+       WHILE( C.EQ.A )LOOP

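For readers who have not looked inside dlamch: the loop the patch touches belongs to a radix test (often attributed to Malcolm) that first finds a power of two A large enough that adding 1 is lost to rounding, then doubles B until fl(A + B) > A, and reads the radix off fl(C - A). The Python sketch below is reconstructed from the diff context and from the general shape of that test, not copied from dlamch.f; the function name dlamc1_radix and its b_start parameter are hypothetical, and Python floats (IEEE doubles) stand in for DOUBLE PRECISION. Under ordinary double arithmetic it shows that starting B at 2 gives the same radix and only skips one doubling.

```python
from typing import Tuple


def dlamc1_radix(b_start: float = 1.0) -> Tuple[int, float, float, int]:
    """Sketch of the radix test in dlamch's helper routines (hypothetical name).

    b_start=1.0 mirrors the original code (B = 1); b_start=2.0 mirrors the
    patch (B = 2). Python floats are IEEE doubles, standing in for DOUBLE PRECISION.
    """
    one = 1.0
    # Step 1: smallest A = 2**m with fl(fl(A + 1) - A) != 1.
    a = 1.0
    c = 1.0
    while c == one:
        a *= 2.0
        c = (a + one) - a
    # Step 2: double B until fl(A + B) > A; this is the
    # "WHILE( C.EQ.A )LOOP" that the patch touches.
    b = b_start
    c = a + b
    doublings = 0
    while c == a:
        b *= 2.0
        c = a + b
        doublings += 1
    # Step 3: the radix is fl(C - A); adding 1/4 before truncating to an
    # integer guards against C - A landing just below a whole number.
    radix = int((c - a) + 0.25)
    return radix, a, b, doublings


print(dlamc1_radix(1.0))  # (2, 9007199254740992.0, 2.0, 1)
print(dlamc1_radix(2.0))  # (2, 9007199254740992.0, 2.0, 0)
```

With B = 1 the first attempt fails because 2**53 + 1 rounds back to 2**53 (ties to even), so one doubling is needed; with B = 2 the loop body never runs. That matches the "unnecessary loop iteration" remark later in the thread, and it also shows why the patch cannot explain a hang on hardware that really does IEEE double arithmetic.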
From: Paul L. I. <aea...@gm...> - 2005-08-11 01:09:10
>
> Ah, thanks for digging into this. A little googling finds the LAPACK
> FAQ (http://www.netlib.org/lapack/faq.html) which indicates that dlamch
> (and its dependent routines dlamc[1-5]) should be compiled without
> optimization? Can you compile dlamch.f by hand without optimization and
> see if that makes a difference for you?
>
> I'd really appreciate if you could, since I don't seem to have this
> problem on my linux boxes.
>

I recompiled with an unoptimized dlamch and this solved the problem as well. So now we have found multiple ways to solve the problem, and this should do well for anyone wanting to use geev. But in the following I argue that we still haven't found the real issue.

I have to admit I'm afraid that this still treats the symptom, just as the other modifications do. I have linked the optimized/faulty dlamch.o to a Fortran program which calls dgeev with non-trivial arguments. There was no problem with dlamch then (defeating my hope of isolating the bug from the Lisp/Fortran interaction). Now, since the same object file has an error depending on who calls its functions, I have to think the problem is in the caller, not the object file.

Even if the error were in the object file, I can't imagine optimization could cause the error we're seeing. I could understand optimization causing a miscalculation of some machine parameters. But we are seeing the logical (C .EQ. A) evaluating to false (or being skipped completely) when it is in fact true. If you tell me some optimization would do this, I could believe you, but I don't think I could ever trust optimization again.

Paul

From: Raymond T. <rt...@ea...> - 2005-08-17 01:28:13
Paul Ledbetter III wrote:
>>Ah, thanks for digging into this. A little googling finds the LAPACK
>>FAQ (http://www.netlib.org/lapack/faq.html) which indicates that dlamch
>>(and its dependent routines dlamc[1-5]) should be compiled without
>>optimization? Can you compile dlamch.f by hand without optimization and
>>see if that makes a difference for you?
>>
>>I'd really appreciate if you could, since I don't seem to have this
>>problem on my linux boxes.
>>
>
>
> I recompiled with an unoptimized dlamch and this solved the problem as
> well. So now we have found multiple ways to solve the problem, and
> this should do well for anyone wanting to use geev. But in the
> following I argue that we still haven't found the real issue.
>
[snip nice discussion]

I agree. We haven't found the real issue. However, since the LAPACK guys didn't change dlamch, I am reluctant to modify it. Therefore, I think the right course is to do what the FAQ suggests and compile dlamch without optimization.

Ray

From: Thibault L. <tl...@di...> - 2005-08-10 11:52:43
On Mon, 2005-08-08 at 22:15 -0500, Paul Ledbetter III wrote:
> I would like to know if the patch at the bottom of this email solves
> the geev problem for the other people who have reported it. It solves
> mine.
>
> I have been having the same issue with geev. Calling it causes cmucl
> to go into an infinite loop. In slime I can interrupt the program,
> and then call geev again. The second time no infinite loop occurs,
> but the results are corrupted (i.e. The resulting matrices are
> populated with silent-NaNs).
>
> I have tracked the source of these problems down to a heisenbug in the
> subroutines which determine parameters of the machine. Unfortunately,
> this patch could not possibly affect the real cause of the problem,
> but if it works for other people besides me, it may help lead to the
> real problem.
>
> The logical effect of this patch is to cause the LAPACK routines to
> assume the base of the machine is at least binary.
>
> Index: dlamch.f
> ===================================================================
> RCS file: /cvsroot/matlisp/matlisp/LAPACK/SRC/dlamch.f,v
> retrieving revision 1.1
> diff -u -w -r1.1 dlamch.f
> --- dlamch.f	15 Apr 2000 00:24:53 -0000	1.1
> +++ dlamch.f	9 Aug 2005 02:47:52 -0000
> @@ -231,7 +231,7 @@
>  *
>  *           fl( a + b ) .gt. a.
>  *
> -         B = 1
> +         B = 2
>           C = DLAMC3( A, B )
>  *
>  *+       WHILE( C.EQ.A )LOOP
>

It works for me, but this one works too :-/

--- ../matlisp-2_0beta-2003-10-14/LAPACK/SRC/dlamch.f	2000-04-15 01:24:53.000000000 +0100
+++ LAPACK/SRC/dlamch.f	2005-08-10 12:13:27.167199885 +0100
@@ -234,6 +234,7 @@
          B = 1
          C = DLAMC3( A, B )
 *
+         PRINT *, ''
 *+       WHILE( C.EQ.A )LOOP
    20    CONTINUE
          IF( C.EQ.A ) THEN

The bug is more likely to be in gcc/g77/glibc or in the way the libraries are called from Lisp. Here are the versions I use:

wok$ gcc --version
gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.fc3)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

wok$ g77 --version
GNU Fortran (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.fc3)
Copyright (C) 2004 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
or type the command `info -f g77 Copying'.

wok$ rpm -q glibc
glibc-2.3.3-74

Thibault

From: Paul L. I. <aea...@gm...> - 2005-08-10 12:35:35
My email probably seemed somewhat strange coming after yours, since we came to similar conclusions. My first email was delayed by more than a day (for moderation purposes?), and you verified what I was asking for before it could arrive :)

As I think we have both found, just about any change in the code before that loop causes dlamch to behave correctly. I chose to set B=2 because it's a trivial improvement to the LAPACK code (it eliminates an unnecessary loop iteration).

I have witnessed the error on two computers, one with a Celeron CPU and the other with a Pentium III. In both cases I am using cmucl-19b on FreeBSD 5.4. My f77 on both is 3.4.2.

This bug is annoying. I hope this helps.

Paul

From: Raymond T. <rt...@ea...> - 2005-08-10 13:36:24
Paul Ledbetter III wrote:
> I would like to know if the patch at the bottom of this email solves
> the geev problem for the other people who have reported it. It solves
> mine.
[snip]
> The logical effect of this patch is to cause the LAPACK routines to
> assume the base of the machine is at least binary.
>

Ah, thanks for digging into this. A little googling finds the LAPACK FAQ (http://www.netlib.org/lapack/faq.html) which indicates that dlamch (and its dependent routines dlamc[1-5]) should be compiled without optimization? Can you compile dlamch.f by hand without optimization and see if that makes a difference for you?

I'd really appreciate if you could, since I don't seem to have this problem on my linux boxes.

Thanks!

Ray

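The usual explanation for the FAQ's advice is that an optimizer may keep intermediates in registers wider than DOUBLE PRECISION (on x87 hardware such as the Celeron and Pentium III mentioned in this thread, a 64-bit significand instead of 53), which is exactly what DLAMC3 tries to prevent by forcing its arguments through memory. That explanation is an assumption here, not something this thread established, and Paul's test of the same object file from a plain Fortran caller argues the real fault may lie elsewhere. The sketch below simulates step 1 of the radix test with every operation rounded to p significant bits, using exact fractions; round_to_bits and find_a are hypothetical helpers written for illustration.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x >= 0 to p significant bits, ties to even (hypothetical helper)."""
    if x == 0:
        return x
    # Write x = m * 2**e with 2**(p-1) <= m < 2**p.
    m, e = x, 0
    while m >= 2**p:
        m /= 2
        e += 1
    while m < 2**(p - 1):
        m *= 2
        e -= 1
    # Round the scaled significand m to an integer, breaking ties toward even.
    q = m.numerator // m.denominator
    rem = m - q
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
        q += 1
    return Fraction(q) * Fraction(2) ** e


def find_a(p: int) -> Fraction:
    """Step 1 of the radix test with every operation rounded to p bits."""
    a = Fraction(1)
    c = Fraction(1)
    while c == 1:
        a *= 2
        c = round_to_bits(round_to_bits(a + 1, p) - a, p)
    return a


for p in (53, 64):  # 53: IEEE double significand; 64: x87 extended significand
    print(p, find_a(p) == 2**p)
```

Under this model the test settles on A = 2**53 at double precision but A = 2**64 when intermediates carry 64 bits, so parameters derived from it could describe a different arithmetic than the one stored DOUBLE PRECISION values actually use. It illustrates why the loop is sensitive to how intermediates are held; it does not reproduce the infinite loop reported above.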
From: Thibault L. <tl...@di...> - 2005-08-10 13:59:59
On Wed, 2005-08-10 at 09:32 -0400, Raymond Toy wrote:
> Paul Ledbetter III wrote:
> > I would like to know if the patch at the bottom of this email solves
> > the geev problem for the other people who have reported it. It solves
> > mine.
> [snip]
> > The logical effect of this patch is to cause the LAPACK routines to
> > assume the base of the machine is at least binary.
> >
>
> Ah, thanks for digging into this. A little googling finds the LAPACK
> FAQ (http://www.netlib.org/lapack/faq.html) which indicates that dlamch
> (and its dependent routines dlamc[1-5]) should be compiled without
> optimization? Can you compile dlamch.f by hand without optimization and
> see if that makes a difference for you?
>
> I'd really appreciate if you could, since I don't seem to have this
> problem on my linux boxes.
>
> Thanks!
>
> Ray

Bingo! It works. After "make" I did

g77 -g -o LAPACK/SRC/dlamch.o -c LAPACK/SRC/dlamch.f

then ran "make" again, and it worked.

Thibault