Hi Paul,

On Thu, Jul 4, 2013 at 1:14 AM, Paul Khuong <pvk@pvk.ca> wrote:
> Mark Cox wrote:
>> G'day,
>>
>> I am having a hard time diagnosing a DIVIDE-BY-ZERO condition when I
>> invoke the foreign function DGESVD (LAPACK) on OSX with arguments in [1].
>
> FWIW, I'm told DGESDD should be the default choice nowadays.
>
>> What is making the problem difficult to diagnose is:
>> - the problem does not occur on FreeBSD or Linux.
>> - the problem does not occur if I call the same function with arguments
>> [2] on exactly the same input.
>> - the problem does not occur on OSX when using ECL or CCL.
>> - the problem does not occur on OSX when I call the function in a
>> different thread.
>
> The one thing I can think of is the peculiarities of the way FP traps are handled on OS X (last I checked, it looked like there's plenty of blame to go around).  You could try
>
> (sb-int:with-float-traps-masked (:overflow :underflow :inexact :invalid :divide-by-zero) [foreign call here])

Stas Boukarev spoke to me offline and suggested I try that. The problem goes away if I mask :invalid and :divide-by-zero traps.

The error was not signalled on a different thread because the traps are not enabled on the other thread, i.e.
(sb-vm::current-float-trap :invalid :divide-by-zero) returns T on the main thread and NIL on a different thread. I am not sure whether this behaviour is intentional.
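
For anyone wanting to reproduce the comparison, here is a minimal sketch. It assumes an SBCL built with thread support; ENABLED-FLOAT-TRAPS is a hypothetical helper name, and the lists returned will depend on the platform and SBCL version.

```lisp
;; Hypothetical helper: list the FP traps enabled in the calling thread.
(defun enabled-float-traps ()
  (getf (sb-int:get-floating-point-modes) :traps))

;; Compare the current thread with a freshly created one.
(list :current-thread (enabled-float-traps)
      :new-thread (sb-thread:join-thread
                   (sb-thread:make-thread #'enabled-float-traps)))
```

If the two lists differ, a foreign call that raises a masked-elsewhere exception would be expected to signal only in the thread where the trap is enabled.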

As an aside, I checked the output of CCL::GET-FPU-MODE in Clozure, and it reports that the :invalid and :divide-by-zero floating point exceptions are enabled.

Anyway, I have directions in which to move now.

Paul and Stas, thank you for your time.
Mark