#903 Lapack routine zgesdd appears non-thread-safe

closed-out-of-date
None
4
2014-07-09
2013-04-19
No

I am seeing behaviour that looks very much like ZGESDD not being thread safe when linking against LAPACK and BLAS. I call ZGESDD within an OpenMP parallel section, and if I run with multiple threads, the call produces garbage.

I have checked:

  • The problem goes away if the call to ZGESDD is placed in an !$OMP CRITICAL section, to avoid multiple threads calling this routine at the same time.

  • The problem goes away if I link against ACML rather than ATLAS.

  • The problem goes away if I replace the ZGESDD call with ZGESVD (which is supposed to do the same thing, but I understand ZGESVD is generally slower).
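For readers unfamiliar with the first workaround, here is a minimal sketch of serializing a non-reentrant call with an OpenMP critical section, written in C rather than Fortran. Everything in it is an assumption made for illustration: `sumsq_nonreentrant` is a hypothetical stand-in for the suspect routine (it is not ZGESDD and makes no LAPACK call), and `VEC_LEN`, `NJOBS` and `run_jobs_with_critical` are made-up names.

```c
#include <assert.h>

#define VEC_LEN 16   /* vector length per job (illustrative) */
#define NJOBS 1000   /* number of independent jobs (illustrative) */

/* Hypothetical stand-in for a routine that is NOT reentrant: it keeps
   intermediate results in a static buffer shared by every caller. */
static double shared_scratch[VEC_LEN];

static double sumsq_nonreentrant(int n, const double *x)
{
    assert(n <= VEC_LEN);
    for (int i = 0; i < n; i++)
        shared_scratch[i] = x[i] * x[i];
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += shared_scratch[i];
    return s;
}

/* Runs NJOBS jobs in parallel, serializing only the unsafe call.
   Returns the number of jobs whose result differs from a reference
   value computed without the shared buffer. */
int run_jobs_with_critical(void)
{
    int bad = 0;
    #pragma omp parallel for reduction(+:bad)
    for (int j = 0; j < NJOBS; j++) {
        double x[VEC_LEN], got, want = 0.0;
        for (int i = 0; i < VEC_LEN; i++) {
            x[i] = (double)(j + i);
            want += x[i] * x[i];
        }
        /* Only one thread at a time may execute this block. */
        #pragma omp critical (unsafe_call)
        {
            got = sumsq_nonreentrant(VEC_LEN, x);
        }
        if (got != want)
            bad++;
    }
    return bad;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), the loop runs on several threads but the guarded call does not overlap. The cost is that the guarded call no longer runs in parallel, so this is a diagnostic or stopgap rather than a fix.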

The problem was first encountered with the version of ATLAS distributed by Ubuntu, but remains when I recompile ATLAS myself. (With my own build there appear to be cases where I can run with more threads, but I still hit a race condition for enough threads. This may just be an effect of how long the function call takes, which affects whether the race condition is realised.)

This is using ATLAS 3.10.0 as downloaded from SourceForge and the LAPACK 3.4.2 tar file, on a four-core Linux machine with kernel 3.2.0-40-generic #64-Ubuntu SMP.

Many thanks in advance for any help with this.

Discussion

  • R. Clint Whaley

    R. Clint Whaley - 2013-04-19

    I believe you are just linking to the wrong library. ATLAS builds both a parallel and serial library. If you are doing the parallelism yourself, then you should link to the serial interface. If you are writing serial code and want ATLAS to handle the parallelism, then you should link to the parallel interfaces.

    The parallel interfaces use static NCPU-length arrays to keep track of thread info, and that is indeed not thread safe if multiple guys call it. However, you don't ever want to do that, because it would be a performance disaster anyway.

    So, just link to the serial lib if you are already running in parallel!

    Cheers,
    Clint

     
  • R. Clint Whaley

    R. Clint Whaley - 2013-04-19
    • status: open --> closed-works-for-me
     
  • Jonathan Keeling

    Thanks for your help. However, I'm as sure as I can be that I'm linking the serial libraries; I was linking with
    -llapack -lf77blas -lcblas -latlas

    Just to be sure I've recompiled atlas with:

    ../configure -t 0 --with-netlib-lapack-tarfile=....

    which I think should ensure only the serial library is compiled, and the same behaviour persists.

    (If I link against the pt version, it does break immediately as soon as there are multiple threads. The problem I am seeing is not as repeatable: I only see the race condition for large enough numbers of loops, not as soon as parallel threads are invoked.)

     
  • R. Clint Whaley

    R. Clint Whaley - 2013-04-23
    • status: closed-works-for-me --> open
     
  • R. Clint Whaley

    R. Clint Whaley - 2013-04-23

    Yes, that looks like the right link line. Can you verify this is an ATLAS install from sourceforge (not repackaged by others)? If so, also hardwire in the link information (including paths) rather than using -l, to make sure you are getting the libs you think.

    If the serial interface is not thread safe, there is a problem. However, the other thing is to make sure your invocation is thread safe. For instance, ZGESDD takes a workspace, and if you pass in a shared workspace on all threads, then of course it will fail. So, do you have the threads all passing all arguments as private variables with separate storage, particularly of work? The work mess-up might be why other libs work: many modern libs allocate their own space and ignore work, but LAPACK does not.

    Let me know,
    Clint
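The point about private workspace can be illustrated with a minimal sketch in C with OpenMP pragmas. It rests on stated assumptions: `sumsq_with_work` is a hypothetical stand-in for a routine that, like the reference LAPACK routines, writes intermediates into a caller-supplied `work` array. It is not ZGESDD, and `LWORK`, `NTASKS` and `run_tasks_private_work` are illustrative names.

```c
#include <assert.h>

#define LWORK 16     /* workspace and vector length (illustrative) */
#define NTASKS 1000  /* number of independent tasks (illustrative) */

/* Hypothetical stand-in for a LAPACK-style routine: reentrant in itself,
   but it writes intermediates into the caller-supplied work array. */
static double sumsq_with_work(int n, const double *x, double *work, int lwork)
{
    assert(lwork >= n);
    for (int i = 0; i < n; i++)
        work[i] = x[i] * x[i];
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += work[i];
    return s;
}

/* Runs NTASKS tasks in parallel, each with its own input and workspace.
   Returns the number of tasks whose result differs from a reference
   value computed without any workspace. */
int run_tasks_private_work(void)
{
    int bad = 0;
    #pragma omp parallel for reduction(+:bad)
    for (int t = 0; t < NTASKS; t++) {
        /* Declared inside the loop body, so every thread has its own
           copies; no storage here is shared between threads. */
        double x[LWORK], work[LWORK], want = 0.0;
        for (int i = 0; i < LWORK; i++) {
            x[i] = (double)(t + i);
            want += x[i] * x[i];
        }
        if (sumsq_with_work(LWORK, x, work, LWORK) != want)
            bad++;
    }
    return bad;
}
```

If `work` were instead a single array declared outside the parallel loop, all threads would write to it at once and the results could be corrupted. For ZGESDD the same consideration applies to each array the routine writes to (`A`, `S`, `U`, `VT`, `WORK`, `RWORK`, `IWORK`) and to `INFO`. In Fortran these would typically be made private with a `PRIVATE` clause or allocated inside the parallel region.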

     
  • R. Clint Whaley

    R. Clint Whaley - 2014-07-09
    • status: open --> closed-out-of-date
    • assigned_to: R. Clint Whaley
     
  • R. Clint Whaley

    R. Clint Whaley - 2014-07-09

    No response, closing. If there really is a problem, then I think the place to report it is to the lapack folks, since ATLAS does not provide GESDD natively.

     
