OpenMP run (2)

Vlad
2006-03-14
2013-06-05
  • Vlad

    Vlad - 2006-03-14

    Dear all, dear Kay,

    last year the problem of porting the EXCITING code to IBM Power4 was discussed.

    I would like to ask whether anybody has meanwhile succeeded in porting the code for OpenMP runs on IBM machines.

    I'm currently trying to port it to Power5, but seem to face the same problem that Sahu described (see previous forum thread). The suggested remedy, increasing the stack space, does not actually help.

    Some details about my try:
    1) I have compiled the program EXCITING using the following "make.inc":
    -------------
    F90 = xlf90_r -qsuffix=f=f90 -qsmp=omp,noauto -qnosave
    F90_OPTS = -O3 -qarch=pwr5 -qtune=pwr5 -qstrict
    F77 = xlf_r
    F77_OPTS = -O3 -qarch=pwr5 -qtune=pwr5 -qstrict
    LIB_LPK = -L/afs/rzg/@sys/lib -llapack-essl -lesslsmp
    LIB_FFT = fftlib.a -L/afs/rzg/@sys/lib -lfftw3_threads
    --------------
    Also, as an alternative, I have tried to compile the program in 64-bit mode, adding the -q64 flag in make.inc and -X64 to the ar command in the "Makefile" files. This should increase the usable total stack space up to 2 GB.

    Both builds compile without any errors, BUT they run only with ONE thread. If I start e.g. 8 threads on 8 CPUs with the input from the Si example (8 k-points) distributed with the program, I get a "segmentation fault" and the program stops in the first SCF cycle.

    2) I have tried to increase the stack space for the threads up to the possible maximum, either with the XLSMPOPTS variable when running or with the -bmaxstack flag when compiling, but it doesn't help.

    3) The sequential compilation works on Power5 without any error, as also Sahu previously reported.

    I will be grateful for any suggestions...

    Regards,
    Vlad
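
A minimal sketch related to the 64-bit build in point 1, assuming an AIX toolchain: instead of adding -X64 to every ar command, the OBJECT_MODE environment variable can set the default object mode for tools such as ar, nm and ld in the current shell. Whether this is enough for the EXCITING Makefiles has not been checked here, and the -q64 flag in make.inc is left as posted.

```shell
# Assumption: AIX shell session before running make.
# OBJECT_MODE selects the default object mode (32, 64 or 32_64)
# for the AIX binutils; -q64 stays in make.inc as in the post above.
export OBJECT_MODE=64
echo "OBJECT_MODE=$OBJECT_MODE"
```

On other systems the variable is simply ignored, so the snippet only has an effect on AIX.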

     
    • exciting

      exciting - 2006-03-14

      Dear Vlad

      I'm really stumped with this one! I've checked the OpenMP directives very carefully and all seems well. Unfortunately, I don't have access to a Power5 machine, but the code works fine in parallel mode on a 4-processor Itanium.

      Perhaps you could try removing the !$OMP directives in the routine gndstate, first for the secular equation solver and then the charge density calculation, and see what happens.

      Cheers
      Kay.

       
      • Christian Meisenbichler

        I can confirm that there are problems with xlf90_r, but I have no experience at all with OMP.

        But really, I do not see where the thread-safety problem in something like LAPACK should be.

         
      • Christian Meisenbichler

        OMP somehow worked on power5 with this make.inc

        >>>
        F90 = xlf90_r
        F90_OPTS = -q64 -O3 -g -qsmp=omp,noauto -qnosave -qstrict   #-WF,-DMPI,-DMPIRHO #,-DMPIIO

        F77 = xlf_r
        F77_OPTS =-qddim -q64 -O3  -g -qsmp=omp,noauto -qnosave -qstrict

        LIB_SYS =
        LIB_LPK = lapack.a blas.a
        LIB_FFT = fftlib.a
        <<<

         
        • Christian Meisenbichler

          ... #-WF,-DMPI,-DMPIRHO #,-DMPIIO is here of course only a comment

           
    • Vlad

      Vlad - 2006-03-18

      Dear Kay,

      I've removed just the !$OMP directive for the secular equation solver in the routine gndstate, and the recompiled program has worked fine (started with 8 threads), without any errors.

      The wall time for the run is, however, almost the same as for the sequential run.

      Please give further suggestions...
      Thank you for your help!

      Regards,
      Vlad

       
    • exciting

      exciting - 2006-03-19

      Dear Vlad

      Thanks for the info: we now know that you can allocate arrays inside a parallel loop. It's possible, therefore, that your LAPACK library is not thread-safe. Could you try compiling with the native BLAS/LAPACK routines?

      Best wishes
      Kay.

       
    • Vlad

      Vlad - 2006-03-24

      Dear Kay,

      actually, I had already tried to compile with the native BLAS/LAPACK libraries when I first faced the problem. It didn't work.

      Anyway, I have checked it once again, compiling with:
      ----------------------
      F90 = xlf90_r -qsuffix=f=f90 -qsmp=omp,noauto -qnosave
      F90_OPTS = -O3 -qarch=pwr5 -qtune=pwr5 -qstrict 
      F77 = xlf_r
      F77_OPTS = -O3 -qarch=pwr5 -qtune=pwr5 -qstrict 
      LIB_LPK = lapack.a blas.a
      LIB_FFT = fftlib.a
      ----------------------
      and alternatively adding -q64, to build 64-bit code.

      The 64-bit build stops, as usual, in the first SCF cycle with a "segmentation fault" error.
      The 32-bit build crashes at the very beginning of the run, without creating any *.OUT file, and reports "1587-120 SMP runtime library error. Memory allocation failed when creating thread number 4".

      For the 32-bit build I have alternatively tried compiling with -bmaxstack:256000000 and XLSMPOPTS="stack=64000000" to increase the stack space. This doesn't help, however...

      Regards,
      Vlad

       
    • Vlad

      Vlad - 2006-03-24

      P.S. I mean, not compiling, but running with XLSMPOPTS...

      Cheers,
      Vlad
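
For reference, the run-time settings mentioned in the last two posts can be sketched as a small shell wrapper. The thread count (8) and the per-thread stack size (64000000 bytes) are taken from the thread; the executable path ./exciting is hypothetical, and OMP_NUM_THREADS is used here as the standard OpenMP way to pick the thread count, which the posts do not spell out. The script also prints the total stack that all threads would request together. How the XL SMP runtime actually accounts for that memory is not established in this thread, so the number is only a point of comparison with the -bmaxstack:256000000 value above.

```shell
#!/bin/sh
# Values from the posts above; adjust as needed.
NTHREADS=8
STACK_BYTES=64000000

# Standard OpenMP thread count plus the XL SMP per-thread stack option.
export OMP_NUM_THREADS="$NTHREADS"
export XLSMPOPTS="stack=$STACK_BYTES"

# Stack requested by all threads together, in bytes.
TOTAL_STACK=$((NTHREADS * STACK_BYTES))
echo "threads=$OMP_NUM_THREADS XLSMPOPTS=$XLSMPOPTS total_thread_stack=$TOTAL_STACK"

# Hypothetical executable path; the run is skipped if it is absent.
EXE=./exciting
if [ -x "$EXE" ]; then
    "$EXE"
else
    echo "skipping run: $EXE not found"
fi
```

With these values the threads together request 512000000 bytes of stack, twice the -bmaxstack value tried for the 32-bit build.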

       
