
Install elk with MPI

Forum: Elk Users
Started by Stefano on 2015-08-14; last post 2015-08-16
  • Stefano - 2015-08-14

    Hi,

    I'm trying to install elk with MPI. This is my make.inc file:

    MAKE = make
    F90_OPTS = -O3 -unroll -openmp
    F77_OPTS = -O3 -unroll -openmp
    AR = ar
    LIB_LPK = lapack.a blas.a
    LIB_FFT = fftlib.a
    LIB_SYS = -L/opt/openmpi/lib/ -lmpi
    SRC_MPI = mpi_stub.f90
    F90 = /opt/openmpi/bin/mpif90
    F77 = /opt/openmpi/bin/mpif77

    During compilation the .o files are not created, so I get errors of this kind:

    ar: dgesv.o: No such file or directory
    make[2]: *** [lapack] Error 1
    make[2]: Leaving directory `/home/agr06/elk_versions/elk-3.0.18_EET_parallel/src/LAPACK'
    cp: cannot stat `lapack.a': No such file or directory
    make[1]: *** [lapack] Error 1
    make[1]: Leaving directory `/home/agr06/elk_versions/elk-3.0.18_EET_parallel/src'
    make: *** [all] Error 2

    What is the correct way to install elk with MPI?
    I also tried replacing -openmp with -fopenmp. In that case I get no compilation errors, but when I run elk with

    export OMP_NUM_THREADS=$ppn
    /opt/openmpi/bin/mpirun -pernode -np $nodes ./elk > LOG.OUT

    I get the following error


    Your job has requested a conflicting number of processes for the
    application:

    App: ./elk
    number of procs: 4

    This is more processes than we can launch under the following
    additional directives and conditions:

    number of nodes: 1
    npernode: 1

    Any help would be appreciated.

    Cheers,
    Stefano

     
  • martin_frbg - 2015-08-14

    Your first attempt at compilation had the compiler options set up for the (commercial) Intel compiler (ifort); if it works with "-fopenmp" instead of "-openmp", you are probably using the (free) GNU compiler (gfortran).
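    A quick way to check (assuming the OpenMPI wrappers your paths point to) is to ask the wrapper what it actually invokes:

    /opt/openmpi/bin/mpif90 --showme    # prints the full underlying compile command
    /opt/openmpi/bin/mpif90 --version   # passed through, so prints the backend compiler's version banner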
    What are your "$ppn" and "$nodes" values in the OMP_NUM_THREADS and mpirun calls?
    The earlier elk wiki pages that - I believe - had some tips on mpi/openmp runs appear to have gone missing (or at least I failed to find them), but there are some earlier discussions on this forum, such as http://sourceforge.net/p/elk/discussion/897820/thread/0034c359 and http://sourceforge.net/p/elk/discussion/897820/thread/15bd832f, that may still be useful.

     

    Last edit: martin_frbg 2015-08-14
  • martin_frbg - 2015-08-15

    Sorry, did not notice this at first: in your make.inc, you still have to make the SRC_MPI definition empty. The "stub" is just a placeholder for the real MPI functions when you are not building for MPI.
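    That is, the relevant line in make.inc should be left empty:

    SRC_MPI =

    so that the real MPI routines provided through the mpif90 wrapper are linked instead of the stub.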

     
  • Stefano - 2015-08-15

    Hi,

    Thanks for the reply. Now I'm compiling with this make.inc file:

    MAKE = make
    F90_OPTS = -O3 -unroll -fopenmp
    F77_OPTS = -O3 -unroll -fopenmp
    AR = ar
    LIB_SYS =
    LIB_LPK = lapack.a blas.a
    LIB_FFT = fftlib.a
    SRC_MPI =
    F90 = /opt/openmpi/bin/mpif90
    F77 = /opt/openmpi/bin/mpif90

    The compilation seems fine, no errors. But when I try to run the code with a PBS script that contains the following commands

    ...
    export OMP_NUM_THREADS=16
    /opt/openmpi/bin/mpirun -pernode -np 4 ./elk > LOG.OUT
    ...

    I get the following message


    Your job has requested a conflicting number of processes for the
    application:

    App: ./elk
    number of procs: 4

    This is more processes than we can launch under the following
    additional directives and conditions:

    number of nodes: 1
    npernode: 1

    Please revise the conflict and try again.

    Any idea what the reason might be? How can I solve it?

    Stefano

     
  • martin_frbg - 2015-08-16

    This is a conflict between the "-pernode" option (run only one process per "node") and "-np 4". It seems you are running this on just one computer with a single quad-core CPU, and openmpi's concept of a "node" is more like "desktop box", i.e. everything that has direct access to the same memory.
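    For example, on a single node either of these would be self-consistent (a sketch; adjust the path to your setup):

    /opt/openmpi/bin/mpirun -np 4 ./elk > LOG.OUT      # 4 ranks, no per-node restriction
    /opt/openmpi/bin/mpirun -pernode ./elk > LOG.OUT   # one rank per node, here just one in total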

     
  • Stefano - 2015-08-16

    I am running the code with a PBS script on a cluster with 4 nodes x 16 cores per node.
    If I use hybrid MPI+openmp I get the previous error, but if I run the code with just MPI it seems fine. I use

    /opt/openmpi/bin/mpirun -np 64 ./elk > LOG.OUT

    and Elk tells me that I am running 64 processes

    Using MPI, number of processes : 64

    in the INFO.OUT file.

    But it seems that the run is slower than a run with only openmp parallelization on a different single node with 40 cores.
    Does this depend on the way elk is parallelized (is openmp more efficient than mpi?), or could it depend on the configuration of the cluster I am using? I am running task 300, in which the parallelization is mainly over k-points, I guess.

    Would I get any significant efficiency advantage from using hybrid mpi+openmp rather than just mpi?

     
  • martin_frbg - 2015-08-16

    Okay, your hardware is bigger than what I have currently available :-)
    Still, I guess the error comes not from mpi+openmp as such, but from inadvertent use of the "-pernode" option together with "-np". If you wanted to run 4 mpi jobs per node, I guess you would use "-npernode 4". Or just go with "-pernode" without the "-np 4" if you wanted one job (with 16 OMP threads) per node, unless I misread the manual page for mpirun?
    From my (so far limited) understanding, mpi is inferior to openmp when running on just a single node, but you absolutely need mpi to distribute a calculation across nodes, so the hybrid should provide the best performance.
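    As a rough sketch for your 4 x 16 setup (do check it against your PBS allocation), the hybrid launch would look something like

    export OMP_NUM_THREADS=16
    /opt/openmpi/bin/mpirun -pernode ./elk > LOG.OUT

    i.e. one MPI rank per node, each running 16 OpenMP threads.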

     
  • Stefano - 2015-08-16

    Yes, you are right. I removed the "-pernode" option and now I am running the code with mpi+openmp; it seems faster than the pure mpi run.

    Thanks for the help

     
