
How to run elk (mpi) on slurm cluster?

Elk Users
2020-05-11
2020-05-12
  • guy19@mails.tsinghua.edu.cn

    Dear All,

    I want to run Elk on a Slurm cluster, and I have compiled the Elk code with MPI. When I submit the job with "srun -N 1 -n 1 -c 24 ~/elk-6.3.2/src/elk elk.in", it runs successfully. However, when I use "srun -N 2 -n 2 -c 24" to run on two nodes, the job fails with the following error (here yhrun is equivalent to srun):

    yhrun: error: slurm_receive_msg: Socket timed out on send/recv operation
    yhrun: Job step creation temporarily disabled, retrying
    yhrun: Job step created
    forrtl: No such file or directory
    forrtl: severe (28): CLOSE error, unit 95, file "Unknown"
    Image         PC                Routine            Line     Source
    elk           0000000002034CD8  for__io_return     Unknown  Unknown
    elk           0000000002032BB6  for_close          Unknown  Unknown
    elk           000000000043A42B  Unknown            Unknown  Unknown
    elk           000000000043B936  Unknown            Unknown  Unknown
    elk           000000000042971E  Unknown            Unknown  Unknown
    libc-2.12.so  0000003921A1ED1D  __libc_start_main  Unknown  Unknown
    elk           0000000000429629  Unknown            Unknown  Unknown
    yhrun: error: cn10342: task 1: Exited with exit code 28
    yhrun: First task exited 60s ago
    yhrun: task 0: running
    yhrun: task 1: exited abnormally
    yhrun: Terminating job step 14634421.0
    slurmd[cn10341]: STEP 14634421.0 KILLED AT 2020-05-11T21:50:57 WITH SIGNAL 9
    yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.
    yhrun: error: cn10341: task 0: Killed

    Can someone point out what mistake I am making? Thank you very much!

    Regards,
    Y. Gu

     
  • Andrew Shyichuk - 2020-05-12

    Dear Y.Gu,

    As far as I know, the elk executable does not launch MPI processes by itself.
    That is, it should be started through an MPI launcher, e.g. "mpirun -np 2 elk elk.in", rather than just "./elk elk.in".
    I recommend checking this post for more.

    Thus, you should probably do something like:
    srun -N 2 -n 2 -c 24 mpirun -np 2 ~/elk-6.3.2/src/elk elk.in

    That, however, depends on how multinode / MPI jobs are properly submitted on that particular server; mpirun is just one possible option.
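As one possible way to set up such a submission, here is a minimal batch-script sketch for two nodes with one MPI rank per node and 24 OpenMP threads per rank. It is not a tested recipe for this cluster: the use of mpirun inside an sbatch script, the ELK variable, the launch_cmd helper, and the DRY_RUN switch are all assumptions made for illustration, and a site that uses a wrapper such as yhrun may expect a different launcher.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=24

# Assumed executable path, taken from the thread; adjust for your install.
ELK="${ELK:-$HOME/elk-6.3.2/src/elk}"

# One OpenMP thread per CPU that Slurm assigned to each task (falls back to 24).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-24}"

# Hypothetical helper: build the launch line with one MPI rank per Slurm task
# (falls back to 2 ranks when SLURM_NTASKS is not set).
launch_cmd() {
  echo "mpirun -np ${SLURM_NTASKS:-2} $ELK elk.in"
}

# Hypothetical safety switch: print the command by default,
# and only launch when DRY_RUN=0 is set explicitly.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "would run: $(launch_cmd) (OMP_NUM_THREADS=$OMP_NUM_THREADS)"
else
  # Relies on word splitting, so the path must not contain spaces.
  $(launch_cmd)
fi
```

Running the script as is only prints the command it would launch, which makes it easy to check the rank and thread counts before submitting it with DRY_RUN=0.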

    Also, just to be sure, check whether your 1-node task does indeed use all 24 threads. If it does, you've got the OpenMP part right.
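One way to make this check on a Linux compute node is to read the thread count the kernel reports for the running process. The sketch below assumes a Linux /proc filesystem and that the process is named elk; thread_count is a hypothetical helper, not part of Elk or Slurm. A count of at least 24 only shows that the threads exist, so it is still worth confirming in a tool such as top that they are actually busy.

```shell
#!/bin/bash

# Hypothetical helper: print the number of threads of the process with the
# given PID, as reported in the "Threads:" field of /proc/<pid>/status.
thread_count() {
  awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Example on the current shell itself (prints this shell's own thread count).
thread_count "$$"

# On the compute node, while the job is running, the same helper can be
# pointed at the newest process named exactly "elk":
#   thread_count "$(pgrep -n -x elk)"
```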

    Good luck!
    Andrew.

     
