errors setting up slurm - missing library

2018-03-14
2018-03-15
  • Christophe Guilbert

    Hi,
    I am trying to get Slurm working on Rocks 7.0.
    The installation went fine.

    Here is the error message from rocks report slurm_hwinfo:
    slurmd: error while loading shared libraries: libltdl.so.7: cannot open shared object file: No such file or directory
    slurmd: error while loading shared libraries: libltdl.so.7: cannot open shared object file: No such file or directory
    slurmd: error while loading shared libraries: libltdl.so.7: cannot open shared object file: No such file or directory

    The same failure shows up when I run rocks sync slurm:
    rocks sync slurm
    compute-0-1: Job for slurmd.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
    pdsh@jcluster: compute-0-1: ssh exited with exit code 1
    compute-0-2: Job for slurmd.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
    pdsh@jcluster: compute-0-2: ssh exited with exit code 1
    compute-0-0: Job for slurmd.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
    pdsh@jcluster: compute-0-0: ssh exited with exit code 1

    If I run "systemctl status slurmd.service" on compute-0-1, I get:
    Starting Slurm node daemon...
    Mar 14 18:14:27 compute-0-1.local slurmd[1774]: /usr/sbin/slurmd: error while loading shared libraries: libltdl.so.7: cannot open shared obje...irectory
    Mar 14 18:14:27 compute-0-1.local systemd[1]: slurmd.service: control process exited, code=exited status=127
    Mar 14 18:14:27 compute-0-1.local systemd[1]: Failed to start Slurm node daemon.
    Mar 14 18:14:27 compute-0-1.local systemd[1]: Unit slurmd.service entered failed state.
    Mar 14 18:14:27 compute-0-1.local systemd[1]: slurmd.service failed.
    Hint: Some lines were ellipsized, use -l to show in full.

    It looks like libltdl.so is missing, and that breaks everything.
    However, find does locate it:
    find / -name "libltdl.so*"
    /usr/lib64/libltdl.so.7.3.0
    /usr/lib64/libltdl.so.7
    /opt/condor/lib/condor/libltdl.so.7

    Update: on the compute nodes (e.g. compute-0-1), libltdl.so is found only in /opt/condor/lib/condor/libltdl.so.7.

    Any ideas?

    Thanks

     

    Last edit: Christophe Guilbert 2018-03-14
    • Werner Saar

      Werner Saar - 2018-03-15

      Hi,

      I don't have condor installed, so on my system there is no such line:

      /opt/condor/lib/condor/libltdl.so.7

      You have to configure the system so that slurmd loads the library from /usr/lib64.

      Best regards
      Werner
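
A minimal sketch of how one might confirm, on a compute node, which libraries slurmd fails to resolve before changing anything. The helper name missing_libs is hypothetical and not part of Slurm or Rocks; the path /usr/sbin/slurmd is taken from the systemd log above.

```shell
#!/bin/sh
# missing_libs (hypothetical helper): read ldd-style output on stdin and
# print the name of every shared library reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run the check only when slurmd exists at the path shown in the log above.
if [ -x /usr/sbin/slurmd ]; then
    ldd /usr/sbin/slurmd | missing_libs
fi
```

If libltdl.so.7 is printed, `rpm -q libtool-ltdl` shows whether the package that normally provides /usr/lib64/libltdl.so.7 on CentOS 7 is installed on that node. That the package is simply absent from the compute nodes is an assumption based on the find output in the first post, not something confirmed in this thread.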

      • Christophe Guilbert

        Thanks for the answer, Werner, but how do I do that? It seems to me that the Slurm roll for Rocks should take care of it. Correct?

         
        • Werner Saar

          Werner Saar - 2018-03-15

          I think the condor roll should not ship a shared library that is already present in the system.

