sacctmgr error.

2020-03-28
  • Mahmoud Abdelwahab

    Hello all,

    After a fresh Rocks 7.0 installation, I installed a compute node.
    Installed rolls:

    NAME VERSION ARCH ENABLED
    base: 7.0 x86_64 yes
    CentOS: 7.4.1708 x86_64 yes
    core: 7.0 x86_64 yes
    ganglia: 7.0 x86_64 yes
    hpc: 7.0 x86_64 yes
    kernel: 7.0 x86_64 yes
    Updates-CentOS-7.4.1708: 2017-12-01 x86_64 yes

    After trying to install the latest Slurm version, I get this error:

    Created symlink from /etc/systemd/system/multi-user.target.wants/slurmdbd.service to /usr/lib/systemd/system/slurmdbd.service.
    Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to 127:6819: Invalid argument
    sacctmgr: error: slurmdbd: Sending PersistInit msg: Invalid argument
    sacctmgr: error: Problem talking to the database: Invalid argument
    Created symlink from /etc/systemd/system/multi-user.target.wants/slurmctld.service to /usr/lib/systemd/system/slurmctld.service.
    ########################################################################
    # WARNING: The command:                                                #
    #                                                                      #
    # sacctmgr -i create cluster rocks.vm                                  #
    #                                                                      #
    # failed. Please run this command again                                #
    ########################################################################
    

    The full output is available here: https://pastebin.com/2s1uSJeJ
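
One detail in the log above stands out: sacctmgr tries to reach `127:6819`, and `127` is not a complete address. A minimal Python sketch for pulling the target out of such an error line and flagging a suspicious host is shown below. The helper names `parse_target` and `looks_truncated` are hypothetical, and the check is only a heuristic; it does not prove why the connection failed.

```python
import ipaddress
import re

# Matches the target in lines such as
# "... failed to open persistent connection to 127:6819: Invalid argument"
TARGET_RE = re.compile(r"persistent connection to (?P<host>[^:\s]+):(?P<port>\d+)")


def parse_target(line):
    """Return (host, port) from a sacctmgr connection error line, or None."""
    m = TARGET_RE.search(line)
    if m is None:
        return None
    return m.group("host"), int(m.group("port"))


def looks_truncated(host):
    """Heuristic: an all-digit host that is not a valid IP address is suspicious."""
    if not host.isdigit():
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True
    return False


line = ("sacctmgr: error: slurm_persist_conn_open_without_init: failed to open "
        "persistent connection to 127:6819: Invalid argument")
host, port = parse_target(line)
print(host, port, looks_truncated(host))  # prints: 127 6819 True
```

If the host is flagged, it may be worth checking what the machine reports as its hostname and what the Slurm configuration uses as the accounting storage host, since a hostname of `127.0.0.1` shortened at the first dot would give `127`. That reading is an assumption, not something the log confirms.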

     
  • Mahmoud Abdelwahab

    Looking at the slurmctld status, I get this error:

    systemctl status slurmctld.service
     slurmctld.service - Slurm controller daemon
       Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
       Active: failed (Result: resources) since Sat 2020-03-28 14:20:33 SAST; 3min 35s ago
      Process: 3370 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
     Main PID: 1395 (code=exited, status=1/FAILURE)
    
    Mar 28 14:20:33 127.0.0.1 systemd[1]: Starting Slurm controller daemon...
    Mar 28 14:20:33 127.0.0.1 systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
    Mar 28 14:20:33 127.0.0.1 systemd[1]: slurmctld.service never wrote its PID file. Failing.
    Mar 28 14:20:33 127.0.0.1 systemd[1]: Failed to start Slurm controller daemon.
    Mar 28 14:20:33 127.0.0.1 systemd[1]: Unit slurmctld.service entered failed state.
    Mar 28 14:20:33 127.0.0.1 systemd[1]: slurmctld.service failed.
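
The status output shows systemd waiting for `/var/run/slurmctld.pid`. One thing that can be ruled out in a case like this is a mismatch between the `PIDFile=` path in the unit file and the `SlurmctldPidFile` setting in `slurm.conf`. The sketch below compares the two from their text. The sample file contents are hypothetical and not taken from this cluster, and `read_key` and `pid_paths_match` are illustrative names. Since the log also shows the main process exiting with status 1, a mismatch is only one possible explanation.

```python
import re


def read_key(text, key):
    """Return the value of the first 'key=value' line, ignoring comments and case."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        m = re.match(rf"{re.escape(key)}\s*=\s*(\S+)", line, re.IGNORECASE)
        if m:
            return m.group(1)
    return None


def pid_paths_match(slurm_conf_text, unit_text):
    """True only if both files name a PID file and the paths are identical."""
    conf_pid = read_key(slurm_conf_text, "SlurmctldPidFile")
    unit_pid = read_key(unit_text, "PIDFile")
    return conf_pid is not None and conf_pid == unit_pid


# Hypothetical file contents, for illustration only.
sample_conf = """
# slurm.conf excerpt (assumed)
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
"""
sample_unit = """
[Service]
Type=forking
PIDFile=/var/run/slurmctld.pid
"""

print(pid_paths_match(sample_conf, sample_unit))
```

On a real system the two texts would come from the installed `slurm.conf` and from `/usr/lib/systemd/system/slurmctld.service`, the unit path shown in the status output.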
    
     
  • Mahmoud Abdelwahab

    Found a fix: install slurm-7.0.0.193-18.08.08.00.00.x86_64 first, then update to the latest version.

     
