
Unable to submit job

2013-10-29
2013-10-30
  • Martin Brodbeck

    Martin Brodbeck - 2013-10-29

    I'm not sure whether this question belongs in this forum or in a Slurm forum. I've installed a new cluster with Rocks 6.1 and the latest slurm roll. The installation went fine, but when I try to submit a job with sbatch I get the following error:

    sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)

    Looking at slurmctld.log, I find this:

    fatal: It appears you don't have any association data from your database. The priority/multifactor plugin requires this information to run correctly. Please check your database connection and try again.

    I've done nothing else with the cluster so far, so I expected it to run out of the box. But maybe I missed something?

     
  • Martin Brodbeck

    Martin Brodbeck - 2013-10-29

    It seems that a "sacctmgr add cluster myname" has solved the problem. But I wonder why that was necessary and if the cluster is now well-configured. :)

     

    Last edit: Martin Brodbeck 2013-10-29
    • Werner Saar

      Werner Saar - 2013-10-29

      On 29.10.2013 16:34, Martin Brodbeck wrote:

      It seems that a "sacctmgr add cluster <name>" has solved the problem. But I wonder why that was necessary and if the cluster is now well-configured. :)


      Unable to submit job


      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/slurm-roll/discussion/general/

      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
      Hi,

      This command is executed when you run "rocks run roll slurm|sh".
      Did you see an error when you ran this command?

      Best regards

      Werner

       
  • Martin Brodbeck

    Martin Brodbeck - 2013-10-30

    I added the slurm roll at the very beginning of the installation process, together with base, kernel, os and so on. I thought the installation steps would then be performed automatically, just as if I had installed a different roll like torque...
    So, no, I didn't see an error message, but maybe I missed it because the installation took place in the background?

    Thanks,
    Martin

     

    Last edit: Martin Brodbeck 2013-10-30
    • Werner Saar

      Werner Saar - 2013-10-30

      Hi,

      Sorry. Please run this command:

      rocks run roll slurm > /tmp/slurm.script

      You will find the following lines:

      service slurmdbd start
      sleep 60
      sacctmgr -i create cluster $CLUSTER
      sleep 20
      service slurm start

      As you can see, I wait 60 seconds after starting slurmdbd, which was
      always enough in my tests. But if this time is too short, the command
      to create the cluster will fail. I think this was the reason for the
      failure.
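
One way to avoid depending on a fixed delay is to retry the cluster registration until it succeeds. The following is only a minimal sketch, not the roll's actual script: the `retry` helper and its attempt and delay values are hypothetical, and it assumes that `sacctmgr` exits with a non-zero status while slurmdbd is not yet reachable, which should be verified on the installed Slurm version.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on the first success,
# 1 if every attempt failed. (Hypothetical helper, not part of the slurm roll.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use on the head node (shown as comments, not executed here):
#   service slurmdbd start
#   retry 30 5 sacctmgr -i create cluster "$CLUSTER" \
#       || echo "cluster registration failed" >&2
#   service slurm start
```

With these example values the registration would be attempted for up to about two and a half minutes, but it continues as soon as slurmdbd answers instead of always waiting the full delay.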

      I will try to find a better solution.

      Thank you for your help.

      Best regards

      Werner

       
  • Martin Brodbeck

    Martin Brodbeck - 2013-10-30

    Thanks, Werner. So, do you think that this "sacctmgr add cluster myname" was all I had to do in order to fix the slurm installation? Everything seems to be working well, though...

     
    • Werner Saar

      Werner Saar - 2013-10-30

      Hi,

      When you install the headnode, you have to give the name of the cluster.
      The command "rocks run roll slurm|sh" writes the name of the cluster to
      the file /etc/slurm/headnode.conf

      If the name of your cluster is myname, then:

      sacctmgr -i add cluster myname

      is enough and all is o.k.
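
To double-check that the registration took effect, the cluster names known to the accounting database can be listed and compared with the expected name. This is a rough sketch under assumptions: `cluster_is_registered` is a hypothetical helper, `myname` is the example name from this thread, and the `sacctmgr` invocation in the comment relies on the `-n` (no header) and `-P` (parsable) options printing one cluster name per line.

```shell
#!/bin/sh
# cluster_is_registered NAME: reads cluster names from stdin, one per line,
# and succeeds only if NAME matches a whole line exactly.
# (Hypothetical helper for illustration.)
cluster_is_registered() {
    grep -Fxq -- "$1"
}

# Possible use on the head node (shown as comments, not executed here):
#   if sacctmgr -n -P show cluster format=cluster | cluster_is_registered myname; then
#       echo "cluster myname is registered"
#   else
#       echo "cluster myname is missing; run: sacctmgr -i add cluster myname" >&2
#   fi
```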

      Best regards

      Werner

       
