I'm not sure whether this question belongs in this forum or in a slurm forum. I've installed a new cluster with Rocks 6.1 and the latest slurm roll. The installation itself went fine, but when I try to submit a job with sbatch I get the following error:
Looking at slurmctld.log, I find this:
fatal: It appears you don't have any association data from your database. The priority/multifactor plugin requires this information to run correctly. Please check your database connection and try again.
I've done nothing else with the cluster so far, so I expected it to just work out of the box. But maybe I missed something?
It seems that running "sacctmgr add cluster myname" has solved the problem. But I wonder why that was necessary and whether the cluster is now configured correctly. :)
Last edit: Martin Brodbeck 2013-10-29
In reply to Martin Brodbeck's message of 29.10.2013 16:34:
This command is executed when you run "rocks run roll slurm|sh".
Did you see an error when you ran this command?
Best regards
Werner
I added the slurm roll at the very beginning of the installation process. That is, I added the slurm roll together with base, kernel, os and so on. I thought the installation steps would then be performed automatically, just as if I had installed a different roll such as torque...
So, no, there wasn't an error message, but maybe I missed it because the installation ran in the background?
Thanks,
Martin
Last edit: Martin Brodbeck 2013-10-30
In reply to Martin Brodbeck's message of 30.10.2013 14:33:
Sorry. Please run this command:
rocks run roll slurm > /tmp/slurm.script
You will find the following lines in it:
service slurmdbd start
sleep 60
sacctmgr -i create cluster $CLUSTER
sleep 20
service slurm start
As you can see, I wait 60 seconds after starting slurmdbd. This was always
enough in my tests, but if the wait is too short,
the command that creates the cluster fails. I think this was the
reason for the failure.
I will try to find a better solution.
Thank you for your help.
Best regards
Werner
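One possible alternative to the fixed "sleep 60", sketched here under stated assumptions and not taken from the roll itself: retry a cheap sacctmgr query until slurmdbd answers, then create the cluster. The names retry and start_slurm_services are hypothetical, and the sketch assumes that "sacctmgr -n list cluster" exits with a non-zero status while slurmdbd is not yet reachable.

```shell
#!/bin/sh
# Sketch only: replace the fixed "sleep 60" with a bounded retry loop.
# retry and start_slurm_services are hypothetical names, not part of the roll.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds. Tries at least once and gives up
# after MAX_ATTEMPTS failed tries, sleeping DELAY_SECONDS between tries.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

start_slurm_services() {
    service slurmdbd start
    # Assumption: this query fails while slurmdbd is not yet reachable.
    retry 30 5 sacctmgr -n list cluster >/dev/null 2>&1 || return 1
    sacctmgr -i create cluster "$CLUSTER" || return 1
    service slurm start
}
```

With CLUSTER set to the cluster name, calling start_slurm_services would wait up to roughly 30 x 5 seconds for slurmdbd instead of a fixed minute, and would stop with an error instead of silently skipping the cluster creation.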
Thanks, Werner. So do you think that this "sacctmgr add cluster myname" was all I had to do in order to fix the slurm installation? Everything seems to be working well, though...
In reply to Martin Brodbeck's message of 30.10.2013 16:40:
When you install the head node, you have to give the name of the cluster.
The command "rocks run roll slurm|sh" writes the name of the cluster to
the file /etc/slurm/headnode.conf.
If the name of your cluster is myname, then
sacctmgr -i add cluster myname
is enough and everything is OK.
Best regards
Werner
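To check afterwards that the cluster really is registered in the accounting database, something like the following could be used. This is a minimal sketch: cluster_registered and check_cluster are hypothetical helper names, and it assumes that "sacctmgr -n -P list cluster format=Cluster" prints one cluster name per line.

```shell
#!/bin/sh
# cluster_registered NAME
# Reads cluster names from stdin, one per line, and succeeds only if
# NAME matches one of the lines exactly. Hypothetical helper.
cluster_registered() {
    grep -qxF -- "$1"
}

# check_cluster NAME
# Hypothetical wrapper: asks sacctmgr for the known clusters and
# looks for NAME among them.
check_cluster() {
    sacctmgr -n -P list cluster format=Cluster | cluster_registered "$1"
}
```

For example, "check_cluster myname && echo registered" would print "registered" only if myname shows up in the list.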