I tried a vanilla ROCKS 6.2 install with the latest SLURM roll. I get the following in slurmctld.log, and nothing is working. I'm going to take a wild stab and guess that the same changes to the schema that broke the torque roll also broke slurm. The SQL table you are expecting is simply no longer there. Most of the *_attributes tables were removed last April.
fatal: It appears you don't have any association data from your database. The priority/multifactor plugin requires this information to run correctly. Please check your database connection and try again.

https://github.com/rocksclusters/base/commit/ed19a154c09c8ffb481faafded204cb8cefd538b

Hmm, so much for that theory. Apparently you use /usr/bin/mysql, not the rocks mysql. I can get into the database with the info in /etc/slurm/slurmdbd.conf, but I see nothing at all in the log file when the slurmdbd service starts.
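For reference, if slurmdbd writes nothing to its log, two standard slurm commands (not roll-specific; the config path is the one mentioned above) will show where it is configured to log and what it is actually doing:

grep -i LogFile /etc/slurm/slurmdbd.conf
# run slurmdbd in the foreground with verbose logging to stdout
slurmdbd -D -vvv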
Try running the following to set up the DB:
export CLUSTER=$(/opt/rocks/bin/rocks list attr | awk '/Info_ClusterName:/ { print $2 }')
sacctmgr -i create cluster $CLUSTER
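Afterwards, a quick sanity check that the cluster record now exists (standard sacctmgr usage; -n just suppresses the header line):

sacctmgr -n list cluster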
@mark, is the problem solved now, or do you need help?

On 05/19/2015 07:30 PM, Mark wrote:

That seems to work now. Any idea why the cluster name was not set at install time?

When I "qsub -I" in slurm I remain on the head node. When we ran torque you would find yourself on the "MOM" node. Is this normal behavior, or is it configurable?

Hi,

at the end of the script there is a loop that tries to set the cluster name. But if the database is still busy or locked, setting the cluster name fails, and the script writes a warning that you should set the cluster name yourself.

Best Regards
Werner
So unlike torque, perhaps you should install the slurm roll after ROCKS has been installed. With SGE or torque you really can't add them after the fact; in my experience, both must be installed while ROCKS itself is installed.
With slurm, you always remain on the node where you started the batch job.
There is no MOM node.
You can install one or more login nodes to start batch jobs from.
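The slurm-native equivalent of torque's interactive "qsub -I" is an interactive srun, which does land your shell on an allocated compute node (the "qsub" available under slurm is presumably a compatibility wrapper, which behaves differently):

srun -N 1 --pty bash -i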
On 05/20/2015 07:35 PM, Mark wrote:

Am I remembering correctly that once a node is allocated to a job, none of the other users can ssh to that node? I remember this from when I looked at slurm over a year ago, and I was surprised that slurm affects access control once a job has started. The question is what happens when two different people each have some of the CPU cores running jobs, since our new nodes are 64 cores each. Or am I remembering this wrong?

We plan on having 2 or more "login" nodes and limiting access to the head node.
Hi,

a user can only ssh to a node where he has resources allocated (batch job or salloc).

If 2 people have resources allocated on one node, both can ssh to this node.

Werner
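For what it's worth, that restriction is normally enforced on the compute nodes through slurm's PAM module; whether the roll sets it up exactly this way is an assumption, but the usual entry in /etc/pam.d/sshd is:

# allow ssh only to users with an active allocation on this node (assumed setup)
account    required     pam_slurm.so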
Using the latest slurm roll on a ROCKS 6.2 install, you still appear to need "sacctmgr -i create cluster $CLUSTER" to get things working.
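If you need to script that workaround, here is a sketch that is safe to run repeatedly (it assumes the Info_ClusterName attribute is set, as above):

CLUSTER=$(/opt/rocks/bin/rocks list attr | awk '/Info_ClusterName:/ { print $2 }')
# only create the cluster record if it is not already there
sacctmgr -n list cluster | grep -qw "$CLUSTER" || sacctmgr -i create cluster "$CLUSTER"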