
Problem slurm-roll with configuration of gpu

Fany
2015-10-23
2016-02-03
  • Fany

    Fany - 2015-10-23

I have a test cluster with Rocks Cluster 6.1.1 and slurm-roll 6.1.1-14.03.6. Each node has two GPUs (GTX 260). I need to send jobs to several nodes, but it does not work correctly.
I submit with:
srun -n 1 -N 1 --gres=gpu:2 mpirun application -- this works, but only on node compute-0-0; with compute-0-1 and compute-0-2 I get the error:

    srun: error: Unable to allocate resources: Requested node configuration is not available

    I don’t know what I missed because I have the same configuration in all nodes.

    Any idea? Thanks

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi,

    I need the output of the commands:

    scontrol show node
    scontrol show partitions

and the file /etc/slurm/gres.conf.
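A quick way to scan that scontrol output for mismatched nodes is to condense it to one line per node (a sketch, not part of the roll; the `gres_summary` helper name is my own):

```shell
# Sketch: condense `scontrol show node` output to "nodename gres" pairs,
# so a node advertising Gres=(null) stands out immediately.
gres_summary() {
    awk '$1 ~ /^NodeName=/ { split($1, a, "="); node = a[2] }
         /Gres=/           { for (i = 1; i <= NF; i++)
                                 if ($i ~ /^Gres=/) print node, substr($i, 6) }' "$@"
}

# Usage on a live cluster: scontrol show node | gres_summary
```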

    Best Regards
    Werner

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi,

    The file /etc/slurm/gres.conf should have only 2 lines like this:

    Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
    Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1

The node configuration in /etc/slurm/nodenames.conf should look like this:

    NodeName=compute-0-0 NodeAddr=10.1.255.254 CPUs=2 Weight=20481900 Feature=rack-0,2CPUs Gres=gpu
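If the /dev/nvidia* device files already exist on a node, those gres.conf lines can be generated rather than typed by hand (a sketch; `gen_gres_conf` is my own name, and taking CPUs=N from the device number is a simplification that matches the simple two-GPU nodes discussed here, not a general topology rule):

```shell
# Sketch: emit one gres.conf line per NVIDIA device node passed in.
# The CPU index is derived from the device number (assumption: one GPU
# per CPU, as in the two-GPU example lines above).
gen_gres_conf() {
    for dev in "$@"; do
        idx=${dev##*nvidia}   # /dev/nvidia0 -> 0
        printf 'Name=gpu Type=nvidia File=%s CPUs=%s\n' "$dev" "$idx"
    done
}

# Usage on a node: gen_gres_conf /dev/nvidia[0-9]* > /etc/slurm/gres.conf
```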

    Best regards
    Werner

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi Fany,

Today I published slurm-roll 15.08.2.
I have tested that this version works on rocks-6.1.1 and verified that an update
from version 14.03.6 succeeds.

I recommend updating to this version, because GPU computing is integrated and tested.
I can give you a list of instructions on how to update.

But you will have to reinstall the compute nodes.

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-26

Dear Werner,
Where can I find slurm-roll version 15.08.2?

This is my output:

    [root@cluster bin]# scontrol show node
    NodeName=cluster CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=N/A Features=(null)
    Gres=gpu:2
    NodeAddr=10.8.52.254 NodeHostName=cluster Version=(null)
    RealMemory=1 AllocMem=0 Sockets=1 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
    BootTime=None SlurmdStartTime=None
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
    Reason=Not responding [root@2015-10-23T10:10:25]

    NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00 Features=rack-0,8CPUs
    Gres=gpu:2
    NodeAddr=10.8.52.253 NodeHostName=compute-0-0 Version=14.03
    OS=Linux RealMemory=5968 AllocMem=0 Sockets=8 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488100
    BootTime=2015-10-22T15:05:08 SlurmdStartTime=2015-10-23T09:33:45
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    NodeName=compute-0-1 Arch=x86_64 CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.08 Features=rack-0,8CPUs
    Gres=gpu:2
    NodeAddr=10.8.52.252 NodeHostName=compute-0-1 Version=14.03
    OS=Linux RealMemory=5972 AllocMem=0 Sockets=8 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488101
    BootTime=2015-10-22T15:06:09 SlurmdStartTime=2015-10-22T15:06:40
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    This is the file /etc/slurm/slurm.conf

    NodeName=cluster NodeAddr=10.8.52.254 gres=gpu:2
    GresTypes=gpu
    SelectType=select/cons_res

    This is the file /etc/slurm/gres.conf (this file is in each node)

    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

/etc/slurm/nodenames.conf
    NodeName=DEFAULT State=UNKNOWN

    NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0$ gres=gpu:2
    NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0$ gres=gpu:2
    NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0$ gres=gpu:2

    Best regards
    Fany

     
    • Werner Saar

      Werner Saar - 2015-10-26

      Hi,

      you can download the roll from
      https://sourceforge.net/projects/slurm-roll/files/release-6.2-15.08.2/

      Today I also published a roll for the Nvidia closed source driver.
      You can download this roll for rocks-6.1.1 from
      https://sourceforge.net/projects/slurm-roll/files/addons/6.1.1/rolls/nvidia/

I hope that this helps you.

      Best regards
      Werner


       
  • Werner Saar

    Werner Saar - 2015-10-26

    Hi,

Please also send the output of the command:

scontrol show partitions

There is a "$" character in /etc/slurm/nodenames.conf.
I don't know the reason.

    Best regards
    Werner

     
    • Fany

      Fany - 2015-10-27

That was a problem when copying the configuration. The actual content of /etc/slurm/nodenames.conf is:

      NodeName=DEFAULT State=UNKNOWN

      NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0,8CPUs gres=gpu:2
      NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0,8CPUs gres=gpu:2
      NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0,8CPUs gres=gpu:2

I installed the slurm-6.2-15.08.2 and pb-nvidia rolls.
After that, can I install the NVIDIA-Linux-x86_64-319.49.run driver, cudatoolkit_3.2.16_linux_64_rhel5.5.run and gpucomputingsdk_3.2.16_linux?

       
  • Werner Saar

    Werner Saar - 2015-10-27

    Hi,

did you update slurm with these commands:

    export LANG=C
    rocks disable roll slurm
    rocks remove roll slurm
    rocks add roll slurm*.iso
    rocks enable roll slurm
    cd /export/rocks/install
    rocks create distro
    yum clean all
    yum update
    service slurmdbd restart
    service slurm restart

    and finally:

    rocks sync slurm
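After the update it is worth confirming that the controller really reports the new release, e.g. by comparing `sinfo -V` against 15.08.2. A small helper for that comparison (a sketch; `version_ge` is my own name, and it assumes GNU `sort -V` is available, as it is on the CentOS 6 base of Rocks 6.x):

```shell
# Sketch: true (exit 0) if version $1 >= version $2, using GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# Usage: version_ge "$(sinfo -V | awk '{print $2}')" 15.08.2 && echo "update OK"
```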

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-27

I did it all over again and nothing. I can't send CUDA jobs to the other nodes.
When I run srun -n 2 -N 2 --gres=gpu:2 mpirun cuda+mpi I get:

    root@cluster bin]# srun -n 2 -N 2 --gres=gpu:2 mpirun cudampi
    srun: Force Terminated job 408
    srun: error: Unable to allocate resources: Requested node configuration is not available

    but when I send --gres=gpu:0 I get:

    root@cluster bin]# srun -n 2 -N 2 --gres=gpu: mpirun cuda+mpi
    We have 2 processors
    Spawning from compute-0-0.local
    CUDA MPI
    Probing nodes...
    Node Psid CUDA Cards (devID)
    ----------- ----- ---- ----------
    We have 2 processors
    Spawning from compute-0-1.local
    CUDA MPI

    Probing nodes...
    Node Psid CUDA Cards (devID)
    ----------- ----- ---- ----------
    - compute-0-0.local 1 0 NONE

- compute-0-1.local 1 0 NONE

    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.



    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.

    Best regards

     
    • Werner Saar

      Werner Saar - 2015-10-27

      OK,

      but step by step

      what is now the content of /etc/slurm/nodenames.conf
      and what is the output of scontrol show nodes.

I found that the Nvidia GTX 260 is an older card.
I have to create a Nvidia roll with another driver.
This roll will be available tomorrow morning.

      Best regards
      Werner


       
  • Fany

    Fany - 2015-10-27

After installing the nvidia roll, what are the steps I need to follow? I don't know if the 340.93 driver is going to work, but I am going to download it and try it to see the results.
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

    Hi,

Please read the pb-nvidia.pdf.

Did you run these commands:

    export LANG=C
    rocks add roll pb-nvidia*.iso
    rocks enable roll pb-nvidia
    cd /export/rocks/install
    rocks create distro
    yum clean all
    rocks run roll pb-nvidia|sh

    Then set the bootflags for the compute nodes and the attribute nvidia:

    rocks set host bootflags flags="rdblacklist=nouveau vga=791"
    rocks set appliance attr compute nvidia true

    Please read the file slurm-roll.pdf

    Create the file /etc/slurm/gres.conf.1 with this content:

    Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
    Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1

    Insert the line:

    FILES += /etc/slurm/gres.conf.1

    in the file /var/411/Files.mk, if this line does not exist.

    Then execute:

    cd /var/411
    make clean
    make

    Now add attributes for your compute nodes:

    Example:

    rocks set host attr compute-0-0 slurm_gres_template value="gres.conf.1"
    rocks set host attr compute-0-0 slurm_gres value="gpu"

    and run:

    rocks sync slurm

    Now reinstall compute-0-0 with the command:

    ssh compute-0-0 /boot/kickstart/cluster-kickstart

     
  • Fany

    Fany - 2015-10-28

OK, but I am going to do everything again, so I will install my nodes from the beginning. Therefore I don't need this step (reinstall compute-0-0 with the command: ssh compute-0-0 /boot/kickstart/cluster-kickstart), right? My server has no GPUs; is that important?
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

    Hi,

    do you also want to reinstall the server?
    This would make some things simpler.
The server doesn't need a GPU.

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-28

I don't have to install the server again. I have the server in a virtual machine on an ESX host, so I just have to revert my VM to the beginning and install the nodes.
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

If you revert the VM, is the slurm-roll installed or not?

     
  • Fany

    Fany - 2015-10-28

    No, I have a snapshot with Rocks cluster without slurm.

     
  • Fany

    Fany - 2015-10-28

I also have another snapshot with Rocks cluster and slurm-6.2, so I am going to install pb-nvidia and then install the nodes.

     
  • Fany

    Fany - 2015-10-28

I installed the nodes with slurm-6.2 and pb-nvidia. Now, on the nodes, which driver do I have to install? I disable the nouveau driver this way:

    echo 0 > /sys/class/vtconsole/vtcon1/bind
    rmmod nouveau
    rmmod ttm
    rmmod drm_kms_helper
    rmmod drm

then I install the NVIDIA-319.49 driver and then cudatoolkit 3.2.16 and gpusdk 3.2.16.

Do I have to do the same, or do I have to do other things with the pb-nvidia roll?

What exactly can I do with the pb-nvidia roll?

     
    • Werner Saar

      Werner Saar - 2015-10-29

      Hi,

the pb-nvidia roll installs the driver at install time,
so you don't need to do this manually.
And if you also install cuda and the gpusdk to /share/apps,
then you have a real unattended installation.


       
  • Fany

    Fany - 2015-10-29

When I execute this command on the nodes, I get this error:

    [root@compute-0-0 ~]# rocks list roll
    NAME VERSION ARCH ENABLED
    sge: 6.1.1 x86_64 yes
    kvm: 6.1.1 x86_64 yes
    bio: 6.1.1 x86_64 yes
    os: 6.1.1 x86_64 yes
    kernel: 6.1.1 x86_64 yes
    fingerprint: 6.1.1 x86_64 yes
    ganglia: 6.1.1 x86_64 yes
    perl: 6.1.1 x86_64 yes
    python: 6.1.1 x86_64 yes
    area51: 6.1.1 x86_64 yes
    web-server: 6.1.1 x86_64 yes
    htcondor: 6.1.1 x86_64 yes
    java: 6.1.1 x86_64 yes
    base: 6.1.1 x86_64 yes
    zfs-linux: 0.6.2 x86_64 yes
    hpc: 6.1.1 x86_64 yes
    slurm: 6.2.0 x86_64 yes
    pb-nvidia: 340.93 x86_64 yes

    [root@compute-0-0 ~]# rocks set host bootflags flags='rdblacklist=nouveau vga=791'
    Traceback (most recent call last):
    File "/opt/rocks/bin/rocks", line 300, in <module>
    command.runWrapper(name, args[i:])
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 2213, in runWrapper
    self.run(self._params, self._args)
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 153, in run
    self.addBootflags(0, flags)
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 125, in addBootflags
    values(%s, "%s")""" % (nodeid, flags))
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 1256, in execute
    return self.link.execute(command)
    File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
    File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
    _mysql_exceptions.OperationalError: (1142, "INSERT command denied to user ''@'compute-0-0.local' for table 'bootflags'")

Anyway, on each node the driver is not installed, nvidia-smi does not work, and the nouveau driver is active.

     
    • Werner Saar

      Werner Saar - 2015-10-29

      Hi,

you have to run this command on the head node.
Please note that the sge roll is not compatible with slurm.

      Best regards
      Werner


       
  • Fany

    Fany - 2015-10-29

I don't use sge. Anyway, I executed this command on the server too, and I get no error, but on the nodes the driver is not installed, so nvidia-smi does not work and the nouveau driver is still active.

     