I have a test cluster with Rocks Cluster 6.1.1 and slurm-roll 6.1.1-14.03.6. Each node has two GPUs (GTX 260). I need to send jobs to several nodes, but it does not work correctly.
I submit with:
srun -n 1 -N 1 --gres=gpu:2 mpirun application
It works, but only on node compute-0-0; with compute-0-1 and compute-0-2 I get the error:
srun: error: Unable to allocate resources: Requested node configuration is not available
I don't know what I missed, because I have the same configuration on all nodes.
Any idea? Thanks
Hi,
I need the output of the commands:
scontrol show node
scontrol show partitions
and the file /etc/slurm/gres.conf
Best Regards
Werner
Hi,
The file /etc/slurm/gres.conf should have only 2 lines like this:
Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1
The node configuration in /etc/slurm/nodenames.conf should look like this:
NodeName=compute-0-0 NodeAddr=10.1.255.254 CPUs=2 Weight=20481900 Feature=rack-0,2CPUs Gres=gpu
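As a cross-check, those two gres.conf lines can be generated instead of typed by hand. This is only an illustrative helper (not part of the slurm-roll), assuming one /dev/nvidiaN device per GPU:

```python
# Illustrative helper: emit one gres.conf line per GPU device,
# each pinned to a CPU index, matching the two-line example above.
def gres_conf_lines(n_gpus, gpu_type="nvidia"):
    return [
        "Name=gpu Type=%s File=/dev/nvidia%d CPUs=%d" % (gpu_type, i, i)
        for i in range(n_gpus)
    ]

for line in gres_conf_lines(2):
    print(line)  # prints the two lines shown above
```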
Best regards
Werner
Hi Fany,
Today I published slurm-roll 15.08.2.
I have tested that this version works on rocks-6.1.1 and checked that an update
from version 14.03.6 succeeds.
I recommend updating to this version, because GPU computing is integrated and tested.
I can give you a list of instructions, how to update.
But you have to reinstall the compute nodes.
Best regards
Werner
Dear Werner,
Where can I find the version slurm-roll 15.08.2?
This is my output:
[root@cluster bin]# scontrol show node
NodeName=cluster CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=N/A Features=(null)
Gres=gpu:2
NodeAddr=10.8.52.254 NodeHostName=cluster Version=(null)
RealMemory=1 AllocMem=0 Sockets=1 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=None SlurmdStartTime=None
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=Not responding [root@2015-10-23T10:10:25]
NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00 Features=rack-0,8CPUs
Gres=gpu:2
NodeAddr=10.8.52.253 NodeHostName=compute-0-0 Version=14.03
OS=Linux RealMemory=5968 AllocMem=0 Sockets=8 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488100
BootTime=2015-10-22T15:05:08 SlurmdStartTime=2015-10-23T09:33:45
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=compute-0-1 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.08 Features=rack-0,8CPUs
Gres=gpu:2
NodeAddr=10.8.52.252 NodeHostName=compute-0-1 Version=14.03
OS=Linux RealMemory=5972 AllocMem=0 Sockets=8 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488101
BootTime=2015-10-22T15:06:09 SlurmdStartTime=2015-10-22T15:06:40
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
This is the file /etc/slurm/slurm.conf
NodeName=cluster NodeAddr=10.8.52.254 gres=gpu:2
GresTypes=gpu
SelectType=select/cons_res
This is the file /etc/slurm/gres.conf (this file is on each node)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
/etc/slurm/nodenames.conf
NodeName=DEFAULT State=UNKNOWN
NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0$ gres=gpu:2
NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0$ gres=gpu:2
NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0$ gres=gpu:2
Best regards
Fany
Hi,
you can download the roll from
https://sourceforge.net/projects/slurm-roll/files/release-6.2-15.08.2/
Today I also published a roll for the Nvidia closed source driver.
You can download this roll for rocks-6.1.1 from
https://sourceforge.net/projects/slurm-roll/files/addons/6.1.1/rolls/nvidia/
I hope that this will help you.
Best regards
Werner
On 10/26/2015 02:49 PM, Fany wrote:
Hi,
Please also send the output of the command:
scontrol show partitions
There is a "$" character in /etc/slurm/nodenames.conf.
I don't know the reason.
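A stray character like that "$" is easy to catch by scanning the file for anything outside the characters a node line normally uses. A hypothetical checker, purely for illustration:

```python
import re

# Characters that legitimately appear in a Slurm node definition line.
ALLOWED = re.compile(r'[A-Za-z0-9=._,:/ -]')

def suspicious_chars(line):
    """Return any unexpected characters found in a config line."""
    return sorted(set(ch for ch in line if not ALLOWED.match(ch)))

line = "NodeName=compute-0-0 CPUs=8 Feature=rack-0$ gres=gpu:2"
print(suspicious_chars(line))  # flags the stray "$"
```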
Best regards
Werner
It was a problem when I copied the configuration of /etc/slurm/nodenames.conf.
/etc/slurm/nodenames.conf
NodeName=DEFAULT State=UNKNOWN
NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0,8CPUs gres=gpu:2
NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0,8CPUs gres=gpu:2
NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0,8CPUs gres=gpu:2
I installed the slurm 6.2-15.08.2 roll and pb-nvidia.
After that, can I install the driver NVIDIA-Linux-x86_64-319.49.run, cudatoolkit_3.2.16_linux_64_rhel5.5.run and gpucomputingsdk_3.2.16_linux?
Hi,
did you update slurm with these commands:
export LANG=C
rocks disable roll slurm
rocks remove roll slurm
rocks add roll slurm*.iso
rocks enable roll slurm
cd /export/rocks/install
rocks create distro
yum clean all
yum update
service slurmdbd restart
service slurm restart
and finally:
rocks sync slurm
Best regards
Werner
I did it all over again and nothing. I can't send CUDA jobs to the other nodes.
When I send srun -n 2 -N 2 --gres=gpu:2 mpirun cuda+mpi I get:
root@cluster bin]# srun -n 2 -N 2 --gres=gpu:2 mpirun cudampi
srun: Force Terminated job 408
srun: error: Unable to allocate resources: Requested node configuration is not available
but when I send --gres=gpu:0 I get:
root@cluster bin]# srun -n 2 -N 2 --gres=gpu: mpirun cuda+mpi
We have 2 processors
Spawning from compute-0-0.local
CUDA MPI
Probing nodes...
Node Psid CUDA Cards (devID)
----------- ----- ---- ----------
We have 2 processors
Spawning from compute-0-1.local
CUDA MPI
Probing nodes...
Node Psid CUDA Cards (devID)
----------- ----- ---- ----------
- compute-0-0.local 1 0 NONE
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
Best regards
OK,
but step by step:
what is now the content of /etc/slurm/nodenames.conf,
and what is the output of scontrol show nodes?
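The field to look at in that output is Gres=. A tiny illustrative parser, using a node block pasted earlier in this thread as sample input:

```python
# Illustrative: extract the Gres field from `scontrol show node` output,
# to confirm every compute node really advertises gpu:2.
sample = """NodeName=compute-0-1 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.08 Features=rack-0,8CPUs
   Gres=gpu:2
   NodeAddr=10.8.52.252 NodeHostName=compute-0-1 Version=14.03"""

def gres_of(block):
    for token in block.split():
        if token.startswith("Gres="):
            return token[len("Gres="):]
    return None

print(gres_of(sample))  # gpu:2
```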
I found that the Nvidia GTX 260 is an older card.
I have to create a Nvidia roll with another driver.
This roll will be available tomorrow morning.
Best regards
Werner
On 10/27/2015 05:16 PM, Fany wrote:
Hi,
I published a Nvidia roll for Geforce series 2xx.
Please download the roll from:
http://sourceforge.net/projects/slurm-roll/files/addons/6.1.1/rolls/nvidia/pb-nvidia-340.93-0.x86_64.disk1.iso
Best regards
Werner
After installing the nvidia roll, what are the steps I need to follow? I don't know if the 340.93 driver is going to work, but I am going to download it and try it to see the results.
best regards
Hi,
Please read the pb-nvidia.pdf.
Did you run these commands:
export LANG=C
rocks add roll pb-nvidia*.iso
rocks enable roll pb-nvidia
cd /export/rocks/install
rocks create distro
yum clean all
rocks run roll pb-nvidia|sh
Then set the bootflags for the compute nodes and the attribute nvidia:
rocks set host bootflags flags="rdblacklist=nouveau vga=791"
rocks set appliance attr compute nvidia true
Please read the file slurm-roll.pdf
Create the file /etc/slurm/gres.conf.1 with this content:
Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1
Insert the line:
FILES += /etc/slurm/gres.conf.1
in the file /var/411/Files.mk, if this line does not exist.
Then execute:
cd /var/411
make clean
make
Now add attributes for your compute nodes:
Example:
rocks set host attr compute-0-0 slurm_gres_template value="gres.conf.1"
rocks set host attr compute-0-0 slurm_gres value="gpu"
and run:
rocks sync slurm
Now reinstall compute-0-0 with the command:
ssh compute-0-0 /boot/kickstart/cluster-kickstart
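Since the problem is that only compute-0-0 accepts GPU jobs, the same two attribute commands presumably have to be run for every compute node. An illustrative generator (node names taken from this thread):

```python
# Illustrative: the per-host attribute commands from this post,
# repeated for every compute node mentioned in the thread.
def attr_commands(hosts):
    cmds = []
    for h in hosts:
        cmds.append('rocks set host attr %s slurm_gres_template value="gres.conf.1"' % h)
        cmds.append('rocks set host attr %s slurm_gres value="gpu"' % h)
    return cmds

for cmd in attr_commands(["compute-0-0", "compute-0-1", "compute-0-2"]):
    print(cmd)
```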
OK, but I am going to do everything again, so I will install my nodes from the beginning. Therefore this step (reinstalling compute-0-0 with the command ssh compute-0-0 /boot/kickstart/cluster-kickstart) I don't have to do, right? My server has no GPUs; is that important?
Best regards
Hi,
do you also want to reinstall the server?
This would make some things simpler.
The server doesn't need a GPU.
Best regards
Werner
I don't have to install the server again. I have the server in a virtual machine on ESX, so I just have to revert my VM to the beginning and install the nodes.
best regards
If you revert the VM, is slurm-roll installed or not?
No, I have a snapshot of the Rocks cluster without slurm.
I also have another snapshot of the Rocks cluster with slurm-6.2, so I am going to install pb-nvidia and then I am going to install the nodes.
I installed the nodes with slurm-6.2 and pb-nvidia. Now, on the nodes, which driver do I have to install? I disabled the nouveau driver this way:
echo 0 > /sys/class/vtconsole/vtcon1/bind
rmmod nouveau
rmmod ttm
rmmod drm_kms_helper
rmmod drm
Then I install the NVIDIA-319.49 driver, and then cudatoolkit 3.2.16 and gpusdk 3.2.16.
Do I have to do the same, or do I have to do other things with the pb-nvidia roll?
What exactly can I do with the pb-nvidia roll?
Hi,
the pb-nvidia roll installs the driver at install time,
so you don't need to do this manually.
And if you install cuda and the gpusdk to /share/apps,
then you have a truly unattended installation.
On 10/28/2015 08:09 PM, Fany wrote:
When I execute this command on the nodes, I get this error:
[root@compute-0-0 ~]# rocks list roll
NAME VERSION ARCH ENABLED
sge: 6.1.1 x86_64 yes
kvm: 6.1.1 x86_64 yes
bio: 6.1.1 x86_64 yes
os: 6.1.1 x86_64 yes
kernel: 6.1.1 x86_64 yes
fingerprint: 6.1.1 x86_64 yes
ganglia: 6.1.1 x86_64 yes
perl: 6.1.1 x86_64 yes
python: 6.1.1 x86_64 yes
area51: 6.1.1 x86_64 yes
web-server: 6.1.1 x86_64 yes
htcondor: 6.1.1 x86_64 yes
java: 6.1.1 x86_64 yes
base: 6.1.1 x86_64 yes
zfs-linux: 0.6.2 x86_64 yes
hpc: 6.1.1 x86_64 yes
slurm: 6.2.0 x86_64 yes
pb-nvidia: 340.93 x86_64 yes
[root@compute-0-0 ~]# rocks set host bootflags flags='rdblacklist=nouveau vga=791'
Traceback (most recent call last):
File "/opt/rocks/bin/rocks", line 300, in <module>
command.runWrapper(name, args[i:])
File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 2213, in runWrapper
self.run(self._params, self._args)
File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 153, in run
self.addBootflags(0, flags)
File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 125, in addBootflags
values(%s, "%s")""" % (nodeid, flags))
File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 1256, in execute
return self.link.execute(command)
File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1142, "INSERT command denied to user ''@'compute-0-0.local' for table 'bootflags'")
[root@compute-0-0 ~]#
Anyway, on each node the driver is not installed, nvidia-smi does not work, and the nouveau driver is active.
Hi,
you have to run this command on the head node.
Please note that the sge roll is not compatible with slurm.
Best regards
Werner
On 10/29/2015 03:24 PM, Fany wrote:
I don't use sge. Anyway, I executed this command on the server too, and I don't get an error, but on the nodes the driver is not installed, so nvidia-smi does not work and the nouveau driver is still active.