
Problem slurm-roll with configuration of gpu

Fany
2015-10-23
2016-02-03
  • Fany

    Fany - 2015-10-23

I have a test cluster with Rocks Cluster 6.1.1 and slurm-roll 6.1.1-14.03.6. Each node has two GPUs (GTX 260). I need to send jobs to several nodes, but it does not work correctly.
I submit with:
srun -n 1 -N 1 --gres=gpu:2 mpirun application -- this works, but only on node compute-0-0; with compute-0-1 and compute-0-2 I get the error:

    srun: error: Unable to allocate resources: Requested node configuration is not available

    I don’t know what I missed because I have the same configuration in all nodes.

    Any idea? Thanks

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi,

    I need the output of the commands:

    scontrol show node
    scontrol show partitions

and the file /etc/slurm/gres.conf.
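A quick way to scan that scontrol output for mismatched nodes is to condense it to one line per node (a sketch, not part of the roll; the `gres_summary` helper name is my own):

```shell
# Sketch: condense `scontrol show node` output to "nodename gres" pairs,
# so a node advertising Gres=(null) stands out immediately.
gres_summary() {
    awk '$1 ~ /^NodeName=/ { split($1, a, "="); node = a[2] }
         /Gres=/           { for (i = 1; i <= NF; i++)
                                 if ($i ~ /^Gres=/) print node, substr($i, 6) }' "$@"
}

# Usage on a live cluster: scontrol show node | gres_summary
```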

    Best Regards
    Werner

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi,

    The file /etc/slurm/gres.conf should have only 2 lines like this:

    Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
    Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1

The node configuration in /etc/slurm/nodenames.conf should look like this:

    NodeName=compute-0-0 NodeAddr=10.1.255.254 CPUs=2 Weight=20481900 Feature=rack-0,2CPUs Gres=gpu
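If the /dev/nvidia* device files already exist on a node, those gres.conf lines can be generated rather than typed by hand (a sketch; `gen_gres_conf` is my own name, and taking CPUs=N from the device number is a simplification that matches the simple two-GPU nodes discussed here, not a general topology rule):

```shell
# Sketch: emit one gres.conf line per NVIDIA device node passed in.
# The CPU index is derived from the device number (assumption: one GPU
# per CPU, as in the two-GPU example lines above).
gen_gres_conf() {
    for dev in "$@"; do
        idx=${dev##*nvidia}   # /dev/nvidia0 -> 0
        printf 'Name=gpu Type=nvidia File=%s CPUs=%s\n' "$dev" "$idx"
    done
}

# Usage on a node: gen_gres_conf /dev/nvidia[0-9]* > /etc/slurm/gres.conf
```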

    Best regards
    Werner

     
  • Werner Saar

    Werner Saar - 2015-10-24

    Hi Fany,

Today I published slurm-roll 15.08.2.
I have tested that this version works on rocks-6.1.1 and verified that an update
from version 14.03.6 succeeds.

I recommend updating to this version, because GPU computing is integrated and tested.
I can give you a list of instructions on how to update.

But you will have to reinstall the compute nodes.

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-26

Dear Werner,
Where can I find slurm-roll version 15.08.2?

This is my output:

    [root@cluster bin]# scontrol show node
    NodeName=cluster CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=N/A Features=(null)
    Gres=gpu:2
    NodeAddr=10.8.52.254 NodeHostName=cluster Version=(null)
    RealMemory=1 AllocMem=0 Sockets=1 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
    BootTime=None SlurmdStartTime=None
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
    Reason=Not responding [root@2015-10-23T10:10:25]

    NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00 Features=rack-0,8CPUs
    Gres=gpu:2
    NodeAddr=10.8.52.253 NodeHostName=compute-0-0 Version=14.03
    OS=Linux RealMemory=5968 AllocMem=0 Sockets=8 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488100
    BootTime=2015-10-22T15:05:08 SlurmdStartTime=2015-10-23T09:33:45
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    NodeName=compute-0-1 Arch=x86_64 CoresPerSocket=1
    CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.08 Features=rack-0,8CPUs
    Gres=gpu:2
    NodeAddr=10.8.52.252 NodeHostName=compute-0-1 Version=14.03
    OS=Linux RealMemory=5972 AllocMem=0 Sockets=8 Boards=1
    State=IDLE ThreadsPerCore=1 TmpDisk=447278 Weight=20488101
    BootTime=2015-10-22T15:06:09 SlurmdStartTime=2015-10-22T15:06:40
    CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

    This is the file /etc/slurm/slurm.conf

    NodeName=cluster NodeAddr=10.8.52.254 gres=gpu:2
    GresTypes=gpu
    SelectType=select/cons_res

    This is the file /etc/slurm/gres.conf (this file is in each node)

    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

/etc/slurm/nodenames.conf
    NodeName=DEFAULT State=UNKNOWN

    NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0$ gres=gpu:2
    NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0$ gres=gpu:2
    NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0$ gres=gpu:2

    Best regards
    Fany

     
    • Werner Saar

      Werner Saar - 2015-10-26

      Hi,

      you can download the roll from
      https://sourceforge.net/projects/slurm-roll/files/release-6.2-15.08.2/

      Today I also published a roll for the Nvidia closed source driver.
      You can download this roll for rocks-6.1.1 from
      https://sourceforge.net/projects/slurm-roll/files/addons/6.1.1/rolls/nvidia/

I hope that this helps you.

      Best regards
      Werner


       
  • Werner Saar

    Werner Saar - 2015-10-26

    Hi,

Please also send the output of the command:

scontrol show partitions

There is a "$" character in /etc/slurm/nodenames.conf.
I don't know the reason.

    Best regards
    Werner

     
    • Fany

      Fany - 2015-10-27

That was a problem when copying the configuration. The actual content of /etc/slurm/nodenames.conf is:

      NodeName=DEFAULT State=UNKNOWN

      NodeName=compute-0-0 NodeAddr=10.8.52.253 CPUs=8 Weight=20488100 Feature=rack-0,8CPUs gres=gpu:2
      NodeName=compute-0-1 NodeAddr=10.8.52.252 CPUs=8 Weight=20488101 Feature=rack-0,8CPUs gres=gpu:2
      NodeName=compute-0-2 NodeAddr=10.8.52.251 CPUs=8 Weight=20488102 Feature=rack-0,8CPUs gres=gpu:2

I installed the slurm-6.2-15.08.2 and pb-nvidia rolls.
After that, can I install the NVIDIA-Linux-x86_64-319.49.run driver, cudatoolkit_3.2.16_linux_64_rhel5.5.run and gpucomputingsdk_3.2.16_linux?

       
  • Werner Saar

    Werner Saar - 2015-10-27

    Hi,

did you update slurm with these commands:

    export LANG=C
    rocks disable roll slurm
    rocks remove roll slurm
    rocks add roll slurm*.iso
    rocks enable roll slurm
    cd /export/rocks/install
    rocks create distro
    yum clean all
    yum update
    service slurmdbd restart
    service slurm restart

    and finally:

    rocks sync slurm
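After the update it is worth confirming that the controller really reports the new release, e.g. by comparing `sinfo -V` against 15.08.2. A small helper for that comparison (a sketch; `version_ge` is my own name, and it assumes GNU `sort -V` is available, as it is on the CentOS 6 base of Rocks 6.x):

```shell
# Sketch: true (exit 0) if version $1 >= version $2, using GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# Usage: version_ge "$(sinfo -V | awk '{print $2}')" 15.08.2 && echo "update OK"
```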

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-27

I did it all over again and nothing. I can't send CUDA jobs to the other nodes.
When I run srun -n 2 -N 2 --gres=gpu:2 mpirun cuda+mpi I get:

    root@cluster bin]# srun -n 2 -N 2 --gres=gpu:2 mpirun cudampi
    srun: Force Terminated job 408
    srun: error: Unable to allocate resources: Requested node configuration is not available

    but when I send --gres=gpu:0 I get:

    root@cluster bin]# srun -n 2 -N 2 --gres=gpu: mpirun cuda+mpi
    We have 2 processors
    Spawning from compute-0-0.local
    CUDA MPI
    Probing nodes...
    Node Psid CUDA Cards (devID)
    ----------- ----- ---- ----------
    We have 2 processors
    Spawning from compute-0-1.local
    CUDA MPI

    Probing nodes...
    Node Psid CUDA Cards (devID)
    ----------- ----- ---- ----------
    - compute-0-0.local 1 0 NONE

- compute-0-1.local 1 0 NONE

    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.



    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.

    Best regards

     
    • Werner Saar

      Werner Saar - 2015-10-27

      OK,

      but step by step

      what is now the content of /etc/slurm/nodenames.conf
      and what is the output of scontrol show nodes.

I found that the Nvidia GTX 260 is an older card.
I have to create a Nvidia roll with another driver.
This roll will be available tomorrow morning.

      Best regards
      Werner


       
  • Fany

    Fany - 2015-10-27

After installing the nvidia roll, what are the steps I need to follow? I don't know if the 340.93 driver is going to work, but I am going to download it and try it to see the results.
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

    Hi,

Please read the pb-nvidia.pdf.

Did you run these commands:

    export LANG=C
    rocks add roll pb-nvidia*.iso
    rocks enable roll pb-nvidia
    cd /export/rocks/install
    rocks create distro
    yum clean all
    rocks run roll pb-nvidia|sh

    Then set the bootflags for the compute nodes and the attribute nvidia:

    rocks set host bootflags flags="rdblacklist=nouveau vga=791"
    rocks set appliance attr compute nvidia true

    Please read the file slurm-roll.pdf

    Create the file /etc/slurm/gres.conf.1 with this content:

    Name=gpu Type=nvidia File=/dev/nvidia0 CPUs=0
    Name=gpu Type=nvidia File=/dev/nvidia1 CPUs=1

    Insert the line:

    FILES += /etc/slurm/gres.conf.1

    in the file /var/411/Files.mk, if this line does not exist.

    Then execute:

    cd /var/411
    make clean
    make

    Now add attributes for your compute nodes:

    Example:

    rocks set host attr compute-0-0 slurm_gres_template value="gres.conf.1"
    rocks set host attr compute-0-0 slurm_gres value="gpu"

    and run:

    rocks sync slurm

    Now reinstall compute-0-0 with the command:

    ssh compute-0-0 /boot/kickstart/cluster-kickstart

     
  • Fany

    Fany - 2015-10-28

OK, but I am going to do everything again, so I will install my nodes from the beginning. Therefore I don't need this step (reinstall compute-0-0 with the command: ssh compute-0-0 /boot/kickstart/cluster-kickstart), right? My server has no GPUs; is that important?
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

    Hi,

    do you also want to reinstall the server?
    This would make some things simpler.
The server doesn't need a GPU.

    Best regards
    Werner

     
  • Fany

    Fany - 2015-10-28

I don't have to install the server again. I have the server in a virtual machine on an ESX host, so I just have to revert my VM to the beginning and install the nodes.
Best regards

     
  • Werner Saar

    Werner Saar - 2015-10-28

If you revert the VM, is the slurm-roll installed or not?

     
  • Fany

    Fany - 2015-10-28

    No, I have a snapshot with Rocks cluster without slurm.

     
  • Fany

    Fany - 2015-10-28

I also have another snapshot with Rocks cluster and slurm-6.2, so I am going to install pb-nvidia and then install the nodes.

     
  • Fany

    Fany - 2015-10-28

I installed the nodes with slurm-6.2 and pb-nvidia. Now, on the nodes, which driver do I have to install? I disable the nouveau driver this way:

    echo 0 > /sys/class/vtconsole/vtcon1/bind
    rmmod nouveau
    rmmod ttm
    rmmod drm_kms_helper
    rmmod drm

then I install the NVIDIA-319.49 driver and then cudatoolkit 3.2.16 and gpusdk 3.2.16.

Do I have to do the same, or do I have to do other things with the pb-nvidia roll?

What exactly can I do with the pb-nvidia roll?

     
    • Werner Saar

      Werner Saar - 2015-10-29

      Hi,

the pb-nvidia roll installs the driver at install time,
so you don't need to do this manually.
And if you also install cuda and the gpusdk to /share/apps,
then you have a real unattended installation.


       
  • Fany

    Fany - 2015-10-29

When I execute this command on the nodes, I get this error:

    [root@compute-0-0 ~]# rocks list roll
    NAME VERSION ARCH ENABLED
    sge: 6.1.1 x86_64 yes
    kvm: 6.1.1 x86_64 yes
    bio: 6.1.1 x86_64 yes
    os: 6.1.1 x86_64 yes
    kernel: 6.1.1 x86_64 yes
    fingerprint: 6.1.1 x86_64 yes
    ganglia: 6.1.1 x86_64 yes
    perl: 6.1.1 x86_64 yes
    python: 6.1.1 x86_64 yes
    area51: 6.1.1 x86_64 yes
    web-server: 6.1.1 x86_64 yes
    htcondor: 6.1.1 x86_64 yes
    java: 6.1.1 x86_64 yes
    base: 6.1.1 x86_64 yes
    zfs-linux: 0.6.2 x86_64 yes
    hpc: 6.1.1 x86_64 yes
    slurm: 6.2.0 x86_64 yes
    pb-nvidia: 340.93 x86_64 yes

    [root@compute-0-0 ~]# rocks set host bootflags flags='rdblacklist=nouveau vga=791'
    Traceback (most recent call last):
    File "/opt/rocks/bin/rocks", line 300, in <module>
    command.runWrapper(name, args[i:])
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 2213, in runWrapper
    self.run(self._params, self._args)
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 153, in run
    self.addBootflags(0, flags)
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/set/host/bootflags/init.py", line 125, in addBootflags
    values(%s, "%s")""" % (nodeid, flags))
    File "/opt/rocks/lib/python2.6/site-packages/rocks/commands/init.py", line 1256, in execute
    return self.link.execute(command)
    File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
    File "/opt/rocks/lib/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
    _mysql_exceptions.OperationalError: (1142, "INSERT command denied to user ''@'compute-0-0.local' for table 'bootflags'")

Anyway, on each node the driver is not installed, nvidia-smi does not work, and the nouveau driver is active.

     
    • Werner Saar

      Werner Saar - 2015-10-29

      Hi,

you have to run this command on the head node.
Please note that the sge roll is not compatible with slurm.

      Best regards
      Werner


       
  • Fany

    Fany - 2015-10-29

I don't use sge. Anyway, I executed this command on the server too, and I get no error, but on the nodes the driver is not installed, so nvidia-smi does not work and the nouveau driver is still active.

     