    This README documents how to add graphics card support to a high-throughput
computing environment managed by Condor.

CONTENTS
=================================

    /condor_config.local       (Sample condor_config.local file)
    /README                    (This README)
    /samples/                  (Sample Condor submission files)
            /gpuQuery.submit
            /gpuQueryLogs/     (Contains log files for gpuQuery.submit)
            /tests/            (Graphics card tests provided by CUDA)
                  /deviceQuery (test obtaining information through CUDA)
                    (params: none)
                  /matrixMul   (test performing matrix multiplication)
                    (params: size of matrix x, size of matrix y)
    /script/                  (Contains the scripts and executable needed to 
            /gpu.sh            (script to get information about Graphics Cards)
            /cudaQuery         (executable to obtain information through CUDA)

INSTALLATION
=================================

_______LINUX_________

Adding graphics card discovery to Condor:

1. Test the script, located at script/gpu.sh. To obtain information about
graphics cards, the condor user must be able to run the lspci command. To
obtain detailed information about NVIDIA CUDA-capable graphics cards, the
condor user must also have write access to the graphics card.
Sample script output:
HasGpu = True
NGpu = 2
Gpu0 = "Quadro FX 3700"
Gpu0CudaCapable = True
Gpu0Mem = 536150016
Gpu0Procs =  14
Gpu0Cores = 112
Gpu1 = "Quadro FX 3700"
Gpu1CudaCapable = True
Gpu1Mem = 536608768
Gpu1Procs =  14
Gpu1Cores = 112
HasCuda = True
CudaRelease = V1.1
CudaVersion = V0.2.1221
-
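
Before wiring the script into Condor, its output can be checked for shape. The
following is only a sketch and is not part of this release: the function name
check_gpu_output is hypothetical, and it assumes every line is either of the
form "Attribute = Value" or the single "-" that ends the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this release): reads gpu.sh-style
# output on stdin and reports any line that is not "Name = Value".
# Assumption: a line consisting only of "-" marks the end of the output.
check_gpu_output() {
    bad=0
    while IFS= read -r line; do
        case "$line" in
            -) break ;;        # end-of-output marker, stop reading
            "") continue ;;    # ignore empty lines
        esac
        if ! printf '%s\n' "$line" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]* *= *.+$'; then
            echo "unexpected line: $line" >&2
            bad=1
        fi
    done
    return "$bad"
}
```

Possible usage, run from the top of the release directory:
`script/gpu.sh | check_gpu_output && echo "output looks well-formed"`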

2. "condor_config.local" contains the settings that add a cron job to the
machine's local Condor configuration file.  Copy the cron job settings into the
local Condor configuration file, which is located by default at:
/var/lib/condor/condor_config.local
Cron job settings (replace /DIRECTORY/TO/SCRIPT with the directory containing
gpu.sh):
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST), UPDATEGPUINFO
STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /DIRECTORY/TO/SCRIPT/gpu.sh
STARTD_CRON_UPDATEGPUINFO_PERIOD = 1m
STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
STARTD_CRON_UPDATEGPUINFO_KILL = True
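
To avoid editing the placeholder path by hand, the settings above can be
generated with the script directory filled in. This is a minimal sketch: the
function name print_gpu_cron_config is hypothetical, and the settings it prints
are exactly the five lines listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this release): prints the cron job
# settings from step 2 with the location of gpu.sh filled in.
# Argument 1: the directory that contains gpu.sh.
print_gpu_cron_config() {
    script_dir=$1
    # \$ keeps $(STARTD_CRON_JOBLIST) literal so that Condor, not the
    # shell, expands it.
    cat <<EOF
STARTD_CRON_JOBLIST = \$(STARTD_CRON_JOBLIST), UPDATEGPUINFO
STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = ${script_dir}/gpu.sh
STARTD_CRON_UPDATEGPUINFO_PERIOD = 1m
STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
STARTD_CRON_UPDATEGPUINFO_KILL = True
EOF
}
```

Possible usage, where /opt/gpu/script is only an example directory:
`print_gpu_cron_config /opt/gpu/script >> /var/lib/condor/condor_config.local`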

3. Restart the Condor daemons on the local machine with the command:
`/sbin/service condor restart`
Restarting the daemons adds the cron job, and information about the GPUs
should then be sent in ClassAd form.

4. To check that the information has reached the condor_collector, run the
command:
`condor_status -constraint HasGpu`
This command lists the machines whose ClassAd satisfies the constraint HasGpu.
Note: It may take a few minutes for the machine's ClassAd to be sent.

5. To view the machine's full ClassAd, run the command:
`condor_status (MACHINE ADDRESS) -long`
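
The long listing can be lengthy, so it may help to pull out a single GPU
attribute. The sketch below is not part of this release: the function name
classad_attr is hypothetical, and it assumes the listing contains one
"Attribute = Value" pair per line. The name comparison here is case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this release): prints the value of
# one attribute from ClassAd text read on stdin.
# Assumption: the input has one "Attribute = Value" pair per line.
# Argument 1: the attribute name, matched case-sensitively.
classad_attr() {
    awk -v name="$1" '
        {
            eq = index($0, "=")              # position of the first "="
            if (eq == 0) next                # skip lines without "="
            key = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key) # trim surrounding whitespace
            if (key == name) {
                val = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
            }
        }'
}
```

Possible usage:
`condor_status (MACHINE ADDRESS) -long | classad_attr NGpu`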


Preparing Condor to run CUDA jobs:




TESTING
=================================


EXAMPLES
=================================


Source: README, updated 2009-04-08