This README provides documentation on adding Graphics Support to a High
Throughput Computing Environment managed by Condor.
=================================
CONTENTS
=================================
/condor_config.local (Sample condor_config.local file)
/README (This README)
/samples/ (Sample Condor submission files)
/gpuQuery.submit
/gpuQueryLogs/ (Contains log files for gpuQuery.submit)
/tests/ (Graphics card tests provided by CUDA)
/deviceQuery (test obtaining information through CUDA)
(params: none)
/matrixMul (test performing matrix multiplication)
(params: size of matrix x, size of matrix y)
/script/ (Contains the script and executable needed to identify graphics cards)
/gpu.sh (script to get information about Graphics Cards)
/cudaQuery (executable to obtain information through CUDA)
=================================
INSTALLATION
=================================
_______LINUX_________
ADDING GRAPHICS CARD DISCOVERY TO CONDOR
1. Test the script, which is located at "script/gpu.sh". In order to obtain
information about graphics cards, the condor user must be able to run the
lspci command. For detailed information about NVIDIA CUDA-capable graphics
cards, the condor user must also be granted read and write access to the
graphics card device files, which are located at /dev/nvidiactl and
/dev/nvidia*
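Before running the script, the prerequisites can be checked by hand (a
minimal sketch; it assumes the daemon runs as the "condor" account and that
gpu.sh was placed in /opt/condor-gpu, adjust both to your setup):

    # confirm the condor user can run lspci and see the cards
    su -s /bin/sh -c "lspci | grep -i nvidia" condor

    # confirm the condor user can read/write the NVIDIA device files
    ls -l /dev/nvidiactl /dev/nvidia*

    # run the discovery script by hand and compare with the sample below
    su -s /bin/sh -c "/opt/condor-gpu/gpu.sh" condor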
Sample script output:
HasGpu = True
NGpu = 2
Gpu0 = "Quadro FX 3700"
Gpu0CudaCapable = True
Gpu0Mem = 536150016
Gpu0Procs = 14
Gpu0Cores = 112
Gpu1 = "Quadro FX 3700"
Gpu1CudaCapable = True
Gpu1Mem = 536608768
Gpu1Procs = 14
Gpu1Cores = 112
HasCuda = True
CudaRelease = V1.1
CudaVersion = V0.2.1221
2. "condor_config.local" contains code to add cronjob into the machine's
condor local configuration file. Copy the cronjob code into the condor
local configuration file, which is located by default at:
/var/lib/condor/condor_config.local
Place gpu.sh script and cudaQuery binary into location accessible to condor
user.
Cronjob code:
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST), UPDATEGPUINFO
STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /DIRECTORY/TO/SCRIPT/gpu.sh
STARTD_CRON_UPDATEGPUINFO_PERIOD = 1m
STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
STARTD_CRON_UPDATEGPUINFO_KILL = True
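One possible way to stage the files and point the cron job at them (a
sketch only; /opt/condor-gpu is an assumed install directory, not part of
this package):

    # copy the discovery tools somewhere readable by the condor user
    mkdir -p /opt/condor-gpu
    cp script/gpu.sh script/cudaQuery /opt/condor-gpu/
    chmod 755 /opt/condor-gpu/gpu.sh /opt/condor-gpu/cudaQuery

    # then append the cron job code above to the local configuration and set
    #   STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /opt/condor-gpu/gpu.sh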
3. Restart condor daemons on local machine with command:
`/sbin/service condor restart`
Restarting the daemons activates the cron job, and information about the
GPUs will then be published in the machine's class ad.
4. To check that information is in the condor_collector run the command:
`condor_status -constraint HasGpu`
This command will display the machines whose class ads satisfy the
constraint HasGpu.
Note: It may take a few minutes for the machine's class-ad to be sent.
5. To view the class-ads, type the command:
`condor_status (MACHINE ADDRESS) -long`
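condor_status can also print just the GPU attributes of interest rather than
the full class ad (a sketch; the attribute names are the ones set by gpu.sh
in the sample output above):

    condor_status -constraint 'HasGpu && Gpu0CudaCapable' \
        -format "%s  " Machine \
        -format "%s  " Gpu0 \
        -format "%d\n" Gpu0Mem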
PREPARING CONDOR TO RUN CUDA JOBS
In order to run CUDA jobs in the Condor environment, submitting/running
users must be granted access to read/write the devices.
The devices that need to be accessed are located in /dev/nvidia*
These users could be:
Nobody (open access)
Controlled by Unix group (limited users; see the sketch after this list)
Integrated with Condor user control (slot users)
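As an example of the Unix-group approach (a minimal sketch; the group name
"gpuusers", the user "alice", and the udev rule path are placeholders,
adapt them to your site's policy):

    # create a group and add the users allowed to run CUDA jobs
    groupadd gpuusers
    usermod -a -G gpuusers alice

    # grant the group read/write access to the NVIDIA device files
    chgrp gpuusers /dev/nvidiactl /dev/nvidia*
    chmod 660 /dev/nvidiactl /dev/nvidia*

    # to keep the permissions across reboots, a udev rule such as
    # /etc/udev/rules.d/90-nvidia.rules can set them automatically:
    #   KERNEL=="nvidia*", GROUP="gpuusers", MODE="0660"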
_______MAC OS X______
To be implemented...
_______WINDOWS_______
To be implemented...
=================================
JOB SUBMISSION TESTING
=================================
To submit jobs to Condor, users must first be granted access to the GPU
devices (see PREPARING CONDOR TO RUN CUDA JOBS).
Tests are provided in the samples directory. The "gpuQuery.submit" file
contains requirements that locate machines on which GPU identification has
been successfully set up. Prior to testing, ensure that these requirements
match the attributes actually advertised in your cluster.
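The requirements expression to check and adjust looks roughly like the one
in this sketch (placeholder paths and settings, not the exact contents of
"gpuQuery.submit"):

    universe     = vanilla
    executable   = tests/deviceQuery
    requirements = (HasGpu == True) && (Gpu0CudaCapable == True)
    output       = gpuQueryLogs/out.$(Cluster).$(Process)
    error        = gpuQueryLogs/err.$(Cluster).$(Process)
    log          = gpuQueryLogs/log.$(Cluster)
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue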
For your first test, I recommend running the "deviceQuery" test in order to
confirm that your Condor setup is working. If the submitting user truly has
access to the GPU, the name of the graphics card will be printed
successfully. If the user has not been properly granted access to the
graphics card, the reported device name will instead indicate that the GPU
is being emulated on the CPU.
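A typical test run might look like this (the output file name is a guess,
check "gpuQuery.submit" for the actual output path):

    # submit the test job and watch it in the queue
    condor_submit samples/gpuQuery.submit
    condor_q

    # once it finishes, look for the reported device name in the output
    cat samples/gpuQueryLogs/*.out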
=================================
EXAMPLES
=================================