04/10/2024

For the patched Son of Grid Engine 8.1.9, check GitHub: /prod-feng/songe


Improved support for fractional requests such as -l ngpus=0.5. This is useful for multithreaded GPU jobs that need multiple CPU cores but only one or a few GPUs.

-pe openmp 10    # requests 10 CPU cores
-l ngpus=0.2     # so here 10 x 0.2 = 2 GPUs for this job, on the same node
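
The arithmetic above can be checked before submitting. This is only a sketch: total_gpus is a hypothetical helper, not part of the patch, and rounding to the nearest whole GPU is an assumption (how the patched scheduler rounds is not stated here).

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): GPUs implied by a slot count
# and a per-slot ngpus fraction, i.e. slots x ngpus.
total_gpus() {
    slots=$1
    per_slot=$2
    # awk does the floating-point multiply; adding 0.5 before the integer
    # conversion rounds to the nearest whole GPU (an assumption).
    awk -v s="$slots" -v g="$per_slot" 'BEGIN { printf "%d\n", s * g + 0.5 }'
}

total_gpus 10 0.2   # the example above: prints 2
total_gpus 4 0.5    # prints 2
```

Pick the fraction so that slots x ngpus comes out as the whole number of GPUs you want.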

Added multithreading (MT) protection for multiple worker threads. Alternatively, set the following in $SGE_ROOT/default/common/bootstrap:
  
listener_threads        1
worker_threads         1
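
To see what a bootstrap file currently contains, something like the following may help. It is a sketch only: bootstrap_value is a hypothetical helper, and it assumes the file holds one "key value" pair per line, as in the two lines above.

```shell
#!/bin/sh
# Hypothetical helper: print the value stored for one key in a
# bootstrap-style file (one "key   value" pair per line).
bootstrap_value() {
    file=$1
    key=$2
    awk -v k="$key" '$1 == k { print $2 }' "$file"
}

# Usage, assuming SGE_ROOT is set and the cell is named "default":
#   bootstrap_value "$SGE_ROOT/default/common/bootstrap" worker_threads
#   bootstrap_value "$SGE_ROOT/default/common/bootstrap" listener_threads
```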

Check GitHub: /prod-feng/sge-gpu/tree/master

NO Guarantee!
=====================

10/13/2018

Added patched_files_ge2011.11.p1.0.1.tar.gz.
Fixed a bug so that GPU array jobs are supported properly. Currently only for GE2011.p1.

NO Guarantee!

======================



04/24/2018, bugfix:

whole_patches_gpu_ge_update.tar.gz

NO Guarantee!

======================

The file whole_patches_gpu_ge.tar.gz contains the two patched versions, for GE2011 and SonGE 8.18.

A patch for Son of Grid Engine is now available.


Patch for SGE 2011.p1 (Grid Engine) to enable multiple-GPU scheduling.

It schedules GPUs to jobs, or to the individual processes of MPI jobs (in the file "environment" on the work nodes).

The source must be recompiled. You also need to define a consumable named "ngpus" (the name is hard-coded in the patched files) and assign a value for it to each node.
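
One possible way to set up the consumable with the stock qconf tool is sketched below. The column values, the INT type, the host name node001 and the count of 4 are all assumptions for illustration; this README does not say which type the patch expects, in particular for fractional requests. ngpus_complex_line is a hypothetical helper that only prints the line to add.

```shell
#!/bin/sh
# Hypothetical helper: print a complex-configuration line for "ngpus".
# Columns: name shortcut type relop requestable consumable default urgency
ngpus_complex_line() {
    type=${1:-INT}   # assumption: INT; the type the patch needs is not stated
    printf 'ngpus ngpus %s <= YES YES 0 0\n' "$type"
}

ngpus_complex_line

# On a live cluster (not run here; node001 and 4 are example values):
#   qconf -mc                                             # add the line above in the editor
#   qconf -aattr exechost complex_values ngpus=4 node001  # node001 has 4 GPUs
#   qconf -se node001                                     # check complex_values
```

With the consumable column set to YES, a request is counted per slot, which matches the per-process behaviour of "-l ngpus=1" for parallel jobs.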

To submit a GPU job, run:

>qsub -l ngpus=1 ...

This also works for parallel jobs.

>qsub -pe openmpi 4 -l ngpus=1 ...

Here, "-l ngpus=1" requests 1 GPU per process.

It also supports scheduling multiple GPUs on one node. For example, suppose node001 has 4 GPUs installed, JobA uses GPU0, and JobB uses GPU2. If JobC then requests 2 GPUs, the patched SGE can dispatch GPU1 and GPU3 to JobC and set the environment for the job on node001:

CUDA_VISIBLE_DEVICES=1,3

For non-GPU jobs, CUDA_VISIBLE_DEVICES is set to empty.
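
The node001 example can be reproduced with a small shell sketch. This is an illustration of the idea only, not the code in the patched files: pick_free_gpus is a hypothetical function, and choosing the lowest-numbered free GPUs is an assumption.

```shell
#!/bin/sh
# Illustration only: pick the lowest-numbered free GPUs on one node.
# $1 = GPUs installed, $2 = comma-separated busy GPU ids, $3 = GPUs wanted
pick_free_gpus() {
    total=$1
    busy=$2
    want=$3
    picked=""
    count=0
    i=0
    while [ "$i" -lt "$total" ] && [ "$count" -lt "$want" ]; do
        case ",$busy," in
            *",$i,"*) ;;                          # GPU i is in use, skip it
            *) picked="${picked:+$picked,}$i"     # append i to the list
               count=$((count + 1)) ;;
        esac
        i=$((i + 1))
    done
    # Print the list only if the whole request could be satisfied.
    [ "$count" -eq "$want" ] && echo "$picked"
}

# node001: 4 GPUs, GPU0 and GPU2 busy, JobC wants 2.
devices=$(pick_free_gpus 4 "0,2" 2)
echo "CUDA_VISIBLE_DEVICES=$devices"   # prints CUDA_VISIBLE_DEVICES=1,3
```

A request for 0 GPUs yields an empty list, which mirrors the empty CUDA_VISIBLE_DEVICES for non-GPU jobs.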

With this patch, you no longer need any wrapper tools or load sensor scripts.

Download the tar file and expand it to get the patched source files.

The tar file contains a script named "apply_patch.sh". Run it to copy the patched files into the appropriate folders, replacing the original ones. Then recompile the whole package.

Developed on CentOS 6.2, Kernel 2.6.32-220.2.1.el6.x86_64, GCC 4.4.6

This patch has only been partially tested, in a small simulated environment. NO Guarantee!


Source: README, updated 2024-05-06