RunCA and Sun Grid Engine
Sun Grid Engine (SGE) is an industry-standard interface to grid computing facilities. runCA must be configured for your particular implementation of SGE. The right settings depend mostly on your SGE setup and the hardware available, but also on the type of reads you are assembling (Sanger vs 454 vs Illumina).
This configuration requires a good understanding of both your grid and the Celera Assembler (CA) pipeline. It amounts to telling CA to run jobs of the appropriate size for your hardware. For example, if your grid contains 8-CPU machines each with 64GB of memory, it would make little sense to configure overlapper jobs to use 1 CPU and 1GB of memory.
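The sizing argument above can be checked with a little arithmetic. The following Python sketch is only an illustration: the function `jobs_per_node` and its parameters are hypothetical names, not part of runCA or SGE, and it assumes that a node is limited only by its CPU count and memory.

```python
def jobs_per_node(node_cpus, node_mem_gb, job_threads, job_mem_gb):
    """Return (jobs that fit on one node, fraction of node memory they use).

    job_mem_gb is the total memory of one job, not the per-thread amount.
    """
    by_cpu = node_cpus // job_threads        # limit imposed by CPUs
    by_mem = int(node_mem_gb // job_mem_gb)  # limit imposed by memory
    n_jobs = min(by_cpu, by_mem)
    mem_used_fraction = n_jobs * job_mem_gb / node_mem_gb
    return n_jobs, mem_used_fraction


# The example from the text: an 8-CPU, 64GB node running 1-CPU, 1GB jobs.
n_jobs, mem_fraction = jobs_per_node(8, 64, 1, 1)
print(n_jobs, mem_fraction)  # 8 0.125
```

In this case the CPUs run out after 8 jobs while seven eighths of the memory sits idle, which is why larger jobs suit such a machine better.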
The following spec file has worked very well on 4- and 8-CPU machines with 2GB of memory available per CPU. It was originally tuned for Sanger reads, but also works well on 454 reads. It will not work well with Illumina reads.
# By default, do NOT use SGE. The user must manually override these with command-line options.
#
useGrid = 0
scriptOnGrid = 0

# Once SGE is enabled, also do fragment and overlap correction under SGE control. By default,
# overlap and consensus jobs are under SGE control.
#
frgCorrOnGrid = 1
ovlCorrOnGrid = 1

# The primary SGE configuration.
#
# All SGE jobs are run with the "-A assembly" option, which annotates the SGE accounting information
# for these jobs with the string "assembly".
#
# Our SGE configuration lets us run multithreaded jobs using multiple CPUs on a single host with
# the "-pe thread N" option. We also tell SGE the amount of memory needed. Note that SGE multiplies
# the number of threads by the amount of memory to compute the total amount of memory for this job --
# the overlap jobs are requesting 2 threads, each needing 2GB memory, so the process itself will get
# 4GB memory.
#
sge = -A assembly
sgeScript = -pe thread 1 -l memory=8g -p 400 -q fastdisk.q
sgeOverlap = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapSeed = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapExtend = -pe thread 2 -l memory=2g -p -500
sgeConsensus = -pe thread 1 -l memory=2g -p -600
sgeFragmentCorrection = -pe thread 2 -l memory=3g -p -500
sgeOverlapCorrection = -pe thread 1 -l memory=3g -p -400

# MERYL configuration
#
merylMemory = -segments 4 -threads 4
merylThreads = 4

# OVERLAPPER configuration
#
# Above, when configuring SGE, we requested 4GB of memory for each overlap job. Here, we configure
# the overlap job itself to use 4GB of memory.
#
ovlMemory = 4GB --hashload 0.8 --hashstrings 100000
ovlThreads = 2
ovlHashBlockSize = 180000
ovlRefBlockSize = 2000000

# MER OVERLAPPER configuration
#
merOverlapperThreads = 2
merOverlapperSeedBatchSize = 90000
merOverlapperExtendBatchSize = 90000

# ERROR CORRECTION configuration
#
frgCorrBatchSize = 200000
frgCorrThreads = 2
ovlCorrBatchSize = 800000
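The spec file's comments note that SGE multiplies the per-slot memory request by the number of threads, and that the overlapper's own memory setting should match the result. The Python sketch below works through that check for the overlap job. It is an illustration only: the helper `sge_total_memory_gb` is a hypothetical name, and it assumes the site-specific "-pe thread N" and "-l memory=Ng" conventions used in this spec file, which may differ on your grid.

```python
import re


def sge_total_memory_gb(sge_options):
    """Total memory (GB) SGE reserves for a job: threads times per-slot memory.

    Assumes options of the form '-pe thread N' and '-l memory=Mg', as in the
    spec file above; other grids may name these resources differently.
    """
    threads = int(re.search(r"-pe\s+thread\s+(\d+)", sge_options).group(1))
    per_slot = float(
        re.search(r"-l\s+memory=(\d+(?:\.\d+)?)g", sge_options).group(1)
    )
    return threads * per_slot


# Values copied from the spec file.
sge_overlap = "-pe thread 2 -l memory=2g -p -600"
ovl_memory = "4GB --hashload 0.8 --hashstrings 100000"

requested = sge_total_memory_gb(sge_overlap)
configured = float(re.match(r"(\d+(?:\.\d+)?)GB", ovl_memory).group(1))
print(requested, configured)  # 4.0 4.0
```

If the two numbers disagree, either SGE reserves memory the overlapper never uses or the overlapper tries to use more than SGE granted, so it is worth repeating this check whenever sgeOverlap or ovlMemory is changed.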
