
RunCA and Sun Grid Engine

From wgs-assembler


Sun Grid Engine (SGE) is a widely used batch-queuing system for grid computing facilities. runCA must be configured for your particular SGE installation. The right settings depend mostly on how SGE is configured at your site and on the hardware available, but also on the type of reads you are assembling (Sanger, 454, or Illumina).

Configuring runCA for SGE requires a good understanding of both your grid and the Celera Assembler pipeline. In essence, you are configuring CA to run jobs sized appropriately for your hardware. For example, if your grid contains 8-CPU machines, each with 64GB of memory, it would make little sense to configure overlapper jobs to use 1 CPU and 1GB of memory.
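To make the sizing argument concrete, the short sketch below computes how many copies of a job fit on one host when both CPUs and memory are limits, and how much memory is left idle. It is a simplified model that assumes jobs are constrained only by CPU count and memory; the function names are hypothetical and not part of runCA or SGE.

```python
def jobs_per_host(host_cpus: int, host_mem_gb: int, job_cpus: int, job_mem_gb: int) -> int:
    """How many identical jobs fit on one host, limited by CPUs and by memory."""
    by_cpu = host_cpus // job_cpus
    by_mem = host_mem_gb // job_mem_gb
    return min(by_cpu, by_mem)


def idle_memory_gb(host_cpus: int, host_mem_gb: int, job_cpus: int, job_mem_gb: int) -> int:
    """Memory left unused once the host is packed with as many jobs as fit."""
    n = jobs_per_host(host_cpus, host_mem_gb, job_cpus, job_mem_gb)
    return host_mem_gb - n * job_mem_gb


# The 8-CPU, 64GB host from the example above, with two candidate job sizes.
for job_cpus, job_mem_gb in [(1, 1), (2, 16)]:
    n = jobs_per_host(8, 64, job_cpus, job_mem_gb)
    idle = idle_memory_gb(8, 64, job_cpus, job_mem_gb)
    print(f"{job_cpus} CPU / {job_mem_gb}GB jobs: {n} per host, {idle}GB idle")
```

With 1-CPU/1GB jobs all eight CPUs are busy but 56GB of memory sits idle; 2-CPU/16GB jobs use the whole machine.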

The following spec file has worked very well on 4- and 8-CPU machines with 2GB of memory available per CPU. It was originally tuned for Sanger reads, but also works well with 454 reads. It will not work well with Illumina reads.

#  By default, do NOT use SGE.  The user must manually override these with command-line options.
#
useGrid       = 0
scriptOnGrid  = 0

#  Once SGE is enabled, also do fragment and overlap correction under SGE control.  By default,
#  overlap and consensus jobs are under SGE control.
#
frgCorrOnGrid = 1
ovlCorrOnGrid = 1

#  The primary SGE configuration.
#
#  All SGE jobs are run with the "-A assembly" option, which annotates the SGE accounting information
#  for these jobs with the string "assembly".
#
#  Our SGE configuration lets us run multithreaded jobs using multiple CPUs on a single host with
#  the "-pe thread N" option.  We also tell SGE the amount of memory needed.  Note that SGE multiplies
#  the number of threads by the per-thread memory request to compute the total memory for the job;
#  the overlap jobs request 2 threads, each needing 2GB of memory, so the process itself will get
#  4GB of memory.
#
sge                   = -A assembly
sgeScript             = -pe thread 1 -l memory=8g -p  400 -q fastdisk.q
sgeOverlap            = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapSeed     = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapExtend   = -pe thread 2 -l memory=2g -p -500
sgeConsensus          = -pe thread 1 -l memory=2g -p -600
sgeFragmentCorrection = -pe thread 2 -l memory=3g -p -500
sgeOverlapCorrection  = -pe thread 1 -l memory=3g -p -400

#  MERYL configuration
#
merylMemory   = -segments 4 -threads 4
merylThreads  = 4

#  OVERLAPPER configuration
#
#  Above, when configuring SGE, we requested 4GB of memory for each overlap job.  Here, we configure
#  the overlap job itself to use 4GB of memory.
#
ovlMemory        = 4GB --hashload 0.8 --hashstrings 100000
ovlThreads       = 2
ovlHashBlockSize = 180000
ovlRefBlockSize  = 2000000

#  MER OVERLAPPER configuration
#
merOverlapperThreads         = 2
merOverlapperSeedBatchSize   = 90000
merOverlapperExtendBatchSize = 90000

#  ERROR CORRECTION configuration
#
frgCorrBatchSize = 200000
frgCorrThreads   = 2

ovlCorrBatchSize = 800000
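Because the memory request in this SGE setup is counted per thread, it is easy to mis-size a job. The sketch below reads the sge* lines of the spec file above and reports the total memory each job class receives, computed as the thread count from "-pe thread N" times the per-thread "-l memory=X" value. It assumes, as the comments in the spec file state, that this site's "memory" resource is multiplied by the number of threads; the helper functions are hypothetical and not part of runCA.

```python
import re

# The sge* lines from the spec file above, copied verbatim.
SPEC = """
sgeScript             = -pe thread 1 -l memory=8g -p  400 -q fastdisk.q
sgeOverlap            = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapSeed     = -pe thread 2 -l memory=2g -p -600
sgeMerOverlapExtend   = -pe thread 2 -l memory=2g -p -500
sgeConsensus          = -pe thread 1 -l memory=2g -p -600
sgeFragmentCorrection = -pe thread 2 -l memory=3g -p -500
sgeOverlapCorrection  = -pe thread 1 -l memory=3g -p -400
"""

PE_RE = re.compile(r"-pe\s+thread\s+(\d+)")
MEM_RE = re.compile(r"-l\s+memory=(\d+)([gGmM])")


def parse_spec(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        options[key.strip()] = value.strip()
    return options


def total_memory_gb(options: str) -> float:
    """Threads requested via '-pe thread N' times per-thread '-l memory=X'."""
    pe = PE_RE.search(options)
    mem = MEM_RE.search(options)
    if mem is None:
        raise ValueError(f"no '-l memory=' request in: {options!r}")
    threads = int(pe.group(1)) if pe else 1  # no PE request means one slot
    per_thread = int(mem.group(1))
    if mem.group(2).lower() == "m":
        per_thread /= 1024  # convert MB to GB
    return threads * per_thread


for key, value in parse_spec(SPEC).items():
    print(f"{key:22s} {total_memory_gb(value):g} GB")
```

For the overlap jobs this works out to 2 x 2GB = 4GB, which matches the ovlMemory = 4GB setting in the overlapper section.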