#134 Allow alternate bank location in pacBioToCA for speedup


I'm running the pacBioToCA pipeline on a large genome (820 Mb) with 6x PacBio coverage and 50x Illumina coverage for correction. I noticed that one of the most time-consuming steps is the runPartition.sh script that calls AMOS. This step is slow because bank-transact and make-consensus spend most of their time in disk wait state, especially when hundreds of jobs run in parallel, all reading from and writing to a shared disk. I found that I could dramatically speed up runPartition.sh by modifying the script to create the bank partitions (asm.bnk_partitionXXX.bnk) on a RAM disk (/dev/shm) and then move each bank back to $wrk/temp$libraryname after the script completes. Our cluster nodes have plenty of memory (256 GB), and since the banks don't take up much space, the extra memory used isn't an issue.
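
For illustration, here is a minimal sketch of the staging idea, not the actual change I made to runPartition.sh. The function name `stage_bank`, its arguments, and the `build_partition` wrapper in the usage note are hypothetical; the real script would run its bank-transact and make-consensus calls where the wrapped command is invoked. It assumes bash, a GNU-style `mktemp` and `mv`, and that the final bank path does not already exist.

```bash
#!/usr/bin/env bash
# Sketch only: build a bank on fast scratch storage (e.g. /dev/shm), then
# move it to its final location on shared disk. All names are hypothetical.

# stage_bank FINAL_BANK SCRATCH_ROOT COMMAND [ARGS...]
# Runs COMMAND with the scratch bank path appended as its last argument.
stage_bank() {
    local final_bank=$1 scratch_root=$2
    shift 2

    # Private scratch directory so parallel jobs on one node cannot collide.
    local scratch_dir
    scratch_dir=$(mktemp -d "$scratch_root/bank.XXXXXX") || return 1
    local scratch_bank
    scratch_bank="$scratch_dir/$(basename "$final_bank")"

    # Run the bank-building command against the scratch location.
    if ! "$@" "$scratch_bank"; then
        rm -rf "$scratch_dir"    # free the memory if the build failed
        return 1
    fi

    # Move the finished bank to shared storage. If the move fails, the
    # scratch copy is left in place so the work is not lost.
    mkdir -p "$(dirname "$final_bank")" || return 1
    mv "$scratch_bank" "$final_bank" || return 1
    rmdir "$scratch_dir"
}
```

A call would then look something like `stage_bank "$wrk/temp$libraryname/asm.bnk_partition001.bnk" /dev/shm build_partition`, where `build_partition` stands in for whatever shell function wraps the AMOS commands and takes the bank path as its argument.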

While it's easy enough to make this modification myself, it would be convenient to have an optional argument to the runPacBioToCA script that specifies an alternate temporary location for the AMOS banks.


    Sergey Koren - 2014-12-02

    This is available in CA 8.2 via the bankPath option. We also support additional consensus modules for this step, which should alleviate the bottleneck.

