- status: open --> closed-duplicate
Hello.
I am using the PBcR pipeline to assemble a genome from PacBio data. The genome is highly heterozygous, with an estimated size of 1.5~2 Gb, and I am using ~19 Gb (~10X) of PacBio reads for self-correction and assembly.
When the process reaches “runCorrection.sh”, it generates large output files, all named “asm.[1-100].shortmap.var”, with no error or warning in “asm.layout.err”. So far these files account for 4.6 TB on my disk and keep growing... Is something wrong, or is this normal for a large genome assembly? More than 4.6 TB of output seems excessive!
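In case it helps to track how fast the files grow, here is a minimal sketch of the check I run, assuming the asm.*.shortmap.var files sit in the current assembly directory (the path is an assumption):

```shell
# Sum the sizes of the per-partition variant files written by
# runCorrection.sh; the last line printed by `du -c` is the grand total.
du -ch asm.*.shortmap.var 2>/dev/null | tail -n 1
```

Re-running this periodically shows whether the total is still increasing or has plateaued.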
Here is my main script:
…/wgs-8.3rc2/Linux-amd64/bin/PBcR -length 500 -partitions 100 -libraryname xjp24 -threads 7 -fastq pacbio.fastq -s pacbio.spec
And the pacbio.spec is:
asmUtgErrorRate=0.10
asmCnsErrorRate=0.10
asmCgwErrorRate=0.10
asmOBT=1
asmObtErrorRate=0.08
asmObtErrorLimit=4.5
utgGraphErrorRate=0.05
utgMergeErrorRate=0.05
ovlHashBits=24
ovlHashLoad=0.80
merSize = 14
merylMemory = 32000
merylThreads = 8
ovlStoreMemory = 32000
ovlMemory = 32
useGrid = 0
scriptOnGrid = 0
frgCorrOnGrid = 0
ovlCorrOnGrid = 0
ovlHashBits = 25
ovlThreads = 3
ovlHashBlockLength = 1000000000
ovlRefBlockSize = 1000000000
frgCorrThreads = 10
frgCorrBatchSize = 100000
ovlCorrBatchSize = 500000
ovlConcurrency = 10
cnsConcurrency = 10
frgCorrConcurrency = 10
ovlCorrConcurrency = 10
cnsConcurrency = 10
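One thing I noticed while pasting the spec: some keys appear twice (ovlHashBits is set to 24 and then 25, and cnsConcurrency is listed twice), so the later value presumably wins. A hypothetical one-liner to flag such duplicates in a spec file (the filename pacbio.spec matches my setup above):

```shell
# Print every key that is assigned more than once in pacbio.spec,
# ignoring spaces around the '=' sign.
awk -F'=' 'NF==2 {gsub(/ /,"",$1); count[$1]++}
           END {for (k in count) if (count[k]>1) print k}' pacbio.spec
```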
I have also attached “runCorrection.sh” and “asm.layout.err” here.
Would you please help check them and give some suggestions about the large output? Are there more appropriate parameter settings to reduce the output for large genome assemblies (for example, assembling a 2 Gb genome with ~50X PacBio data)?
Thank you very much!