- status: open --> closed-duplicate
Hello.
I am using the PBcR pipeline to assemble a genome from PacBio data. The genome is highly heterozygous, with an estimated size of 1.5~2 Gb, and I am using ~19 Gb (~10X) of PacBio reads for self-correction and assembly.
When the process reaches “runCorrection.sh”, it generates large output files, all named “asm.[1-100].shortmap.var”, with no error or warning in “asm.layout.err”. So far these files account for 4.6 TB on my disk and keep growing... Is something wrong, or is this normal for a large genome assembly? More than 4.6 TB of output seems excessive!
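In case it helps to track how fast the files grow, here is a minimal sketch of the check I run, assuming the asm.*.shortmap.var files sit in the current assembly directory (the path is an assumption):

```shell
# Sum the sizes of the per-partition variant files written by
# runCorrection.sh; the last line printed by `du -c` is the grand total.
du -ch asm.*.shortmap.var 2>/dev/null | tail -n 1
```

Re-running this periodically shows whether the total is still increasing or has plateaued.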
Here is my main script:
…/wgs-8.3rc2/Linux-amd64/bin/PBcR -length 500 -partitions 100 -libraryname xjp24 -threads 7 -fastq pacbio.fastq -s pacbio.spec
And the pacbio.spec is:
asmUtgErrorRate=0.10
asmCnsErrorRate=0.10
asmCgwErrorRate=0.10
asmOBT=1
asmObtErrorRate=0.08
asmObtErrorLimit=4.5
utgGraphErrorRate=0.05
utgMergeErrorRate=0.05
ovlHashBits=24
ovlHashLoad=0.80
merSize = 14
merylMemory = 32000
merylThreads = 8
ovlStoreMemory = 32000
ovlMemory = 32
useGrid = 0
scriptOnGrid = 0
frgCorrOnGrid = 0
ovlCorrOnGrid = 0
ovlHashBits = 25
ovlThreads = 3
ovlHashBlockLength = 1000000000
ovlRefBlockSize = 1000000000
frgCorrThreads = 10
frgCorrBatchSize = 100000
ovlCorrBatchSize = 500000
ovlConcurrency = 10
cnsConcurrency = 10
frgCorrConcurrency = 10
ovlCorrConcurrency = 10
cnsConcurrency = 10
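One thing I noticed while pasting the spec: some keys appear twice (ovlHashBits is set to 24 and then 25, and cnsConcurrency is listed twice), so the later value presumably wins. A hypothetical one-liner to flag such duplicates in a spec file (the filename pacbio.spec matches my setup above):

```shell
# Print every key that is assigned more than once in pacbio.spec,
# ignoring spaces around the '=' sign.
awk -F'=' 'NF==2 {gsub(/ /,"",$1); count[$1]++}
           END {for (k in count) if (count[k]>1) print k}' pacbio.spec
```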
I have also attached “runCorrection.sh” and “asm.layout.err” here.
Would you please help check them and give some suggestions about the large output? Are there more appropriate parameter settings to reduce the output for large genome assemblies (for example, assembling a 2 Gb genome with ~50X PacBio data)?
Thank you very much!