I used 35X PacBio data to assemble a genome of nearly 1 Gb with PBcR. In the correction step, the software produced very large files, which led to the error reported in asm.layout.err. Is there something wrong with my parameter settings, or does it really need this much space? How much space do I need? Thanks!
"....................................
In thread 41 going to output files 137-139 with 25443
In thread 58 going to output files 194-196 with 25443
In thread 55 going to output files 184-186 with 25443
In thread 52 going to output files 174-176 with 25443
In thread 47 going to output files 157-159 with 25443
In thread 57 going to output files 190-193 with 25443
safeWrite()-- Write failure on AS_PBR_writeLayRecord: No space left on device
safeWrite()-- Wanted to write 8388606 objects (size=4), wrote 1858938."
This is the spec file.
"ovlMemory = 300
merSize=16
asmOvlErrorRate=0.10
asmUtgErrorRate=0.10
asmCnsErrorRate=0.10
asmCgwErrorRate=0.10
asmOBT=1
asmObtErrorRate=0.08
asmObtErrorLimit=4.5
utgGraphErrorRate=0.05
utgMergeErrorRate=0.05
ovlHashBits=24
ovlHashLoad=0.80"
This is the shell command used to run PBcR:
"PBcR -length 1000 -partitions 200 -l species -s pacbio.spec -fastq all.fastq -threads 60 -maxCoverage 30 genomeSize=900000000 >PBcR.err"
Typically a human-sized genome requires 2-4TB of space to run. It can be higher for more repetitive genomes since these have more overlaps.
There really isn't anything you can change to reduce the space usage; most of it goes to storing the overlaps between raw sequences and the layouts for corrected reads. You can check that there are no *.dat files in the temporary folder/1-overlapper directory and that there are no *.ovb files in temporary folder/1-overlapper/001/*.
You can estimate how much space you need from your current usage. The step that writes asm.layout.err converts the overlap store into a read-based layout, which is then used to generate the corrected sequences. It will therefore need approximately double the asm.ovlStore's space to finish. Once the layouts are generated, asm.ovlStore can be removed, and the pipeline will remove it for you when it finishes generating the corrected sequences.
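The estimate described above can be sketched as a small Python script. This is only a rough illustration, not part of PBcR: the function names, the default path `asm.ovlStore` (pass your own path as the first argument), and the reading of "approximately double" as "store plus layouts of about the same size" (`factor=2.0`) are all assumptions made here.

```python
import os
import shutil
import sys


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def layout_space_check(store_bytes, free_bytes, factor=2.0):
    """Return (extra bytes needed, whether the free space covers it).

    Assumption: the layout step ends up using about `factor` times the
    size of the overlap store in total, so the extra space required is
    (factor - 1) * store_bytes on top of the store itself.
    """
    extra_needed = int((factor - 1.0) * store_bytes)
    return extra_needed, free_bytes >= extra_needed


if __name__ == "__main__":
    # Hypothetical default location; adjust to your temporary folder.
    store = sys.argv[1] if len(sys.argv) > 1 else "asm.ovlStore"
    if os.path.isdir(store):
        store_bytes = dir_size_bytes(store)
        free_bytes = shutil.disk_usage(store).free
        extra, ok = layout_space_check(store_bytes, free_bytes)
        gib = 1024 ** 3
        print("overlap store: %.1f GiB" % (store_bytes / gib))
        print("free on device: %.1f GiB" % (free_bytes / gib))
        print("extra needed (rough): %.1f GiB" % (extra / gib))
        print("likely enough space" if ok else "likely NOT enough space")
    else:
        print("directory not found: %s" % store)
```

Treat the result as a lower bound rather than a guarantee, since the layouts may not match the store's size exactly; raising `factor` gives a more conservative check.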