From: Serge K. <se...@um...> - 2014-08-21 19:56:43
Hi,

Sorry for the delayed reply; I missed your post in my email. The high heterozygosity could definitely affect the throughput of the correction. I would suggest increasing the sensitivity further and not specifying -pbCNS on your command line (this consensus module is faster but less robust to higher-error data, so it could be negatively affected by heterozygosity):

mhap = "-k 14 --num-hashes 768 --num-min-matches 3 --threshold 0.04"
merSize = 14

If you could send your asm.layout.err file, I can get more information and confirm whether the low output is due to the consensus or to the sensitivity parameters.

Sergey

On Aug 18, 2014, at 12:52 PM, Jason Hill <jas...@zo...> wrote:

> Hello PBcR and WGS community,
>
> I'm working with what should be 100x PacBio coverage, and after using PBcR I'm ending up with at best 7x-8x of corrected reads. My initial read set is about 11 million reads with an average length of 3,000 bp. After error correction, my best run resulted in 1.2 million reads with an average length of 2,000 bp. The genome, from a terrestrial insect, has relatively high heterozygosity. I've both adjusted maxCoverage and increased the genome size to try to account for this, but I see fewer and shorter reads than with the default PBcR parameters. My current run uses the command and spec file below. I'm using the latest version of WGS, 8.2b.
>
> ############## pacbio.spec #############
> assemble = 0
> localStaging = /wgs_pacbio_assembly/PBcR_self_correction/staging
>
> #faster overlapper with more sensitive settings
> mhap = "-k 16 --num-hashes 1256 --num-min-matches 3 --threshold 0.04"
> merSize = 16
>
> #system memory parameters to avoid fraction bug
> ovlMemory = 512
> ovlStoreMemory = 512000
> merylMemory = 512000
>
> #increase coverage depth to counter heterozygosity/error rate
> #usually results in fewer corrected reads
> maxCoverage = 60
>
> #increase genome size to counter heterozygosity; actual genome size is 350 Mbp
> #usually results in fewer corrected reads
> genomeSize = 500000000
> #####################################
>
> $PBcR -pbCNS \
>   -length 300 \
>   -partitions 65 \
>   -l corrected_pb_1 \
>   -t 64 \
>   -s pacbio.spec \
>   -noclean \
>   -fastq pb.fastq 2>&1 | tee self_corrected_pb_1.log
>
> When looking at the corrected read lists in the temporary directory, I see what appear to be deleted reads of a length I would assume should make the cut, for example:
>
> >100003680002,3680002 mate=0,0 lib=corrected_pb_1,1 clr=LATEST,1,2219 deleted=1
> cgtatgtaaaccaattttatactgatggggcgcgaaataacttttcttaagttccttgtgtccaaaca… continues for a total of 2219 bp.
>
> As it is, none of the overlap-layout assemblers can do much with the low coverage I end up with, so I'm very eager to hear ideas on how to move this forward. Would you please take a look and let me know how you would proceed? I would be happy to supply any additional information and files.
>
> -Jason
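The coverage figures in Jason's message can be sanity-checked with simple arithmetic: fold coverage is roughly the number of reads times the mean read length, divided by the genome size. The Python sketch below uses only the numbers quoted in the thread (11 million raw reads at about 3,000 bp, 1.2 million corrected reads at about 2,000 bp, a 350 Mbp genome). Because the lengths are averages, the results are approximate, and the sketch is not meant to reproduce PBcR's own internal accounting.

```python
"""Back-of-envelope coverage check using the numbers quoted in this thread."""

GENOME_SIZE_BP = 350_000_000  # actual genome size Jason reports (~350 Mbp)


def coverage(num_reads: int, mean_length_bp: float, genome_size_bp: int) -> float:
    """Approximate fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_length_bp / genome_size_bp


# Raw PacBio reads before correction
raw = coverage(11_000_000, 3000, GENOME_SIZE_BP)
# Reads surviving PBcR self-correction in the best run
corrected = coverage(1_200_000, 2000, GENOME_SIZE_BP)

print(f"raw coverage:       {raw:.1f}x")
print(f"corrected coverage: {corrected:.1f}x")
print(f"bases retained:     {100 * corrected / raw:.1f}%")
```

With these inputs it reports about 94.3x raw and 6.9x corrected coverage, so only about 7.3% of the sequenced bases survive correction, consistent with the low corrected coverage described above. Note that the inflated genomeSize setting in the spec file does not change this real coverage; it only changes what PBcR is told.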