From: kuhl <ku...@mo...> - 2013-08-10 10:28:10
|
Hi, I am running into similar problems, when combining very long read assembly (65000 bp reads which are assembled contigs from a de bruijn graph assembler) with short read paired end and matepair data. The assembly gets messed up by unitig splitting, some of the long reads even end up as singletons. Other contigs end up as degenerates although they are very large (>100 kbp) and as far as I can tell they are unique genomic sequences (not repeats od duplications). Is there a workaround for these problems, or should one not mix very long reads and very short read data? Best wishes, Heiner On Fri, 9 Aug 2013 20:08:50 +0000, "Waldbieser, Geoff" <Geo...@AR...> wrote: > Hi, > CA v7.0 (build assembled 3.3M pacBioToCA-corrected long reads into 146,537 > unitigs (~1Gb) in 24 hrs using 62cpu and 512GB RAM. > > $> /home/software/wgs1/Linux-amd64/bin/runCA -version > CA version CVS TIP ($Id: AS_GKP_main.c,v 1.105 2012-09-13 17:41:13 skoren > Exp $). > CA version CVS TIP ($Id: AS_CGB_unitigger.c,v 1.45 2011-09-06 02:15:18 > mkotelbajcvi Exp $). > CA version CVS TIP ($Id: BuildUnitigs.cc,v 1.88 2012-01-15 23:49:34 > brianwalenz Exp $). > Using up to 64 OpenMP threads. > CA version CVS TIP ($Id: AS_CGW_main.c,v 1.116 2012-11-15 05:04:54 > brianwalenz Exp $). > CA version CVS TIP ($Id: terminator.C,v 1.17 2012-09-10 08:58:11 > brianwalenz Exp $). > > I then added Illumina mate pairs that had been corrected, deduplicated, > and chimeric reads removed. > Library > Pairs > Illumina MP, 3kb insert 4,455,475 > Illumina MP, 8kb insert 2,834,858 > Illumina MP, 36kb insert 1,222,830 > > When these were added, CA spent a few days in splitUnitigs before failing. > > ------------------- > $> tail -60 splitUnitigs.out.FAILED > Creating new unitig 8643669 with 28 fragments > unitig 2478 interval 0 0,920 good > unitig 2478 interval 1 920,1123 bad > unitig 2478 interval 2 1124,1428 good > Fixing contains. > prev 1143,1230 -- 16110937 1220,1143 (no overlap to new 1193,1413) > prev 1143,1230 -- 11823198 1220,1143 (no overlap to new 1193,1413) > prev 1143,1230 -- 14595211 1217,1143 (no overlap to new 1193,1413) > prev 1143,1230 -- 13147242 1213,1143 (no overlap to new 1193,1413) > prev 1143,1230 -- 22697521 1230,1145 (no overlap to new 1193,1413) prev > 1143,1230 -- 4687598 1145,1220 (no overlap to new 1193,1413) > Creating new unitig 8643670 with 24 fragments > Creating new unitig 8643671 with 8 fragments > Creating new unitig 8643672 with 1 fragments > unitig 2494 interval 0 0,16451 good > unitig 2494 interval 1 16451,16502 bad > unitig 2494 interval 2 16503,48319 good > Creating new unitig 8643673 with 265 fragments > Creating new unitig 8643674 with 2 fragments > Creating new unitig 8643675 with 497 fragments > unitig 2504 interval 0 0,18635 good > unitig 2504 interval 1 18635,18769 bad > unitig 2504 interval 2 18770,26279 good > Creating new unitig 8643676 with 295 fragments > Creating new unitig 8643677 with 2 fragments > splitUnitigs: MultiAlignUnitig.C:469: int > unitigConsensus::computePositionFromParent(bool): Assertion > `cnspos[tiid].bgn < cnspos[tiid].end' failed. > > Failed with 'Aborted' > > Backtrace (mangled): > > /home/software/wgs1/Linux-amd64/bin/splitUnitigs(_Z17AS_UTL_catchCrashiP7siginfoPv+0x23)[0x410a13] > /lib64/libpthread.so.0(+0xfd00)[0x7f55fe5bdd00] > /lib64/libc.so.6(gsignal+0x35)[0x7f55fe252d95] > /lib64/libc.so.6(abort+0x17b)[0x7f55fe2542ab] > /lib64/libc.so.6(+0x2d8fe)[0x7f55fe24b8fe] > /lib64/libc.so.6(+0x2d9a2)[0x7f55fe24b9a2] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs[0x42bf9f] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs(_Z16MultiAlignUnitigP11MultiAlignTP7gkStoreP11CNS_OptionsPi+0xf0)[0x42fa10] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs(main+0x28bd)[0x40c78d] > /lib64/libc.so.6(__libc_start_main+0xed)[0x7f55fe23f23d] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs[0x40cbf9] > > Backtrace (demangled): > > [0] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs::AS_UTL_catchCrash(int, > siginfo*, void*) + 0x23 [0x410a13] > [1] /lib64/libpthread.so.0::(null) + 0xfd00 [0x7f55fe5bdd00] > [2] /lib64/libc.so.6::(null) + 0x35 [0x7f55fe252d95] > [3] /lib64/libc.so.6::(null) + 0x17b [0x7f55fe2542ab] > [4] /lib64/libc.so.6::(null) + 0x2d8fe [0x7f55fe24b8fe] > [5] /lib64/libc.so.6::(null) + 0x2d9a2 [0x7f55fe24b9a2] > [6] /home/software/wgs1/Linux-amd64/bin/splitUnitigs() [0x42bf9f] > [7] > /home/software/wgs1/Linux-amd64/bin/splitUnitigs::MultiAlignUnitig(MultiAlignT*, > gkStore*, CNS_Options*, int*) + 0xf0 [0x42fa10] > [8] /home/software/wgs1/Linux-amd64/bin/splitUnitigs::(null) + 0x28bd > [0x40c78d] > [9] /lib64/libc.so.6::(null) + 0xed [0x7f55fe23f23d] > [10] /home/software/wgs1/Linux-amd64/bin/splitUnitigs() [0x40cbf9] > > GDB: > ------------------ > > After seeing that someone else had a splitUnitig problem, I installed > build 4371 and restarted. So far it has run splitUnitigs for 24 hrs and it > is currently working on unitig 35551 out of 9254397. > > The Illumina jump read parameters are: > forceBOGunitigger=0 > isNotRandom=0 > doNotTrustHomopolymerRuns=0 > doTrim_initialNone=0 > doTrim_initialMerBased=0 > doTrim_initialFlowBased=0 > doTrim_initialQualityBased=0 > doRemoveDuplicateReads=0 > doTrim_finalLargestCovered=1 > doTrim_finalEvidenceBased=0 > doTrim_finalBestEdge=0 > doRemoveSpurReads=1 > doRemoveChimericReads=1 > doConsensusCorrection=1 > forceShortReadFormat=1 > constantInsertSize=0 > fastqQualityValues=sanger > fastqOrientation=innie > > The corrPacBio parameters are: > forceBOGunitigger=0 > isNotRandom=0 > doNotTrustHomopolymerRuns=0 > doTrim_initialNone=0 > doTrim_initialMerBased=0 > doTrim_initialFlowBased=0 > doTrim_initialQualityBased=0 > doRemoveDuplicateReads=0 > doTrim_finalLargestCovered=0 > doTrim_finalEvidenceBased=1 > doRemoveSpurReads=1 > doRemoveChimericReads=1 > doConsensusCorrection=1 > forceShortReadFormat=0 > constantInsertSize=0 > fastqQualityValues=sanger > fastqOrientation=innie > > > I have declared bogart as the unitigger. I also set > "doFragmentCorrection=0" and "doOverlapBasedTrimming = 0" because the > Illumina data had already been cleaned and I assumed the Illumina > correction of the PacBio reads was an error correction. Is this leading to > false joins that the unitigger is identifying and having to correct by > splitting? > > Thanks for any input. > Geoff > ___________________________________ > Geoffrey C. Waldbieser > Research Molecular Biologist > Warmwater Aquaculture Research Unit > Agricultural Research Service > United States Department of Agriculture > Stoneville, MS 38776 > (662) 686-3593 > > > > > > This electronic message contains information generated by the USDA solely > for the intended recipients. Any unauthorized interception of this message > or the use or disclosure of the information it contains may violate the law > and subject the violator to civil or criminal penalties. If you believe you > have received this message in error, please notify the sender and delete > the email immediately. -- --------------------------------------------------------------- Dr. Heiner Kuhl MPI Molecular Genetics Tel: + 49 + 30 / 8413 1776 Next Generation Sequencing Ihnestrasse 73 email: ku...@mo... D-14195 Berlin http://www.molgen.mpg.de/SeqCore --------------------------------------------------------------- |