From: Ole K. T. <o.k...@bi...> - 2012-04-12 16:16:01
Hi Brian.

Incidentally, the numbers are the same in mine. I thought maybe you had gleaned my numbers from the files I sent you, and that you used them to make it easier for me to understand. :)

To sum up: The contig is identical in version 14 and 16, with the same 'data.contig_status' (set to U) and, as far as I can see, the same consensus and sequence length. In version 15, however, the consensus (and quality scores) are lacking, the length of the contig is set to 0, and 'data.contig_status' is also U.

I dumped the contigs by using 'tigStore -g *gkpStore -t *tigStore 14 -c 107652 -d layout3 > ctg1076523', and just varying the version number. I tried loading the contig a couple of times into version 15 and then dumping it again, but it was still without a consensus sequence, and asmOutputFasta fails.

Have I messed up the latter stages of my assembly by doing this? Is it possible to fix this in any way?

Thanks for your help so far. It's good to learn more about Celera.

Ole

On 12 April 2012 16:50, Walenz, Brian <bw...@jc...> wrote:
> Hi Ole-
>
> The version numbers will be different in different assemblies. Mine came
> from a small assembly with little scaffolding work. Larger assemblies can
> have more than 100 versions. 'ls -l *tigStore' will show the versions - you
> want to use the last three.
>
> b
>
>
>
> On 4/12/12 3:52 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>
>> Hi, Brian.
>>
>> Thank you for your help so far, but I seem to be missing something.
>>
>> I did this:
>> tigStore -g *gkpStore -t *tigStore 14 -c 107652 -d layout3 > ctg1076523
>> tigStore -g *gkpStore -t *tigStore 15 -cp 36 -R ctg1076523
>>
>> But when I dump the same contig from version 15:
>> tigStore -g *gkpStore -t *tigStore 15 -c 107652 -d layout3 > ctg1076523_v15
>> it's without consensus sequence:
>> contig 1076523
>> len 0
>> cns
>> qlt
>> data.unitig_coverage_stat -9874.792662
>> data.unitig_microhet_prob 0.000000
>> data.unitig_status X
>> data.unitig_unique_rept X
>> data.contig_status U
>> data.num_frags 14302
>> data.num_unitigs 1
>>
>> Then if I dump it from version 16, it's identical to the one from
>> version 14 (that is, with consensus). I've tried loading it several
>> times, but each time I dump it again it's lost consensus. Do you know
>> what I'm doing wrong?
>>
>> Ole
>>
>> On 11 April 2012 20:54, Walenz, Brian <bw...@jc...> wrote:
>>> Hi, Ole-
>>>
>>> Yes, I overlooked a step. In the contig you insert to the latest version,
>>> update the 'data.contig_status' with what the second to last version has.
>>>
>>> FYI, the tigStore should have versions such as:
>>>
>>> seqDB.v014.ctg
>>> seqDB.v014.dat
>>> seqDB.v014.utg
>>>
>>> seqDB.v015.ctg
>>> seqDB.v015.p001.ctg
>>> seqDB.v015.p001.dat
>>> (etc)
>>> seqDB.v015.utg
>>>
>>> seqDB.v016.ctg
>>> seqDB.v016.p001.ctg
>>> seqDB.v016.p001.dat
>>> (etc)
>>> seqDB.v016.utg
>>>
>>> (the v numbers will of course be different in your assembly)
>>>
>>> v015 contains the output of scaffolder, which is the input to consensus.
>>> Contigs here have no consensus sequence, but otherwise all the data is
>>> present. It is largely just rewriting the data from v014 into partitions
>>> (p###), so each consensus job can load a single file instead of randomly
>>> accessing a large file. The status flag on each unitig/contig is also set.
>>> This flag tells if the unitig/contig was placed in a scaffold, is a
>>> surrogate, degenerate, etc.
>>>
>>> v016 is the output of consensus, the input to terminator. All terminator
>>> does is to repackage this into ASCII files.
>>>
>>> To summarize: grab the contig from v014 (the last with a consensus
>>> sequence), the status flag from v015, change the status flag in the contig
>>> you grabbed, and then insert the contig into v016.
>>>
>>> By doing this, you'll lose VAR records for this contig, but otherwise the
>>> consensus sequence is the same (or largely the same; variant detection can
>>> change it a bit).
>>>
>>> b
>>>
>>>
>>> On 4/11/12 6:23 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>>>
>>>> Hi Brian,
>>>> ctgcns completed now, but I got an error with asmOutputFasta. From
>>>> 9-terminator/asmOutputFasta.err:
>>>> ERROR: Illegal unitigpos type type value 'X' (CCO) at line 1676575956
>>>>
>>>> Is this connected with the procedure I did with inserting the contig
>>>> from an older tigStore?
>>>>
>>>> Thank you for your help so far.
>>>>
>>>> Ole
>>>>
>>>> On 11 April 2012 08:13, Ole Kristian Tørresen <o.k...@bi...>
>>>> wrote:
>>>>> Hi Brian.
>>>>>
>>>>> I've done this, and am rerunning ctgcns on that last partition. I'll send
>>>>> the layout and log in a separate email.
>>>>>
>>>>> Ole
>>>>>
>>>>> On 10 April 2012 21:37, Walenz, Brian <bw...@jc...> wrote:
>>>>>> Hi Ole-
>>>>>>
>>>>>> I don't see anything that looks like an error in the log, so I'll have to
>>>>>> assume it crashed. You report it runs for 20 hours, which is odd for
>>>>>> contig consensus, unless that contig is very very deep. If so, the ctgcns
>>>>>> process will also be large. Do you know how big the process was?
>>>>>>
>>>>>> Can you make the full log available?
>>>>>>
>>>>>> It is possible to force the contig to have a consensus sequence. If the
>>>>>> job did crash, the other contigs will still need to have consensus
>>>>>> generated.
>>>>>>
>>>>>> The process is the same as editing a unitig in the tigStore: dump the
>>>>>> contig in question, edit the file to have a consensus sequence, then load
>>>>>> that contig back into the tigStore. A consensus sequence for this contig
>>>>>> can be found in one of the earlier tigStore versions; the version just
>>>>>> before this one will probably have it. That makes our process even easier:
>>>>>> dump the version with a consensus sequence, and load it back into the
>>>>>> latest version.
>>>>>>
>>>>>> A sketch of the steps:
>>>>>>
>>>>>> 1) Dump the previous version of the contig. Check that 'file' does
>>>>>> contain a consensus sequence.
>>>>>>
>>>>>> tigStore -g *gkpStore -t *tigStore <vers-1> -c <ctgID> -d layout > file
>>>>>>
>>>>>> 2) Load that previous version into the tigStore as the latest version.
>>>>>>
>>>>>> tigStore -g *gkpStore -t *tigStore <vers> <part> -c <ctgID> -R file
>>>>>>
>>>>>> Notice that this tigStore command specifies both a version and a partition
>>>>>> for the tigStore.
>>>>>>
>>>>>> 3) Rerun consensus.sh on that partition. It will not attempt to compute
>>>>>> the consensus for that contig.
>>>>>>
>>>>>> I'd be interested in seeing the contig you dump, if only to verify that it
>>>>>> is deep.
>>>>>>
>>>>>> b
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/10/12 4:05 AM, "Ole Kristian Tørresen" <o.k...@bi...>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I'm having some problems while doing some low coverage sequencing
>>>>>>> assembly testing. I've tried to assemble about 10x coverage of 150 nt
>>>>>>> paired Illumina reads of 500 bp fragment size. These are from the
>>>>>>> parrot used in the Assemblathon 2
>>>>>>> (http://assemblathon.org/pages/download-data). Everything seems to run
>>>>>>> fine, until contig consensus, where 1 partition just doesn't succeed. It
>>>>>>> seems to run for quite some time (20 hours or something) before
>>>>>>> failing.
>>>>>>> These are the last 20 lines from the output of the ctgcns
>>>>>>> partition that fails:
>>>>>>> Alignment params: 297 333 200 200 0 0.12 1e-06 30 1
>>>>>>> -- e/l = 7/112 = 6.25%
>>>>>>> A -----+------+----> []
>>>>>>> B 332 -------> 40 []
>>>>>>> GetAlignmentTrace()-- Overlap ACCEPTED! accept=1000.000000
>>>>>>> lScore=0.026087 (112 vs 115) aScore=0.160000 (332 vs 316)
>>>>>>> bScore=0.150000 (-42 vs -27). (CONTIGF)
>>>>>>> GetAlignmentTrace()-- Overlap found between 1076523 (U) and 25763657
>>>>>>> (R) expected hangs: a=316 b=-27 erate=0.060000 aligner=Local_Overlap
>>>>>>> GetAlignmentTrace()-- Overlap ACCEPTED! accept=1000.000000
>>>>>>> lScore=0.026087 (112 vs 115) aScore=0.160000 (332 vs 316)
>>>>>>> bScore=0.150000 (-42 vs -27). (CONTIGF)
>>>>>>> Local_Overlap_AS_forCNS found overlap between 1076523 (U) and 25763657
>>>>>>> (R) ahang: 332, bhang: -42 (expected hang was 316)
>>>>>>> Alignment params: 298 334 200 200 0 0.12 1e-06 30 1
>>>>>>> -- e/l = 6/112 = 5.36%
>>>>>>> A -----+------+----> []
>>>>>>> B 332 -------> 42 []
>>>>>>> GetAlignmentTrace()-- Overlap ACCEPTED! accept=1000.000000
>>>>>>> lScore=0.009009 (110 vs 111) aScore=0.140000 (332 vs 318)
>>>>>>> bScore=0.130000 (-42 vs -29). (CONTIGF)
>>>>>>> GetAlignmentTrace()-- Overlap found between 1076523 (U) and 57537697
>>>>>>> (R) expected hangs: a=318 b=-29 erate=0.060000 aligner=Local_Overlap
>>>>>>> GetAlignmentTrace()-- Overlap ACCEPTED! accept=1000.000000
>>>>>>> lScore=0.009009 (110 vs 111) aScore=0.140000 (332 vs 318)
>>>>>>> bScore=0.130000 (-42 vs -29).
>>>>>>> (CONTIGF)
>>>>>>> Local_Overlap_AS_forCNS found overlap between 1076523 (U) and 57537697
>>>>>>> (R) ahang: 332, bhang: -42 (expected hang was 318)
>>>>>>> Alignment params: 300 336 200 200 0 0.12 1e-06 30 1
>>>>>>> -- e/l = 6/110 = 5.45%
>>>>>>> A -----+------+----> []
>>>>>>> B 332 -------> 42 []
>>>>>>>
>>>>>>> This is the error message:
>>>>>>> at /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 1237
>>>>>>> main::caFailure('1 consensusAfterScaffolder jobs failed; remove
>>>>>>> 8-consensus/co...', undef) called at
>>>>>>> /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 5142
>>>>>>> main::postScaffolderConsensus() called at
>>>>>>> /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 5885
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>> Failure message:
>>>>>>>
>>>>>>> 1 consensusAfterScaffolder jobs failed; remove
>>>>>>> 8-consensus/consensus.sh to try again
>>>>>>>
>>>>>>> I've tried removing consensus.sh and running again, but get the same
>>>>>>> error.
>>>>>>>
>>>>>>> This is the spec file:
>>>>>>> utgErrorRate=0.03
>>>>>>> utgErrorLimit=2.5
>>>>>>> ovlErrorRate=0.06
>>>>>>> cnsErrorRate=0.06
>>>>>>> cgwErrorRate=0.10
>>>>>>> merSize = 22
>>>>>>> overlapper=ovl
>>>>>>> unitigger = bogart
>>>>>>> merylMemory = 128000
>>>>>>> merylThreads = 16
>>>>>>> merOverlapperThreads = 2
>>>>>>> merOverlapperExtendConcurrency = 8
>>>>>>> merOverlapperSeedConcurrency = 8
>>>>>>> ovlThreads = 2
>>>>>>> mbtThreads = 2
>>>>>>> mbtConcurrency = 8
>>>>>>> ovlConcurrency = 8
>>>>>>> ovlCorrConcurrency = 16
>>>>>>> ovlRefBlockSize = 32000000
>>>>>>> ovlHashBits = 24
>>>>>>> ovlHashBlockLength = 800000000
>>>>>>> ovlStoreMemory = 128000
>>>>>>> frgCorrThreads = 2
>>>>>>> frgCorrConcurrency = 8
>>>>>>> ovlCorrBatchSize = 1000000
>>>>>>> ovlCorrConcurrency = 16
>>>>>>> cnsConcurrency = 16
>>>>>>> doExtendClearRanges = 0
>>>>>>>
>>>>>>> I don't need to have that unitig (1076523 (U)) in my finished
>>>>>>> assembly, so it's possible to just remove it as long as I get a
>>>>>>> finished assembly. I've also tried to just create the .success file,
>>>>>>> but then terminator fails.
>>>>>>>
>>>>>>> Does anyone have any ideas of what I might do differently? Can I just
>>>>>>> remove that unitig and proceed? How do I do that?
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Ole Kristian Tørresen
>>>>>>> PhD student
>>>>>>> University of Oslo
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> wgs-assembler-users mailing list
>>>>>>> wgs...@li...
>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
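
For reference, the text-editing step in Brian's summary (take the status flag from the post-scaffolder dump and put it into the dump that still has a consensus) could be sketched roughly as below. This is only a sketch under assumptions: the helper names and file names are hypothetical, and it assumes a dumped layout holds one "key value" pair per line, as in the dump quoted earlier in the thread. The tigStore calls are left as comments in the form given in the thread, with the placeholders unfilled.

```shell
#!/bin/sh
# Sketch only: helper names and file names are hypothetical.
# Assumes one "key value" pair per line in a dumped layout, e.g.
#   data.contig_status U

# Print the status flag found in a dumped contig layout (read from stdin).
get_contig_status() {
    awk '$1 == "data.contig_status" { print $2; exit }'
}

# Copy a layout from stdin to stdout, replacing the status flag with $1.
# All other lines pass through unchanged.
set_contig_status() {
    awk -v flag="$1" '
        $1 == "data.contig_status" { print "data.contig_status", flag; next }
        { print }
    '
}

# Intended use, following the summary in the thread:
#   tigStore -g *gkpStore -t *tigStore <vers-2> -c <ctgID> -d layout > ctg.cns
#   tigStore -g *gkpStore -t *tigStore <vers-1> -c <ctgID> -d layout > ctg.scf
#   flag=$(get_contig_status < ctg.scf)
#   set_contig_status "$flag" < ctg.cns > ctg.fixed
#   tigStore -g *gkpStore -t *tigStore <vers> <part> -c <ctgID> -R ctg.fixed
```

Keeping the edit as a stdin-to-stdout filter leaves the original dumps untouched, so the edited layout can be compared against them with diff before anything is loaded back into the store.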