wgs-assembler-users Mailing List for Whole-Genome Shotgun Assembler (Page 19)

Hi, Ole-

Yes, I overlooked a step.  Before you insert the contig into the latest
version, set its 'data.contig_status' to the value that contig has in the
second-to-last version.

FYI, the tigStore should have versions such as:

seqDB.v014.ctg
seqDB.v014.dat
seqDB.v014.utg

seqDB.v015.ctg
seqDB.v015.p001.ctg
seqDB.v015.p001.dat
(etc)
seqDB.v015.utg

seqDB.v016.ctg
seqDB.v016.p001.ctg
seqDB.v016.p001.dat
(etc)
seqDB.v016.utg

(the v numbers will of course be different in your assembly)

v015 contains the output of scaffolder, which is the input to consensus.
Contigs here have no consensus sequence, but otherwise all the data is
present.  It is largely just rewriting the data from v014 into partitions
(p###), so each consensus job can load a single file instead of randomly
accessing a large file.  The status flag on each unitig/contig is also set.
This flag tells if the unitig/contig was placed in a scaffold, is a
surrogate, degenerate, etc.
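
If it helps when picking the version and partition to give tigStore, here is
a minimal shell sketch that splits a file name from the listing above into
its parts.  The function name parse_tig_name is made up for illustration,
and it assumes the naming pattern is exactly <prefix>.v<NNN>[.p<NNN>].<ext>
as shown; it prints the digits as they appear in the file name and does not
check what form tigStore itself expects.

```shell
# Split a tigStore file name such as seqDB.v015.p001.ctg into version and
# partition.  Hypothetical helper; assumes the naming pattern in the listing.
parse_tig_name() {
  name=$1
  # Digits following ".v" and followed by "." give the version.
  version=$(printf '%s\n' "$name" | sed -n 's/.*\.v\([0-9][0-9]*\)\..*/\1/p')
  # Digits following ".p" give the partition; unpartitioned files have none.
  partition=$(printf '%s\n' "$name" | sed -n 's/.*\.p\([0-9][0-9]*\)\..*/\1/p')
  printf 'version=%s partition=%s\n' "$version" "${partition:-none}"
}
```

For example, "parse_tig_name seqDB.v015.p001.ctg" reports version 015 and
partition 001, while an unpartitioned file such as seqDB.v014.utg reports
partition "none".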

v016 is the output of consensus, the input to terminator.  All terminator
does is to repackage this into ASCII files.

To summarize: grab the contig from v014 (the last with a consensus
sequence), the status flag from v015, change the status flag in the contig
you grabbed, and then insert the contig into v016.
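
The status edit can be done by hand, but a rough sketch of scripting it
follows.  It assumes you have already dumped the contig twice with
"-d layout" (once from v014, once from v015), that each dump holds a single
contig, and that the status is one line beginning with
"data.contig_status".  The function name copy_contig_status and the file
names are made up for illustration.

```shell
# Print the layout in "$1" with its data.contig_status line replaced by the
# one found in "$2".  Hypothetical helper; assumes one contig per dump and a
# status line that starts with "data.contig_status".
copy_contig_status() {
  old_layout=$1      # dump from the last version with a consensus (v014)
  status_layout=$2   # dump of the same contig from scaffolder output (v015)
  # Take the status line from the newer dump; fail if there is none.
  status_line=$(grep -m 1 '^data\.contig_status' "$status_layout") || return 1
  # Pass the line through the environment so awk does not reinterpret it.
  STATUS_LINE="$status_line" awk '
    /^data\.contig_status/ { print ENVIRON["STATUS_LINE"]; next }
    { print }
  ' "$old_layout"
}
```

Usage would be something like
"copy_contig_status v014.layout v015.layout > patched.layout", after which
patched.layout is the file to load into v016 with -R, as in the earlier
message.  Check the patched file by eye before loading it.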

By doing this, you'll lose the VAR records for this contig, but otherwise
the consensus sequence is the same (or largely the same; variant detection
can change it a bit).

b

On 4/11/12 6:23 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> Hi Brian,
> ctgcns completed now, but I got an error with asmOutputFasta. From
> 9-terminator/asmOutputFasta.err:
> ERROR: Illegal unitigpos type type value 'X' (CCO) at line 1676575956
> 
> Is this connected with the procedure I did with inserting the contig
> from an older tigStore?
> 
> Thank you for your help so far.
> 
> Ole
> 
> On 11 April 2012 08:13, Ole Kristian Tørresen <o.k...@bi...> wrote:
>> Hi Brian.
>> 
>> I've done this, and am rerunning ctgcns on that last partition. I'll send
>> the layout and log in a separate email.
>> 
>> Ole
>> 
>> On 10 April 2012 21:37, Walenz, Brian <bw...@jc...> wrote:
>>> Hi Ole-
>>> 
>>> I don't see anything that looks like an error in the log, so I'll have to
>>> assume it crashed.  You report it runs for 20 hours, which is odd for contig
>>> consensus, unless that contig is very very deep.  If so, the ctgcns process
>>> will also be large.  Do you know how big the process was?
>>> 
>>> Can you make the full log available?
>>> 
>>> It is possible to force the contig to have a consensus sequence.  If the job
>>> did crash, the other contigs will still need to have consensus generated.
>>> 
>>> The process is the same as editing a unitig in the tigStore: dump the contig
>>> in question, edit the file to have a consensus sequence, then load that
>>> contig back into the tigStore.  A consensus sequence for this contig can be
>>> found in one of the earlier tigStore versions; the version just before this
>>> one will probably have it.  That makes our process even easier: dump the
>>> version with a consensus sequence, and load it back into the latest version.
>>> 
>>> A sketch of the steps:
>>> 
>>> 1) Dump the previous version of the contig.  Check that 'file' does contain
>>> a consensus sequence.
>>> 
>>> tigStore -g *gkpStore -t *tigStore <vers-1> -c <ctgID> -d layout > file
>>> 
>>> 2) Load that previous version into the tigStore as the latest version
>>> 
>>> tigStore -g *gkpStore -t *tigStore <vers> <part> -c <ctgID> -R file
>>> 
>>> Notice that this tigStore command specifies both a version and a partition
>>> for the tigStore.
>>> 
>>> 3) Rerun consensus.sh on that partition.  It will not attempt to compute the
>>> consensus for that contig.
>>> 
>>> I'd be interested in seeing the contig you dump, if only to verify that it
>>> is deep.
>>> 
>>> b
>>> 
>>> 
>>> 
>>> On 4/10/12 4:05 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>>> 
>>>> Hi,
>>>> I'm having some problems while doing some low coverage sequencing
>>>> assembly testing. I've tried to assemble about 10x coverage of 150 nt
>>>> paired Illumina reads of 500 bp fragment size. These are from the
>>>> parrot used in the Assemblathon 2
>>>> (http://assemblathon.org/pages/download-data). Everything seems to run
>>>> fine, until contig consensus, where 1 partition just doesn't succeed. It
>>>> seems to run for quite some time (20 hours or something) before
>>>> failing. These are the last 20 lines from the output of the ctgcns
>>>> partition that fails:
>>>> Alignment params: 297 333 200 200 0  0.12 1e-06 30 1
>>>>  -- e/l = 7/112 =  6.25%
>>>>   A -----+------+---->  []
>>>>   B  332 -------> 40    []
>>>> GetAlignmentTrace()-- Overlap ACCEPTED!  accept=1000.000000
>>>> lScore=0.026087 (112 vs 115) aScore=0.160000 (332 vs 316)
>>>> bScore=0.150000 (-42 vs -27).  (CONTIGF)
>>>> GetAlignmentTrace()-- Overlap found between 1076523 (U) and 25763657
>>>> (R) expected hangs: a=316 b=-27 erate=0.060000 aligner=Local_Overlap
>>>> GetAlignmentTrace()-- Overlap ACCEPTED!  accept=1000.000000
>>>> lScore=0.026087 (112 vs 115) aScore=0.160000 (332 vs 316)
>>>> bScore=0.150000 (-42 vs -27).  (CONTIGF)
>>>> Local_Overlap_AS_forCNS found overlap between 1076523 (U) and 25763657
>>>> (R) ahang: 332, bhang: -42 (expected hang was 316)
>>>> Alignment params: 298 334 200 200 0  0.12 1e-06 30 1
>>>>  -- e/l = 6/112 =  5.36%
>>>>   A -----+------+---->  []
>>>>   B  332 -------> 42    []
>>>> GetAlignmentTrace()-- Overlap ACCEPTED!  accept=1000.000000
>>>> lScore=0.009009 (110 vs 111) aScore=0.140000 (332 vs 318)
>>>> bScore=0.130000 (-42 vs -29).  (CONTIGF)
>>>> GetAlignmentTrace()-- Overlap found between 1076523 (U) and 57537697
>>>> (R) expected hangs: a=318 b=-29 erate=0.060000 aligner=Local_Overlap
>>>> GetAlignmentTrace()-- Overlap ACCEPTED!  accept=1000.000000
>>>> lScore=0.009009 (110 vs 111) aScore=0.140000 (332 vs 318)
>>>> bScore=0.130000 (-42 vs -29).  (CONTIGF)
>>>> Local_Overlap_AS_forCNS found overlap between 1076523 (U) and 57537697
>>>> (R) ahang: 332, bhang: -42 (expected hang was 318)
>>>> Alignment params: 300 336 200 200 0  0.12 1e-06 30 1
>>>>  -- e/l = 6/110 =  5.45%
>>>>   A -----+------+---->  []
>>>>   B  332 -------> 42    []
>>>> 
>>>> This is the error message:
>>>>  at /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 1237
>>>> main::caFailure('1 consensusAfterScaffolder jobs failed; remove
>>>> 8-consensus/co...', undef) called at
>>>> /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 5142
>>>> main::postScaffolderConsensus() called at
>>>> /usit/titan/u1/olekto/src/wgs-7.0/Linux-amd64/bin/runCA line 5885
>>>> 
>>>> ----------------------------------------
>>>> Failure message:
>>>> 
>>>> 1 consensusAfterScaffolder jobs failed; remove
>>>> 8-consensus/consensus.sh to try again
>>>> 
>>>> I've tried removing consensus.sh and running again, but get the same error.
>>>> 
>>>> This is the spec file:
>>>> utgErrorRate=0.03
>>>> utgErrorLimit=2.5
>>>> ovlErrorRate=0.06
>>>> cnsErrorRate=0.06
>>>> cgwErrorRate=0.10
>>>> merSize = 22
>>>> overlapper=ovl
>>>> unitigger = bogart
>>>> merylMemory   = 128000
>>>> merylThreads = 16
>>>> merOverlapperThreads = 2
>>>> merOverlapperExtendConcurrency = 8
>>>> merOverlapperSeedConcurrency = 8
>>>> ovlThreads = 2
>>>> mbtThreads = 2
>>>> mbtConcurrency = 8
>>>> ovlConcurrency = 8
>>>> ovlCorrConcurrency = 16
>>>> ovlRefBlockSize  = 32000000
>>>> ovlHashBits = 24
>>>> ovlHashBlockLength = 800000000
>>>> ovlStoreMemory = 128000
>>>> frgCorrThreads    = 2
>>>> frgCorrConcurrency = 8
>>>> ovlCorrBatchSize  = 1000000
>>>> ovlCorrConcurrency = 16
>>>> cnsConcurrency   = 16
>>>> doExtendClearRanges = 0
>>>> 
>>>> I don't need to have that unitig (1076523 (U)) in my finished
>>>> assembly, so it's possible to just remove it as long as I get a
>>>> finished assembly. I've also tried to just create the .success file,
>>>> but then terminator fails.
>>>> 
>>>> Does anyone have any ideas of what I might do differently? Can I just
>>>> remove that unitig and proceed? How do I do that?
>>>> 
>>>> Sincerely,
>>>> Ole Kristian Tørresen
>>>> PhD student
>>>> University of Oslo
>>>> 
>>> 

wgs-assembler-users — Discussion about Celera Assembler