From: Brian W. <th...@gm...> - 2014-07-25 23:07:49
If my suspicion is correct - keep in mind, all this is a total guess on what I imagine is happening - it's likely a mess that can be pushed to completion now. All the obvious scaffolding should be done already. Bump it out of the scaffold merging steps, but let the other cgw steps run.

Possibly, you can get away with increasing the min weight (6? 8? no good guess), instead of manually forcing it to stop merging.

On Fri, Jul 25, 2014 at 11:23 AM, Waldbieser, Geoff <Geo...@ar...> wrote:

> So in this case adding the Illumina PE reads would not have helped?
>
> Is the graph trying to detangle or is it likely to be a mess that needs to
> be axed now?
>
> *From:* Brian Walenz [mailto:th...@gm...]
> *Sent:* Friday, July 25, 2014 8:11 AM
> *To:* Waldbieser, Geoff
> *Subject:* Re: [wgs-assembler-users] Does scaffolding scale with
> available RAM?
>
> Sorry, I owe you a few replies. I switched jobs, and now can't read gmail
> at work, or work at home.
>
> It's not that the pacbio assembled through repeats, but that the pacbio
> reads themselves get through (larger) repeats. Without the pacbio, bogart
> will detect the repeat, notice that no read spans it, and excise it from
> the unitig. With the pacbio, bogart again detects the repeat, but now that
> a read spans it, the repeat is left in the unitig.
>
> That would be great, except that the repeat illumina mates are now a total
> mess. With just illumina, the repeats are isolated to short unitigs, and
> only those mates are a mess, but scaffolder was designed to handle this
> case. With the longer repeats included in longer unitigs, and illumina
> mates placed incorrectly in those, the scaffold graph is a mess.
>
> E.g.,
>
> unitig1: unique1-repeatA-unique2
> unitig2: unique3-repeatB-unique4 (where repeatA and repeatB are related)
>
> It is possible to get a mate between repeatA and unique4, when really it
> should be in repeatB.
>
> Your pacbio-only assembly was from correction of the pacbio with
> illumina? I'm surprised it was that bad.
>
> On Mon, Jul 21, 2014 at 6:32 PM, Waldbieser, Geoff <
> Geo...@ar...> wrote:
>
> First of all, thanks for saving us $100k on a high Mem server.
>
> When I mapped BAC end sequences to the Illumina-only assembly
> (MaSuRCA-2.2.0) the avg insert length of contained mates was 165kb which
> was on the dot for that BAC library. When I mapped to the PacBio-only
> assembly the insert sizes were in the 30kb range, so I knew something was
> wrong. That would support your idea of assembling through repeats and
> perhaps through the wrong repeats. So I thought including the Illumina mate
> pairs might help the PacBio assembly but apparently the MPs just made it
> more convoluted.
>
> Aleksey had suggested not using the PacBio at all for assembly, just for
> gap closure. Maybe it’s time to pull the plug on this one, maybe shred the
> PacBio reads to overlapping 2kb lengths to use on MaSuRCA. But then again
> it could end soon (I tell myself every day). Is there a reasonable way to
> estimate how many contigs have been incorporated thus estimating how many
> there are to go?
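
For the "shred the PacBio reads to overlapping 2kb lengths" idea in the oldest message, here is a minimal sketch of what the shredding step could look like. Only the 2 kb piece length comes from the thread. The 1 kb overlap, the function name `shred`, and the choice to anchor the last piece at the read end are assumptions for illustration, not anything MaSuRCA or wgs-assembler prescribes.

```python
def shred(seq, piece_len=2000, overlap=1000):
    """Cut one read into overlapping pieces of piece_len bases.

    piece_len=2000 follows the 2 kb figure in the thread; overlap=1000
    is an assumed value, not a recommendation from the thread.
    """
    if not 0 <= overlap < piece_len:
        raise ValueError("overlap must be >= 0 and smaller than piece_len")

    # A read no longer than one piece is returned unchanged.
    if len(seq) <= piece_len:
        return [seq]

    step = piece_len - overlap
    pieces = []
    start = 0
    # Emit full-length pieces while there is still sequence beyond them.
    while start + piece_len < len(seq):
        pieces.append(seq[start:start + piece_len])
        start += step

    # Anchor the final piece at the read end so no tail bases are dropped;
    # it may overlap the previous piece by more than `overlap`.
    pieces.append(seq[-piece_len:])
    return pieces


if __name__ == "__main__":
    read = "ACGT" * 1125  # a made-up 4500 bp read
    for i, piece in enumerate(shred(read)):
        print(i, len(piece))
```

Every piece comes out at exactly `piece_len` bases unless the read itself is shorter, which keeps the shredded input uniform. Whether that matters to the downstream assembler is not something the thread settles.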