From: Walenz, B. <bw...@jc...> - 2013-12-06 18:51:35
Nope, it requires two mates to scaffold, and possibly sequence overlap if one is implied by the mates. Until repeats are resolved very late in the scaffolding stage, all the pairs are unique.

When multiple pairs span a gap, the size of the gap is much more accurately estimated. This influences later scaffold merges by making the bounds on the gap size tighter. Looser bounds mean that we can make an incorrect scaffold join, because we can stretch the gap to fit in a contig that otherwise has no sequence alignments to the neighbor contigs. E.g., mates imply the inserted contig could overlap its neighbors by 5k, but the loose bound also lets us stretch the gap to fit the contig with no overlaps.

You can try reducing the number of pairs needed by decreasing MIN_EDGES from 2 to 1 in src/AS_CGW/GraphCGW_T.H. BUT, the 2-edge assumption has been in the assembler since the start, and this might not be the only place it is set. AND, suffice it to say that I haven't tried this.

b

On 12/6/13 10:28 AM, "Waldbieser, Geoff" <Geo...@AR...> wrote:

> Hi,
> In an assembly in which unitigs are produced from a relatively low (4-7X)
> coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come),
> is a single pair of reads (PE or MP) sufficient to scaffold two contigs,
> assuming the pair is unique in the genome? Would there be any difference
> between scaffolding with a unique single pair vs unique multiple pairs that
> span a particular gap?
>
> Geoff
> ________________________________
> Geoffrey C. Waldbieser
> Research Molecular Biologist
> USDA, ARS, Warmwater Aquaculture Research Unit
> 141 Experiment Station Road
> Stoneville, MS 38776
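
To put rough numbers on the gap-bound point in the reply above, here is a minimal sketch. It is not the assembler's actual code. It assumes each mate pair gives an independent gap estimate whose error equals the library's insert-size standard deviation, and that bounds are set at 3 standard errors. The function name, the 2 kb standard deviation, and the 3-sigma cutoff are all made up for illustration. A negative gap means the contigs overlap.

```python
import math

def gap_estimate(per_pair_gaps, library_stddev, n_sigma=3.0):
    """Combine per-pair gap estimates into (mean, lower bound, upper bound).

    Hypothetical helper for illustration only; assumes independent pairs
    that all share the same library standard deviation.
    """
    n = len(per_pair_gaps)
    if n == 0:
        raise ValueError("need at least one mate pair spanning the gap")
    mean = sum(per_pair_gaps) / n
    # The standard error of the mean shrinks with the square root of
    # the number of pairs, so more pairs give tighter bounds.
    stderr = library_stddev / math.sqrt(n)
    return mean, mean - n_sigma * stderr, mean + n_sigma * stderr

# One pair implying a 5 kb overlap, from a library with a 2 kb std dev.
one = gap_estimate([-5000], 2000)

# Four pairs implying the same 5 kb overlap on average.
four = gap_estimate([-5200, -4800, -5100, -4900], 2000)

print(one)   # (-5000.0, -11000.0, 1000.0)
print(four)  # (-5000.0, -8000.0, -2000.0)
```

With one pair the upper bound is positive, so the gap can be stretched to place the contig with no overlap at all. With four pairs the whole interval stays negative, so an overlap is still required.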