Re: [wgs-assembler-users] Number of paired reads per scaffold


Nope, it requires two mates to scaffold, and possibly a sequence overlap if
one is implied by the mates.

Until repeats are resolved late in the scaffolding stage, all the pairs
are unique.

When multiple pairs span a gap, the size of the gap is much more accurately
estimated.  This will influence later scaffold merges, by making the bounds
on the gap size tighter.  Looser bounds imply that we can make an incorrect
scaffold join because we can stretch the gap to fit in a contig that
otherwise has no sequence alignments to the neighbor contigs.  E.g., mates
imply the inserted contig could overlap its neighbors by 5k, but the loose
bound also lets us stretch the gap to fit the contig with no overlaps.
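
To illustrate why more pairs tighten the bounds, here is a rough sketch in
Python.  It is NOT the assembler's actual code: it assumes each mate pair
gives an independent gap estimate with a known standard deviation, combines
them by inverse-variance weighting, and uses a 3-sigma bound.  The function
names and the numbers are made up for the example.

```python
import math

def combine_gap_estimates(estimates):
    """Combine per-pair (gap_mean, gap_stddev) estimates into one.

    Assumes the estimates are independent and uses inverse-variance
    weighting; this is a textbook simplification, not the CGW method.
    """
    weights = [1.0 / (sd * sd) for _, sd in estimates]
    total = sum(weights)
    mean = sum(w * gap for w, (gap, _) in zip(weights, estimates)) / total
    stddev = math.sqrt(1.0 / total)
    return mean, stddev

def gap_bounds(mean, stddev, n_sigma=3.0):
    """Return (low, high) bounds on the gap size; n_sigma is an assumption."""
    return mean - n_sigma * stddev, mean + n_sigma * stddev

# Hypothetical numbers: mates imply a 5k overlap (gap of -5000 bp),
# and each pair has a 2000 bp standard deviation.
one_pair = combine_gap_estimates([(-5000.0, 2000.0)])
four_pairs = combine_gap_estimates([(-5000.0, 2000.0)] * 4)

print("1 pair :", gap_bounds(*one_pair))
print("4 pairs:", gap_bounds(*four_pairs))
```

With one pair the upper bound comes out at about +1000, so the gap can be
stretched to fit the contig with no overlaps.  With four pairs the upper
bound is about -2000, so an overlap is required and the bad join is
rejected.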

You can try reducing the number of pairs needed by decreasing MIN_EDGES from
2 to 1 in src/AS_CGW/GraphCGW_T.H.  BUT, the 2-edge assumption has been in
the assembler since the start, and this might not be the only place it is
set.  AND, suffice to say that I haven't tried this.
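
Before changing it, it may be worth listing every place the name shows up.
A small sketch in Python, assuming you run it from the top of the source
tree; the list of file extensions is a guess, and find_token is just a
helper name invented here.  It only finds the literal name MIN_EDGES, not
any hard-coded 2 elsewhere.

```python
import os
import re

def find_token(root, token, exts=(".h", ".H", ".c", ".C", ".cc", ".cpp")):
    """Return (path, line_number, line) for each whole-word match of token."""
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps going on files with odd encodings
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits

# "src" is the directory named above; adjust if your checkout differs.
for path, lineno, line in find_token("src", "MIN_EDGES"):
    print(f"{path}:{lineno}: {line}")
```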

b

On 12/6/13 10:28 AM, "Waldbieser, Geoff" <Geo...@AR...>
wrote:

> Hi,
> In an assembly in which unitigs are produced from a relatively low (4-7X)
> coverage of longer reads (Sanger, PacBio, Moleculo, hopefully others to come),
> is a single pair of reads (PE or MP) sufficient to scaffold two contigs,
> assuming the pair is unique in the genome? Would there be any difference
> between scaffolding with a unique single pair vs unique multiple pairs that
> span a particular gap?
> 
> Geoff
> ________________________________
> Geoffrey C. Waldbieser
> Research Molecular Biologist
> USDA, ARS, Warmwater Aquaculture Research Unit
> 141 Experiment Station Road
> Stoneville, MS 38776
>