From: Walenz, B. <bw...@jc...> - 2013-09-23 05:07:21
Hi, Ole-

You are correct in the interpretation of size 20 gaps. Half seems kind of high, but I can’t say I’ve ever counted.

I had something that almost did what you’re after. I made a few mods (svn update!) to make it even closer. It will now output a list of the mates that are in the same scaffold but different contigs (unitigs). Output file OUTPUT.mate.diff.ctgscf will contain these mates. There are three other similar outputs: one for mates in the same contig, and two for the same analysis on unitigs.

9-terminator% analyzePosMap -p ASM -o OUTPUT -g ../ASM.gkpStore -A libraryfate

For a pair of overlapping contigs, a simple grep will get all the mates between them. CAUTION! Mates can span multiple gaps!

I have no documentation for this. Feel free (and encouraged) to write some for me, even if it’s just an outline. Super easy to add this to runCA if useful... and documented (hint, hint).

b

On 9/22/13 7:44 AM, "Ole Kristian Tørresen" <o.k...@ib...> wrote:

Hi,

I've been thinking a bit. In some of the assemblies I have, half the gaps are of size 20. This means that the contigs on each side of the gap were supposed to overlap, but had too many differences to be merged (10%). Have I understood this correctly? Is it expected that half the gaps are of size 20?

I'm wondering if these failed overlaps are because of heterozygosity, for example, the wrong haplotype being tested for overlap. Could this be the case? I'm not sure how to test this though. One possibility is that there's a difference in the length of the haplotypes. So if I can find all the insert sizes of the pairs that span a gap of 20 bases, I might see that they fall into two different groups. If the length difference is large, this should be pretty clear.

Is there a way to get all the insert sizes of the pairs that map across these gaps?

Thank you.

Ole
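
The check Ole describes, whether the insert sizes of mates spanning one 20-base gap fall into two groups, could be sketched roughly as below. This is only an illustration: it assumes the insert sizes have already been pulled out as a plain list of integers (the layout of the analyzePosMap output is not described in this thread, so no parsing is attempted), and the function names and example numbers are hypothetical. Splitting at the largest jump between sorted values is a crude stand-in for a proper mixture model, and a single outlier can fool it.

```python
from statistics import mean


def split_into_two_groups(insert_sizes):
    """Split values into two groups at the largest jump between sorted neighbours."""
    values = sorted(insert_sizes)
    if len(values) < 2:
        raise ValueError("need at least two insert sizes")
    # Index of the left-hand value of the widest jump in the sorted list.
    cut = max(range(len(values) - 1), key=lambda i: values[i + 1] - values[i])
    return values[:cut + 1], values[cut + 1:]


def summarize(insert_sizes):
    """Report group sizes, group means, and the distance separating the groups."""
    low, high = split_into_two_groups(insert_sizes)
    return {
        "low_n": len(low),
        "low_mean": mean(low),
        "high_n": len(high),
        "high_mean": mean(high),
        "separation": high[0] - low[-1],
    }


if __name__ == "__main__":
    # Hypothetical insert sizes (bp) for mates spanning a single gap.
    example = [4020, 2950, 3040, 3980, 3010, 4050]
    print(summarize(example))
```

A separation that is large compared with the spread inside each group would be consistent with two haplotypes of different length. Because mates can span multiple gaps, the input should be restricted to mates between one pair of adjacent contigs before running a check like this.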