#20 Estimate library self-similarity

Overlapper
open
Report (27)
5
2008-08-21
2006-03-09
No

Generate a report that shows how often pairwise overlaps involve two
reads from the same library. Excessive library self-similarity might
indicate library problems.

The report could run after overlapper, on its OVL output. It would show
each library's intra-library overlaps as a fraction of all overlaps, as
observed vs expected values. To calculate the expected value, let N be the
number of all reads and n the number of reads in this library. If overlaps
were independent of library membership, this library's intra-library
overlaps would be approximately this fraction of all overlaps: (n/N)^2.

All counts should exclude overlaps between a read and its own mate. Such
overlaps are expected to occur more often in small-insert (2KB) libraries
than in large-insert libraries, so counting them would bias the results.
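
The calculation described above could be sketched roughly as follows. This is only an illustration, not code from the assembler: the function name `intra_library_report` and its three inputs (a list of overlapping read pairs, a read-to-library map, and a read-to-mate map) are hypothetical stand-ins for whatever would be parsed from the OVL output and the read metadata.

```python
from collections import Counter


def intra_library_report(overlaps, read_library, mate_of):
    """Compare observed and expected intra-library overlap fractions.

    overlaps:     iterable of (read_a, read_b) pairs (hypothetical input,
                  assumed to be parsed from overlapper output)
    read_library: dict mapping read id -> library id
    mate_of:      dict mapping read id -> mate read id (unmated reads absent)
    Returns {library: (observed_fraction, expected_fraction)}.
    """
    intra = Counter()
    total = 0
    for a, b in overlaps:
        # Exclude reads that overlap their own mate, as the ticket requests.
        if mate_of.get(a) == b or mate_of.get(b) == a:
            continue
        total += 1
        if read_library[a] == read_library[b]:
            intra[read_library[a]] += 1

    lib_sizes = Counter(read_library.values())  # n for each library
    n_reads = sum(lib_sizes.values())           # N, all reads

    report = {}
    for lib, n in lib_sizes.items():
        observed = intra[lib] / total if total else 0.0
        expected = (n / n_reads) ** 2           # (n/N)^2 from the ticket
        report[lib] = (observed, expected)
    return report
```

A library whose observed fraction is well above its expected fraction would be a candidate for closer inspection. How large a gap counts as "excessive" is not specified in this ticket and would need to be chosen empirically.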

Discussion

  • Jason Rafe Miller

    Logged In: YES
    user_id=1220789
    Originator: YES

    Here is another way to do it, suggested by Aaron. The check for non-randomness could look at histograms of overlaps per read and estimate distance from the expected Poisson distribution. Since all libraries are non-random, the test needs some parameter tuning, and that might be a good intern project.

     
  • Jason Rafe Miller

    • assigned_to: nobody --> skoren
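
The histogram approach suggested in the comment above might look something like the sketch below. The ticket does not say how the distance from the Poisson distribution should be measured; total variation distance is used here purely as an assumption, and the Poisson mean is taken to be the sample mean. The function names are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Poisson probability of k events, computed in log space to avoid overflow."""
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def poisson_distance(overlaps_per_read):
    """Total variation distance between the observed overlaps-per-read
    histogram and a Poisson distribution with the same mean.

    Returns a value in [0, 1]; 0 means the histogram matches the Poisson
    exactly. The choice of metric is an assumption, not from the ticket.
    """
    counts = list(overlaps_per_read)
    if not counts:
        raise ValueError("need at least one read")
    n = len(counts)
    mean = sum(counts) / n
    hist = Counter(counts)

    diff = 0.0
    covered = 0.0  # Poisson mass seen so far
    for k in range(max(counts) + 1):
        p = poisson_pmf(k, mean)
        covered += p
        diff += abs(hist[k] / n - p)
    # Beyond the largest observed count the observed frequency is zero,
    # so the remaining Poisson tail contributes in full.
    diff += max(0.0, 1.0 - covered)
    return 0.5 * diff
```

Computing this per library and comparing the values against each other could give a starting point for the parameter tuning mentioned in the comment, since no library is expected to score exactly zero.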
     
