Generate a report that shows how often pairwise overlaps involve two
reads from the same library. Excessive library self-similarity might
indicate library problems.
The report could run after the overlapper, on its OVL output. For each
library it would show intra-library overlaps as a fraction of all overlaps,
observed vs expected. To calculate expected, set N=number of all
reads and n=number of reads in this library. If overlaps were independent
of library, expect this library's intra-library overlaps to be about this
fraction of all overlaps: (n/N)^2.
All counts should exclude cases where a read overlaps its own mate. These
are expected to occur more often in small-insert (2 kb) libraries than in
large-insert libraries, so counting them would bias the results.
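A minimal sketch of the observed-vs-expected calculation, assuming the overlaps have already been parsed into pairs of read IDs. The function name, the read-to-library dictionary, and the mate dictionary are all hypothetical; nothing here reads the real OVL format. It also assumes "exclude" means skipping the read/mate overlap itself rather than dropping the reads entirely.

```python
from collections import Counter


def intra_library_report(overlaps, read_library, mate_of=None):
    """Return {library: (observed_fraction, expected_fraction)}.

    overlaps     -- iterable of (read_a, read_b) pairs (hypothetical input)
    read_library -- dict mapping every read ID to its library name
    mate_of      -- optional dict mapping a read ID to its mate's read ID
    """
    mate_of = mate_of or {}
    total = 0            # overlaps counted, after skipping mate pairs
    intra = Counter()    # intra-library overlaps per library

    for a, b in overlaps:
        # Skip a read overlapping its own mate (assumed interpretation).
        if mate_of.get(a) == b or mate_of.get(b) == a:
            continue
        total += 1
        if read_library[a] == read_library[b]:
            intra[read_library[a]] += 1

    reads_per_library = Counter(read_library.values())
    n_all = sum(reads_per_library.values())   # N in the text above

    report = {}
    for lib, n in reads_per_library.items():
        observed = intra[lib] / total if total else 0.0
        expected = (n / n_all) ** 2            # (n/N)^2
        report[lib] = (observed, expected)
    return report
```

A library whose observed fraction is well above its expected fraction would be the one to look at more closely; how far above counts as a problem is left open here.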
Here is another way to do it, suggested by Aaron. The check for non-randomness could look at histograms of overlaps per read and estimate their distance from the expected Poisson distribution. Since all libraries are non-random to some degree, the test needs some parameter tuning, and that might be a good intern project.
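One possible shape for the histogram check, as a sketch only. It assumes the input is a list of overlap counts, one per read, fits the Poisson mean from the sample, and uses total variation distance as the measure. That choice of distance is an assumption made for illustration, not something proposed above, and the cutoff for "too far" is exactly the parameter that would need tuning. The function names are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_distance(overlaps_per_read):
    """Total variation distance (0..1) between the observed histogram of
    overlaps per read and a Poisson with the same mean."""
    counts = list(overlaps_per_read)
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one read")
    lam = sum(counts) / n
    if lam == 0:
        return 0.0  # every read has zero overlaps; Poisson(0) matches exactly

    hist = Counter(counts)
    abs_diff = 0.0
    poisson_mass = 0.0
    for k in range(max(hist) + 1):
        p = poisson_pmf(k, lam)
        poisson_mass += p
        abs_diff += abs(hist.get(k, 0) / n - p)
    # Beyond the largest observed count the observed frequency is zero,
    # so the remaining Poisson tail contributes its full mass.
    abs_diff += max(0.0, 1.0 - poisson_mass)
    return 0.5 * abs_diff
```

Running this per library and comparing the distances between libraries may be more informative than any absolute threshold, since every library is expected to show some distance.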