Whole-Genome Shotgun Assembler / Feature Requests / #20 Estimate library self-similarity

#20 Estimate library self-similarity

Milestone: Overlapper

Status: open

Owner: Sergey Koren

Labels: Report

Priority: 5

Updated: 2008-08-21

Created: 2006-03-09

Creator: Jason Rafe Miller

Private: No

Generate a report that shows how often pairwise overlaps involve two
reads from the same library. Excessive library self-similarity might
indicate library problems.

The report could run after the overlapper, on the OVL output. For each library, it would show intra-library overlaps as a fraction of all overlaps, and it could compare observed against expected values. To compute the expected value, let N be the total number of reads and n the number of reads in this library. Each read in a random overlap belongs to this library with probability n/N, so this library's intra-library overlaps are expected to make up (n/N)^2 of all overlaps.
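
The following is a minimal Python sketch of the proposed report. It assumes two inputs: overlaps already reduced to (read_a, read_b) ID pairs, and a mapping from each read to its library. Both the input format and the name `library_self_similarity` are hypothetical and do not reflect the actual OVL record layout. Observed is each library's intra-library overlap count divided by all overlaps. Expected is the (n/N)^2 formula above. This sketch does not exclude mate overlaps.

```python
from collections import Counter


def library_self_similarity(overlaps, read_library):
    """Observed vs expected intra-library overlap fraction for each library.

    overlaps: iterable of (read_a, read_b) ID pairs (hypothetical format,
        not the real OVL record layout).
    read_library: dict mapping every read ID -> library ID (assumed to
        cover every read that appears in an overlap).
    Returns {library: (observed_fraction, expected_fraction)}.
    """
    reads_per_lib = Counter(read_library.values())  # n for each library
    total_reads = len(read_library)                 # N

    intra = Counter()
    total_overlaps = 0
    for a, b in overlaps:
        total_overlaps += 1
        if read_library[a] == read_library[b]:
            intra[read_library[a]] += 1

    report = {}
    for lib, n in reads_per_lib.items():
        observed = intra[lib] / total_overlaps if total_overlaps else 0.0
        expected = (n / total_reads) ** 2  # (n/N)^2 as proposed in the ticket
        report[lib] = (observed, expected)
    return report
```

Here is an example. Put reads r1 to r3 in library A and r4 in library B, with overlaps (r1, r2), (r2, r3) and (r1, r4). Library A then has an observed fraction of 2/3 against an expected 0.5625, which points to more self-similarity than chance would give.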

All counts should exclude reads that overlap their own mate. Such reads are expected more often in small-insert (2 kb) libraries than in large-insert libraries, so counting them would bias the results.
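
The following is a sketch of the mate exclusion. It assumes mate pairs are available as a dict from read ID to mate read ID, which is a hypothetical input format, and the function name is illustrative. It takes the ticket's wording literally. Any read found overlapping its own mate is flagged, and every overlap involving a flagged read is dropped. The flagged reads are returned so they can also be left out of n and N.

```python
def exclude_mate_overlapping_reads(overlaps, mate_of):
    """Remove reads that overlap their own mate, plus all of their overlaps.

    overlaps: iterable of (read_a, read_b) ID pairs (hypothetical format).
    mate_of: dict mapping read ID -> mate read ID; unmated reads are absent.
    Returns (kept_overlaps, flagged_reads).
    """
    overlaps = list(overlaps)

    # Flag both reads of any overlap whose two reads are mates of each other.
    flagged = set()
    for a, b in overlaps:
        if mate_of.get(a) == b or mate_of.get(b) == a:
            flagged.update((a, b))

    # Keep only overlaps in which neither read was flagged.
    kept = [(a, b) for a, b in overlaps if a not in flagged and b not in flagged]
    return kept, flagged
```

A narrower reading would drop only the read/mate overlap itself and keep the reads' other overlaps. In that case, the filter reduces to the `mate_of` check inside the first loop.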

Discussion

Jason Rafe Miller - 2007-07-26

Here is another way to do it, suggested by Aaron. The check for non-randomness could build a histogram of overlaps per read and estimate its distance from the expected Poisson distribution. Since no library is perfectly random, the test would need some parameter tuning; that might be a good intern project.
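
The following is a sketch of the histogram check. It makes two assumptions the comment leaves open. First, the Poisson rate is fit as the mean number of overlaps per read. Second, "distance" is measured as total variation distance, which is half the summed absolute difference between the observed and Poisson probabilities. Both choices, the (read_a, read_b) input format and the function names are illustrative, not a settled design. Setting the cutoff for "too far from Poisson" is the tuning question raised above.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of k events at rate lam, computed in log space."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def overlap_count_histogram(overlaps, reads):
    """Histogram mapping overlaps-per-read -> number of reads.

    overlaps: iterable of (read_a, read_b) ID pairs (hypothetical format).
    reads: all read IDs to include, so reads with zero overlaps are counted.
    """
    per_read = Counter({r: 0 for r in reads})
    for a, b in overlaps:
        per_read[a] += 1
        per_read[b] += 1
    return Counter(per_read.values())


def poisson_distance(hist):
    """Total variation distance between the histogram and a fitted Poisson."""
    total = sum(hist.values())
    # Assumed fit: the Poisson rate is the mean number of overlaps per read.
    lam = sum(k * c for k, c in hist.items()) / total
    diff = 0.0
    covered = 0.0
    for k in range(max(hist) + 1):
        p = poisson_pmf(k, lam)
        covered += p
        diff += abs(hist.get(k, 0) / total - p)
    # Beyond the largest observed count the observed probability is 0,
    # so the remaining Poisson tail mass adds directly to the difference.
    diff += max(0.0, 1.0 - covered)
    return 0.5 * diff
```

The result lies between 0 (the histogram matches the fitted Poisson exactly) and 1. To compare libraries, you could build the histogram from one library's reads at a time.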

Jason Rafe Miller - 2008-08-21

assigned_to: nobody --> skoren