geneBody_coverage graphs misleading

RNA-seq data QC

Brought to you by: marghoobm, mysango, zbfish

#6 geneBody_coverage graphs misleading

Milestone: 2.0

Status: open

Owner: nobody

Labels: None

Updated: 2021-06-01

Created: 2021-06-01

Creator: Todd Cameron

Private: No

In geneBody_coverage.py, the algorithm for coverage appears to scale each sample so that the minimum coverage is 0 and the maximum is 1.0, which (as far as I can tell) happens in this line:

dataset.append((name, [(i -min(dat))/(max(dat) - min(dat)) for i in dat], skewness))

There isn't any issue with this approach on its own, but the y-axis of the line graphs, labeled 'coverage', also ranges from 0.0 to 1.0. This unfortunately implies that '0 coverage' on these graphs means there were no reads in that region, when in reality it is simply that sample's minimum. If you have a sample where the minimum coverage was 30%, 80%, or 99% of the maximum, that point will still be plotted as '0 coverage'.
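To illustrate, here is a minimal standalone sketch that applies the same expression as the quoted line to two made-up samples. The function name `minmax_scale` and the input values are hypothetical and are not taken from RSeQC. A sample with genuinely empty regions and a sample whose minimum is 99% of its maximum both end up starting at 0.0:

```python
def minmax_scale(dat):
    """Rescale exactly as the quoted line does: minimum -> 0.0, maximum -> 1.0.

    Like the original expression, this divides by zero if every value is equal.
    """
    lo, hi = min(dat), max(dat)
    return [(i - lo) / (hi - lo) for i in dat]

# Hypothetical per-position coverage values for two samples.
sparse = [0, 25, 50, 100]        # some positions truly have no reads
uniform = [99, 99.5, 100, 100]   # minimum is 99% of the maximum

print(minmax_scale(sparse))   # [0.0, 0.25, 0.5, 1.0]
print(minmax_scale(uniform))  # [0.0, 0.5, 1.0, 1.0]
```

Both curves would start at 0.0 on the plot, even though the second sample never drops below 99% of its peak.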

Either way, the current output is confusing: it wasn't clear to me what was being plotted until I read the code. Perhaps the y-axis label and/or the scaling could be changed to reflect more accurately what is being graphed.
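One possible direction, sketched under assumptions: keep the current min-max scaling but give it an honest axis label, or divide by the maximum only so that 0.0 still means zero reads. The function `scale_coverage`, its `mode` parameter, and the label strings below are hypothetical suggestions, not part of RSeQC:

```python
def scale_coverage(dat, mode="max"):
    """Return (scaled values, suggested y-axis label). Hypothetical helper.

    mode="minmax": same as the current line; 0.0 marks the sample's minimum.
    mode="max":    divide by the maximum only; 0.0 still means zero reads.
    """
    hi = max(dat)
    if mode == "minmax":
        lo = min(dat)
        values = [(i - lo) / (hi - lo) for i in dat]
        label = "Relative coverage (0 = sample minimum, 1 = sample maximum)"
    elif mode == "max":
        values = [i / hi for i in dat]
        label = "Coverage (fraction of sample maximum)"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return values, label

values, label = scale_coverage([99, 99.5, 100, 100], mode="max")
print(values)  # [0.99, 0.995, 1.0, 1.0]
print(label)   # Coverage (fraction of sample maximum)
```

Scaling by the maximum alone still lets samples with different depths share one 0 to 1 axis, while a nearly flat sample stays near the top of the plot instead of being stretched down to 0.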
