[Denovoassembler-devel] Planaria transcriptome coverage distribution
Ray -- Parallel genome assemblies for parallel DNA sequencing
From: David E. (gringer) <dav...@mp...> - 2011-07-27 12:33:21
For testing, here's the coverage distribution that was created by combining all the Smed RNA reads I have:

http://user.interface.org.nz/~gringer/hacking/planaria_Ray_5_2011-07-26_RayOutput.CoverageDistribution.txt [~140kb]

I had a bit of trouble with Firefox freezing when I tried pastebin, which is why it's not there.

For what it's worth, some of these RNA transcripts (the low-expression transcripts) only have about 50-80 hits for reads from a single platform. While I don't expect to be able to get the sequences from these transcripts -- for one I looked at, the longest stretch with coverage greater than 5 was about 70bp -- it would be nice to have some way to tune Ray to work with RNA data that has similar distributions of coverage.

The analysis is as follows:

  k-mer length: 31
  Lowest coverage observed: 1
  MinimumCoverage: 215
  PeakCoverage: 216
  RepeatCoverage: 217
  Number of k-mers with at least MinimumCoverage: 1944260 k-mers
  Estimated genome length: 972130 nucleotides
  Percentage of vertices with coverage 1: 64.0808 %
  DistributionFile: results/Ray_5_2011-07-26/RayOutput.CoverageDistribution.txt

I'm not quite sure I agree with a Peak/Repeat difference of 1 (216 vs. 217), but given that I'm not sure what the numbers are used for, I'll leave that strangeness for someone else to comment on.

Note that there is a big decrease in the number of k-mers found over the first part of the distribution. I'm not sure if this is because of the nature of RNA data or something else.

-- David
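
For anyone who wants to check the summary numbers against the distribution file, here is a rough Python sketch. It rests on assumptions rather than on Ray's source: that the file holds two whitespace-separated integer columns (coverage, then number of k-mers), that lines starting with "#" are comments, and that "Percentage of vertices with coverage 1" means the coverage-1 count divided by the total count. The halving for the genome length is inferred only from the figures above (1944260 / 2 = 972130). The function names and the toy data are made up for illustration.

```python
from typing import Dict, Iterable


def parse_distribution(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'coverage count' rows; the two-column layout is an assumption."""
    dist: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines and anything that is not two columns.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        try:
            coverage, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # e.g. a textual header row
        dist[coverage] = dist.get(coverage, 0) + count
    return dist


def summarize(dist: Dict[int, int], minimum_coverage: int) -> Dict[str, float]:
    """Recompute a few of the reported figures from a parsed distribution."""
    total = sum(dist.values())
    at_least_min = sum(n for c, n in dist.items() if c >= minimum_coverage)
    return {
        "lowest_coverage": min(dist),
        # Assumed definition: share of all k-mers that were seen exactly once.
        "percent_coverage_1": 100.0 * dist.get(1, 0) / total,
        "kmers_at_least_minimum": at_least_min,
        # Assumed: half the k-mer count, matching 1944260 -> 972130 above.
        "estimated_length": at_least_min // 2,
    }


# Toy data (not taken from the real file), just to show the calling pattern.
toy_lines = ["# coverage count", "1\t640", "2\t200", "215\t100", "216\t60"]
toy_summary = summarize(parse_distribution(toy_lines), minimum_coverage=215)
```

With the real file, something like `summarize(parse_distribution(open(path)), 215)` should give figures to compare against the log; if they don't match, one of the assumptions above is probably wrong.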