From: Ole K. T. <o.k...@bi...> - 2012-09-10 14:28:50
On 10 September 2012 12:31, Ole Kristian Tørresen <o.k...@bi...> wrote:
> On 10 September 2012 09:31, Walenz, Brian <bw...@jc...> wrote:
>> Hi, Ole-
>>
>> The _average_ dropped to 68? The _minimum_ allowed is 64.
>
> Yes, and this is the cause of some concern on my part. This number
> includes reads with no length (because merTrim does not remove the
> reads, it just records them with a 0-length sequence and quality), but I'm
> not sure about the average length of the reads that were not deleted.
>
>>
>> In the merTrim stderr output there should be mention of the thresholds it is
>> using. There are two thresholds:
>>
>> 'minVerified' determines which kmers can be used to correct some other kmer.
>> By default, this is 1/4 of the guessed coverage in the reads.
>>
>> 'minCorrect' determines which kmers can be corrected. Any kmer with a count
>> at most this can be corrected. By default, this is 1/3 of the guessed
>> coverage in the reads.
>>
>> After all corrections are done, read ends are trimmed if they are not
>> covered by 'trusted' kmers.
>>
>> Possibly the guessed coverage was artificially high, resulting in
>> artificially high thresholds. You can set these thresholds manually with
>> -correct (for minCorrect) and -evidence (for minVerified). If the values
>> are less than 1, they are interpreted as a fraction of the guessed coverage;
>> otherwise, as an absolute count threshold.
>>
>> Does the guessed coverage make sense? Does the kmer count histogram look
>> sane?

I reran with logging now, and the guessed coverage looks insane:

  Guessed X coverage is 183
  Use minCorrect=61 minVerified=45

I think the coverage should be around 16x, so I'll set -correct to 5 and
-evidence to 4. Hopefully that should do it.

Thank you.

Ole

>
> I forgot to redirect stderr to a file, but I am running it again
> now to check the output.
>
>>
>> You can turn on verbose mode with -V, which dumps a picture of the
>> corrections and the trusted kmer coverage. You probably don't want to do
>> this for all reads.
>> Maybe just a sample of 100 reads or so. Super verbose mode (-V -V
>> -V) will dump the same picture after each step in the algorithm.
>
> I think the issue, at least with this library, is that the second read
> is really bad. Almost every second read has more than half its length
> in quality '#', which is just trash. So this is probably not a case
> where merTrim does something wrong, but where the sequencing has gone
> wrong.
>
> Thank you.
>
> Ole
>
>>
>> b
>>
>>
>> On 9/6/12 3:08 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>>
>>> Hi,
>>> I just ran merTrim on a relatively low coverage library. Well, we
>>> don't really know whether it is low coverage or not, since we don't
>>> know the genome size accurately yet. The original library was 16 Gbp,
>>> but after merTrim and then loading it into an assembly, only 6 Gbp
>>> survived. This might give a relatively good assembly, but I'm a bit
>>> worried that it removed too much sequence. Can I adjust how much it
>>> throws out?
>>>
>>> My reads are 150 bp, PE. I followed the preprocessing page on the CA
>>> site, created a database of trusted kmers, and used that to
>>> correct my reads.
>>>
>>> Of 56,188,107 reads of mate 1, 18,030,094 were deleted and 36,122,767
>>> were clean, and the average length dropped to 68 bp. I expected it to
>>> remove about 10% of my sequences (from what I've seen on other
>>> merTrim runs), but 2/3 seems a bit much.
>>>
>>> Thank you.
>>>
>>> Ole
>>>
>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
>>
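The threshold rules Brian describes can be checked against the numbers in Ole's log. Below is a hypothetical Python sketch, not merTrim's source: the names `resolve_threshold` and `thresholds` are made up for illustration, and rounding down to an integer is an assumption that happens to reproduce the logged minCorrect=61 and minVerified=45 for a guessed coverage of 183.

```python
import math
from fractions import Fraction

# Defaults as described in the thread: minCorrect is 1/3 and minVerified
# is 1/4 of the guessed coverage. Exact fractions avoid float rounding.
DEFAULT_CORRECT = Fraction(1, 3)
DEFAULT_EVIDENCE = Fraction(1, 4)


def resolve_threshold(value, guessed_coverage: int) -> int:
    """Turn a -correct / -evidence style value into an absolute kmer count.

    Values below 1 are read as a fraction of the guessed coverage;
    anything else is taken as an absolute count. Rounding down is an
    ASSUMPTION (it matches the logged values), not confirmed merTrim code.
    """
    value = Fraction(value)  # accepts int, float, str such as "0.25", or Fraction
    if value < 1:
        return math.floor(value * guessed_coverage)
    return math.floor(value)


def thresholds(guessed_coverage: int, correct=DEFAULT_CORRECT, evidence=DEFAULT_EVIDENCE):
    """Return (minCorrect, minVerified) for a given guessed coverage."""
    return (resolve_threshold(correct, guessed_coverage),
            resolve_threshold(evidence, guessed_coverage))


if __name__ == "__main__":
    # Defaults with the inflated guessed coverage from the log.
    print(thresholds(183))                         # (61, 45)
    # The same default fractions applied to the expected ~16x coverage.
    print(thresholds(16))                          # (5, 4)
    # Absolute overrides, as in "-correct 5 -evidence 4".
    print(thresholds(183, correct=5, evidence=4))  # (5, 4)
```

With the default fractions, a guessed coverage of 16x gives thresholds of 5 and 4, the same pair Ole chose to pass as absolute values with -correct and -evidence.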
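Ole was unsure about the average length of the reads that were not deleted. A rough estimate follows from the mate-1 numbers in the original message, assuming the 68 bp average is taken over all mate-1 reads with deleted reads counted as zero length, and that every read not reported as deleted still has some sequence. This is back-of-the-envelope Python arithmetic; the constant and function names are illustrative, and since 68 bp is itself a rounded figure, the result is approximate.

```python
# Back-of-the-envelope check on the mate-1 numbers from the original message.
# ASSUMPTION: the reported 68 bp average covers all mate-1 reads, with
# deleted reads counted as zero-length, as described in the thread.

TOTAL_READS = 56_188_107
DELETED_READS = 18_030_094
REPORTED_MEAN_ALL = 68  # bp, rounded in the original report


def mean_length_of_surviving(total: int, deleted: int, mean_all: float) -> float:
    """Average length of the reads that were not deleted."""
    total_bases = mean_all * total          # zero-length reads add no bases
    return total_bases / (total - deleted)  # spread over the surviving reads


def percent(part: float, whole: float) -> float:
    return 100 * part / whole


if __name__ == "__main__":
    survivors = mean_length_of_surviving(TOTAL_READS, DELETED_READS, REPORTED_MEAN_ALL)
    print(f"mean length of non-deleted reads: ~{survivors:.0f} bp")
    print(f"mate-1 reads deleted: {percent(DELETED_READS, TOTAL_READS):.1f}%")
    print(f"bases retained overall (16 -> 6 Gbp): {percent(6, 16):.1f}%")
```

Under these assumptions the surviving mate-1 reads average roughly 100 bp, about a third of the reads are deleted outright, and 37.5% of the bases survive overall, which fits the "2/3" figure in the original message better than the read count does.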