Re: [wgs-assembler-users] merTrim aggressiveness


On 10 September 2012 09:31, Walenz, Brian <bw...@jc...> wrote:
> Hi, Ole-
>
> The _average_ dropped to 68?  The _minimum_ allowed is 64.

Yes, and this is the cause of some concern on my part. This number
includes reads with no length (because merTrim does not remove the
reads, it just records them with a zero-length sequence and quality),
but I'm not sure about the average length of the reads that were not
deleted.
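
One quick way to separate the two numbers is to average only over the non-empty records. The sketch below assumes you already have per-read lengths, for example parsed from the trimmed FASTQ. The function names are hypothetical and not part of merTrim. The second function back-calculates the non-empty average from the figures reported in this thread. It assumes the 68 bp average was taken over all 56,188,107 mate-1 reads and that exactly the 18,030,094 deleted reads are the zero-length ones.

```python
from statistics import fmean


def mean_lengths(lengths):
    """Return (mean over all reads, mean over non-empty reads).

    Zero-length records stand for reads that merTrim deleted but kept as
    placeholders. A mean over an empty set of reads is reported as 0.0.
    """
    lengths = list(lengths)
    nonempty = [n for n in lengths if n > 0]
    overall = fmean(lengths) if lengths else 0.0
    kept = fmean(nonempty) if nonempty else 0.0
    return overall, kept


def implied_nonempty_mean(overall_mean, total_reads, deleted_reads):
    """Back-calculate the mean length of the surviving reads.

    ASSUMPTION: overall_mean was averaged over total_reads, and the deleted
    reads are exactly the ones contributing zero length.
    """
    surviving = total_reads - deleted_reads
    return overall_mean * total_reads / surviving


# Figures from this thread (mate 1 only).
print(round(implied_nonempty_mean(68, 56_188_107, 18_030_094), 1))  # prints 100.1
```

On those assumptions, the surviving mate-1 reads would average roughly 100 bp rather than 68 bp. The real figure depends on whether the reported average actually included the empty records.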

>
> In the merTrim stderr output there should be mention of the thresholds it is
> using.  There are two thresholds:
>
> 'minVerified' tells what kmers can be used for correcting some other kmer.
> By default, this is 1/4 the guessed coverage in the reads.
>
> 'minCorrect' tells what kmers can be corrected.  Any kmer with count at most
> this can be corrected.  By default, this is 1/3 the guessed coverage in the
> reads.
>
> After all corrections are done, read ends are trimmed if they are not
> covered by 'trusted' kmers.
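
The sketch below roughly illustrates that end-trimming step on its own. It marks every base covered by at least one trusted kmer and keeps the span from the first covered base to the last. This is a simplification assumed for illustration, not merTrim's actual code:

- It ignores the correction pass.
- It does not split reads at interior gaps.
- It compares kmers only in the read's own orientation, whereas a real kmer database may store canonical kmers.

`trim_untrusted_ends` and `trusted` are hypothetical names.

```python
def trim_untrusted_ends(seq, trusted, k):
    """Trim both read ends back to the outermost bases covered by trusted kmers.

    seq: read sequence; trusted: set of trusted k-length strings; k: kmer size.
    Returns "" when no kmer in the read is trusted (a "deleted" read).
    """
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in trusted:
            # Every base inside a trusted kmer counts as covered.
            for j in range(i, i + k):
                covered[j] = True
    if not any(covered):
        return ""
    first = covered.index(True)
    last = len(covered) - 1 - covered[::-1].index(True)
    return seq[first:last + 1]


print(trim_untrusted_ends("TTACGTATT", {"ACGT", "CGTA"}, 4))  # prints ACGTA
```

In this model, a read with no trusted kmer at all comes back empty. That matches the zero-length records discussed earlier in the thread.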
>
> Possibly the guessed coverage was artificially high, resulting in
> artificially high thresholds.  You can set these thresholds manually with
> -correct (for minCorrect) and -evidence (for minVerified).  If the values
> are less than 1, they are interpreted as a fraction of the guessed coverage,
> otherwise, an absolute count threshold.
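
The threshold rule described above (a value below 1 is a fraction of the guessed coverage, anything else is an absolute count) can be sketched as follows. The "at most minCorrect" test comes from the text. Two things are assumptions, flagged in the comments: that a kmer counts as evidence when its count is at least minVerified, and that no rounding is applied. The function names and the coverage values 40 and 80 are hypothetical.

```python
def resolve_threshold(value, guessed_coverage):
    """Interpret a -correct / -evidence style value as described in the thread.

    Values below 1 are a fraction of the guessed coverage; anything else is an
    absolute kmer count. ASSUMPTION: no rounding is applied here.
    """
    if value < 1:
        return value * guessed_coverage
    return float(value)


def can_be_corrected(count, min_correct):
    # From the thread: any kmer with count at most minCorrect can be corrected.
    return count <= min_correct


def can_be_evidence(count, min_verified):
    # ASSUMPTION: a kmer may be used as evidence when count >= minVerified.
    return count >= min_verified


for coverage in (40, 80):  # hypothetical guessed coverages
    min_verified = resolve_threshold(1 / 4, coverage)  # default per the thread
    min_correct = resolve_threshold(1 / 3, coverage)   # default per the thread
    print(coverage, min_verified, round(min_correct, 2))
```

Doubling the guessed coverage doubles both default thresholds. This is why an inflated coverage estimate would treat more kmers as correctable and fewer as usable evidence.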
>
> Does the guessed coverage make sense?  Does the kmer count histogram look
> sane?

I forgot to redirect stderr to a file, but I am running it again
now to check the output.

>
> You can turn on verbose mode with -V, which dumps a picture of the
> corrections and the trusted kmer coverage.  You probably don't want to do
> this for all reads.  Maybe just a sample of 100 reads or so.  Super verbose
> mode (-V -V -V) will dump the same picture after each step in the algorithm.
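
Drawing a random sample of about 100 reads for a verbose run can be done in one pass with reservoir sampling (Algorithm R). This is a minimal sketch. It assumes plain 4-line FASTQ records with no wrapped sequence or quality lines. The function names are hypothetical, and how the sample is then fed to merTrim is not covered here.

```python
import random


def fastq_records(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records.

    `lines` can be an open file or any iterable of lines.
    """
    it = iter(lines)
    for header in it:
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError("truncated FASTQ record")
        yield tuple(s.rstrip("\n") for s in [header, *rest])


def reservoir_sample(records, n, seed=None):
    """Uniformly sample up to n records in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = rec
    return sample


def write_fastq(records, handle):
    for rec in records:
        handle.write("\n".join(rec) + "\n")


# Usage (file names hypothetical):
# with open("reads_1.fastq") as src, open("sample_1.fastq", "w") as dst:
#     write_fastq(reservoir_sample(fastq_records(src), 100, seed=1), dst)
```

The random draws depend only on the record index. Running this with the same seed on two mate files that have the same number of records therefore picks the same positions from both, which keeps pairs together.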

I think the issue, at least with this library, is that the second read of
each pair is really bad. Almost every one of these reads has more than half
of its length at quality '#', which is just trash. So this is probably not
a case where merTrim does something wrong, but one where the sequencing
has gone wrong.
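
A quick way to quantify "more than half its length at quality '#'" is to count '#' characters in each quality string. This sketch assumes Phred+33 (Sanger / Illumina 1.8+) encoding, in which '#' decodes to Q2. The helper names are hypothetical.

```python
PHRED_OFFSET = 33  # ASSUMPTION: Phred+33 encoding, where '#' is Q2


def phred(qchar):
    """Decode one quality character to a Phred score."""
    return ord(qchar) - PHRED_OFFSET


def hash_fraction(qual):
    """Fraction of positions in a quality string that are '#'."""
    return qual.count("#") / len(qual) if qual else 0.0


def count_mostly_hash(quals):
    """Count quality strings in which more than half the bases are '#'."""
    return sum(1 for q in quals if hash_fraction(q) > 0.5)


# '#' decodes to Q2, an estimated error probability of 10**(-2/10), about 63%.
print(phred("#"), round(10 ** (-phred("#") / 10), 2))  # prints: 2 0.63
```

Applied to the quality lines of each mate file separately, `count_mostly_hash` gives a simple per-file number for comparing mate 1 with mate 2.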

Thank you.

Ole

>
> b
>
>
> On 9/6/12 3:08 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>
>> Hi,
>> I just ran merTrim on a relatively low-coverage library; well, we
>> don't really know whether it is low coverage or not, since we don't
>> know the genome size accurately yet. The original library was 16 Gbp,
>> but after merTrim and then loading it into an assembly, only 6 Gbp
>> survived. This might give a relatively good assembly, but I'm a bit
>> worried that it removed too much sequence. Can I adjust how much it
>> throws out?
>>
>> My reads are 150 bp, PE. I followed the preprocessing page on the CA
>> site, created a database of trusted kmers, and used that to
>> correct my reads.
>>
>> Of 56,188,107 reads of mate 1, 18,030,094 were deleted and 36,122,767
>> were clean, and the average length dropped to 68 bp. I expected it to
>> remove about 10% of my sequences (from what I've seen on other
>> merTrim runs), but 2/3 seems a bit much.
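
The reported counts can be checked with a few lines of arithmetic. Only the figures in the message are used. The message does not explain the reads that are neither "deleted" nor "clean", so they are simply labelled "other" here.

```python
total = 56_188_107    # mate-1 reads
deleted = 18_030_094
clean = 36_122_767

other = total - deleted - clean  # reads in neither category (not explained in the message)
print(f"deleted: {deleted / total:.1%}")
print(f"clean:   {clean / total:.1%}")
print(f"other:   {other:,} ({other / total:.1%})")

# Bases: 16 Gbp in, about 6 Gbp loaded into the assembly afterwards.
print(f"bases retained: {6 / 16:.1%}")
```

This gives roughly 32.1% of mate-1 reads deleted, 64.3% clean and 2,035,246 (3.6%) in neither category, while only 37.5% of the bases survive. The base-level loss (about 62%) is therefore much larger than the read-level deletion rate. The rest must come from trimming of the surviving reads or from the loading step, and the message does not break it down by mate.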
>>
>> Thank you.
>>
>> Ole
>>
>> _______________________________________________
>> wgs-assembler-users mailing list
>> wgs...@li...
>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
>