Re: [wgs-assembler-users] merTrim aggressiveness

On 10 September 2012 12:31, Ole Kristian Tørresen
<o.k...@bi...> wrote:
> On 10 September 2012 09:31, Walenz, Brian <bw...@jc...> wrote:
>> Hi, Ole-
>>
>> The _average_ dropped to 68?  The _minimum_ allowed is 64.
>
> Yes, and this is a cause for some concern on my part. This number
> includes reads with no length (because merTrim does not remove the
> reads, it just records them with 0-length sequence and quality), but
> I'm not sure about the average length of the reads that were not deleted.
>
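As a rough check on that last point, the average length of the surviving reads can be estimated from the counts reported further down the thread. This is only a sketch: it assumes the 68 bp average is taken over all 56,188,107 mate 1 reads, with each deleted read counted as length 0, which has not been verified against how merTrim actually computes it. The function name `surviving_mean_length` is made up for this illustration.

```python
def surviving_mean_length(total_reads, deleted_reads, mean_incl_deleted):
    """Mean length of non-deleted reads, assuming deleted reads count as length 0."""
    surviving = total_reads - deleted_reads
    total_bases = mean_incl_deleted * total_reads  # deleted reads contribute 0 bases
    return total_bases / surviving

# Counts for mate 1 as reported in the original message (assumed to match the 68 bp figure).
estimate = surviving_mean_length(56_188_107, 18_030_094, 68)
print(f"estimated mean length of surviving reads: {estimate:.1f} bp")
```

Under that assumption the surviving reads would average roughly 100 bp, well above the 64 bp minimum.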
>>
>> In the merTrim stderr output there should be mention of the thresholds it is
>> using.  There are two thresholds:
>>
>> 'minVerified' tells what kmers can be used for correcting some other kmer.
>> By default, this is 1/4 the guessed coverage in the reads.
>>
>> 'minCorrect' tells what kmers can be corrected.  Any kmer with count at most
>> this can be corrected.  By default, this is 1/3 the guessed coverage in the
>> reads.
>>
>> After all corrections are done, read ends are trimmed if they are not
>> covered by 'trusted' kmers.
>>
>> Possibly the guessed coverage was artificially high, resulting in
>> artificially high thresholds.  You can set these thresholds manually with
>> -correct (for minCorrect) and -evidence (for minVerified).  If the values
>> are less than 1, they are interpreted as a fraction of the guessed coverage,
>> otherwise, an absolute count threshold.
>>
>> Does the guessed coverage make sense?  Does the kmer count histogram look
>> sane?

I reran with logging now, and the guessed coverage looks insane:
Guessed X coverage is 183
Use minCorrect=61 minVerified=45

I think the coverage should be around 16x, so I'll set -correct to 5
and -evidence to 4. Hopefully that should do it.
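
For reference, the arithmetic behind those numbers can be sketched in a few lines of Python. This only illustrates the rule Brian describes (values below 1 are a fraction of the guessed coverage, otherwise an absolute count); it is not merTrim's code. The helper names are made up, and rounding down is an assumption that happens to match the logged values (183/3 = 61, 183/4 = 45.75, logged as 45).

```python
from fractions import Fraction
import math

def resolve_threshold(value, guessed_coverage):
    """Hypothetical helper: a value below 1 is a fraction of coverage, else an absolute count."""
    if value < 1:
        # Assumption: fractional thresholds are rounded down.
        return math.floor(value * guessed_coverage)
    return int(value)

def default_thresholds(guessed_coverage):
    """Defaults as described in the thread: minCorrect = 1/3, minVerified = 1/4 of coverage."""
    min_correct = resolve_threshold(Fraction(1, 3), guessed_coverage)
    min_verified = resolve_threshold(Fraction(1, 4), guessed_coverage)
    return min_correct, min_verified

print(default_thresholds(183))  # (61, 45), matching the log above
print(default_thresholds(16))   # (5, 4), matching -correct 5 and -evidence 4
```

Exact fractions are used instead of floats so that 1/3 of 183 comes out as exactly 61 rather than depending on floating-point rounding.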

Thank you.

Ole

>
> I forgot to redirect the stderr to a file, but I am running it again
> now to check the output.
>
>>
>> You can turn on verbose mode with -V, which dumps a picture of the
>> corrections and the trusted kmer coverage.  You probably don't want to do
>> this for all reads.  Maybe just a sample of 100 reads or so.  Super verbose
>> mode (-V -V -V) will dump the same picture after each step in the algorithm.
>
> I think the issue, at least with this library, is that the second read
> is really bad. Almost every second read has more than half its length
> at quality '#', which is just trash. So this is probably not a case
> where merTrim does something wrong, but one where the sequencing has
> gone wrong.
>
> Thank you.
>
> Ole
>
>>
>> b
>>
>>
>> On 9/6/12 3:08 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>>
>>> Hi,
>>> I just ran merTrim on a relatively low coverage library, well, we
>>> don't really know whether it is low coverage or not since we don't
>>> know the genome size accurately yet. The original library was 16 Gbp,
>>> but after merTrim and then loading it into an assembly, only 6 Gbp
>>> survived. This might give a relatively good assembly, but I'm a bit
>>> worried that it removed too much sequence. Can I adjust how much it
>>> throws out?
>>>
>>> My reads are 150 bp, PE. I followed the preprocessing page on the CA
>>> site, and created a database of trusted kmers and used that to
>>> correct my reads.
>>>
>>> Of 56,188,107 reads of mate 1, 18,030,094 were deleted and 36,122,767
>>> were clean, and the average length dropped to 68 bp. I expected it to
>>> remove about 10 % of my sequences (from what I've seen on other
>>> merTrim runs), but 2/3 seems a bit much.
>>>
>>> Thank you.
>>>
>>> Ole
>>>