Re: [Samtools-help] MarkDuplicates - detection/elimination of duplicates in long reads (Roche 454 reads)


Hi Oscar,

The algorithm that MarkDuplicates uses for non-paired-end reads is 
pretty crude, and quite likely will result in reads being considered 
duplicates that are not in fact dupes.  I don't think it has anything to 
do with the size of 454 reads.  I suspect MarkDuplicates is not the 
right tool for this job.  You probably need a duplicate detection 
algorithm that handles single-end reads more intelligently.
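
To illustrate the concern, here is a minimal sketch in Python. It is 
not Picard's code; it assumes a simplified rule in which single-end 
reads are grouped only by reference, strand and 5' position, and the 
Read type, its fields and the use_length option are all hypothetical. 
With that rule, reads of different lengths that merely start at the 
same place are collapsed together.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified view of one aligned single-end read."""
    name: str
    chrom: str
    start: int      # leftmost aligned position on the reference
    length: int     # aligned length on the reference
    reverse: bool   # True if aligned to the reverse strand
    qual_sum: int   # sum of base qualities, used to pick the read to keep


def five_prime(read):
    """5' position: leftmost base on the forward strand, rightmost on reverse."""
    return read.start + read.length - 1 if read.reverse else read.start


def mark_duplicates(reads, use_length=False):
    """Return the names of reads flagged as duplicates.

    Assumed crude rule: same chrom, strand and 5' position.
    With use_length=True the aligned length must match as well.
    """
    groups = defaultdict(list)
    for r in reads:
        key = (r.chrom, r.reverse, five_prime(r))
        if use_length:
            key += (r.length,)
        groups[key].append(r)

    dupes = set()
    for members in groups.values():
        # Keep the read with the highest quality sum, flag the rest.
        best = max(members, key=lambda r: r.qual_sum)
        dupes.update(r.name for r in members if r is not best)
    return dupes


reads = [
    Read("a", "chr1", 100, 400, False, 9000),
    Read("b", "chr1", 100, 250, False, 7000),  # same start, different length
    Read("c", "chr1", 100, 400, False, 8000),  # same start and length as "a"
    Read("d", "chr1", 300, 380, False, 8500),
]

crude = mark_duplicates(reads)
strict = mark_duplicates(reads, use_length=True)
print(sorted(crude), sorted(strict))
```

In this toy data the crude key flags both "b" and "c", while the 
stricter key flags only "c". Whether matching on length is the right 
extra criterion for 454 data is a separate question; the sketch only 
shows why a start-position-only key can over-call duplicates.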

-Alec

On 5/27/11 12:13 PM, Oscar Rodríguez wrote:
>
> Dear All,
>
> We are trying to detect indels and SNPs from Roche 454 reads, but the 
> final calls are poor. We believe the cause is the large size of the 
> 454 reads (about 400 bp), which affects the detection/elimination of 
> duplicates (this step is carried out with the Picard tool 
> "MarkDuplicates"). As a consequence, very few reads are kept for the 
> indel/SNP calling step, and for this reason very few variants are 
> detected. Do you have any advice on how we should proceed with the 
> detection/elimination of duplicates in Roche 454 reads? Should we try 
> another tool? Or is there a configuration parameter for 
> "MarkDuplicates" that we should consider?
>
> Thank you very much in advance.
>
> Kind regards,
>
> Oscar
>
>
