From: Alec W. <al...@br...> - 2011-05-31 18:23:44
Hi Oscar,

The algorithm that MarkDuplicates uses for non-paired-end reads is pretty crude, and it quite likely flags reads as duplicates that are not in fact dupes. I don't think this has anything to do with the length of 454 reads. I suspect MarkDuplicates is not the right tool for this job; you probably need a duplicate-detection algorithm that handles single-end reads more intelligently.

-Alec

On 5/27/11 12:13 PM, Oscar Rodríguez wrote:
>
> Dear All,
>
> We are trying to detect indels and SNPs from Roche 454 reads, but
> unfortunately we are not getting good results in the final indel and
> SNP calls. The cause seems to be the large size of the 454 reads
> (about 400 bp) during the detection/elimination of duplicates, which
> we carry out with the Picard tool "MarkDuplicates". As a consequence
> of the large size of the reads, very little information (few reads)
> is kept for the indel/SNP calling step, and for this reason very few
> variants are detected. Do you have any advice on how we should proceed
> with detecting/eliminating duplicates in Roche 454 reads? Should we
> try another tool? Or is there any configuration parameter for
> "MarkDuplicates" that we should consider?
>
> Thank you very much in advance.
>
> Kind regards,
>
> Oscar
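To illustrate why position-based duplicate marking is crude for single-end reads, here is a simplified sketch. It assumes a model close to Picard's fragment handling: reads sharing the same reference, 5' alignment coordinate and strand are treated as one duplicate group, and only the read with the highest sum of base qualities (counting bases with quality >= 15) is kept. Soft clipping is ignored here, and the names `Read`, `five_prime` and `mark_duplicates` are hypothetical, not Picard's API.

```python
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    # Hypothetical minimal alignment record (not a Picard/htsjdk type).
    name: str
    ref: str
    start: int              # leftmost aligned position
    end: int                # rightmost aligned position
    reverse: bool           # True if aligned to the reverse strand
    quals: Tuple[int, ...]  # per-base Phred qualities


def five_prime(read: Read) -> int:
    # For a reverse-strand read the 5' end is its rightmost aligned base.
    return read.end if read.reverse else read.start


def score(read: Read, min_qual: int = 15) -> int:
    # Sum of base qualities, counting only bases at or above min_qual.
    return sum(q for q in read.quals if q >= min_qual)


def mark_duplicates(reads, min_qual: int = 15):
    """Return the names of reads flagged as duplicates (simplified model)."""
    groups = {}
    for r in reads:
        # Only the 5' coordinate and strand define a group; the 3' end is ignored.
        key = (r.ref, five_prime(r), r.reverse)
        groups.setdefault(key, []).append(r)
    dupes = set()
    for members in groups.values():
        best = max(members, key=lambda r: score(r, min_qual))
        dupes.update(r.name for r in members if r is not best)
    return dupes


reads = [
    Read("a", "chr1", 100, 499, False, (30,) * 400),
    Read("b", "chr1", 100, 349, False, (30,) * 250),  # same start, different end
    Read("c", "chr1", 101, 500, False, (30,) * 400),
]
print(mark_duplicates(reads))
```

Under this model reads "a" and "b" fall into one group because they share a 5' start, so "b" is flagged even though the two alignments end at different positions. With deep single-end coverage, many independent fragments will share a start coordinate by chance, which is one way such a scheme can over-call duplicates.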