From: Johan L. <joh...@ki...> - 2013-02-07 12:45:29
Dear All,

I have done exome capture on a DNA source with an average fragment length of ~170 bp. Since it is sequenced as paired-end data (2 x 100 bp), most of the pairs will contain overlapping sequence. I have been using a tool called SeqPrep to merge the FASTQ files. It is available here: https://github.com/jstjohn/SeqPrep

This creates single-end data from most of my reads. The problem is that doing:

fastq -> seqprep, single end (for overlapping reads) + paired end (for non-overlapping reads) -> map using BWA -> markduplicates

will give ~35% lower coverage relative to doing:

fastq -> map using BWA -> markduplicates -> back to fastq -> seqprep, single end (for overlapping reads) + paired end (for non-overlapping reads) -> map using BWA

I guess this could be fixed if markduplicates would look at the length of each single-end read and use that to decide whether single-end reads are duplicates or not. Is this something that is about to be implemented in Picard, or can it easily be altered in the code? Or have I missed a solution that is perhaps already in place?

Any comments or suggestions are greatly appreciated!

Best regards,
// Johan Lindberg

*****************************************
Johan Lindberg, PhD
Department of Medical Epidemiology and Biostatistics
Nobels Väg 12A, PO.Box 281
17177 Solna, Sweden
*****************************************
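
To make the suggestion above concrete, here is a minimal Python sketch of a length-aware duplicate key for merged single-end reads. It is an illustration only and not Picard's code: the `Read` record and the functions `five_prime_key`, `length_aware_key` and `count_unique` are hypothetical names, soft clipping is ignored, and it assumes that a merged read spans the whole original fragment, so that its length stands in for the position of the mate.

```python
from collections import namedtuple

# Hypothetical minimal read record for illustration; not Picard's data model.
# start is the leftmost aligned position, length the aligned read length.
Read = namedtuple("Read", ["name", "chrom", "start", "length", "reverse"])


def five_prime_key(read):
    """Key built from the 5' position and strand only (length ignored)."""
    five_prime = read.start + read.length - 1 if read.reverse else read.start
    return (read.chrom, read.reverse, five_prime)


def length_aware_key(read):
    """Key that also includes the read length, so both fragment ends must match."""
    return five_prime_key(read) + (read.length,)


def count_unique(reads, key):
    """Number of distinct reads under the given duplicate key."""
    return len({key(r) for r in reads})


# Made-up example reads: four merged reads sharing the same start position.
reads = [
    Read("a", "chr1", 1000, 170, False),
    Read("b", "chr1", 1000, 170, False),  # same span as "a": a true duplicate
    Read("c", "chr1", 1000, 150, False),  # same start, shorter fragment
    Read("d", "chr1", 1000, 185, False),  # same start, longer fragment
]

unique_by_start = count_unique(reads, five_prime_key)
unique_by_span = count_unique(reads, length_aware_key)
print(unique_by_start, unique_by_span)
```

With the 5'-only key, all four example reads collapse into one, whereas the length-aware key keeps three of them. If single-end reads are compared by 5' position and strand alone, this kind of collapse would be consistent with the coverage loss described in the message.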