From: Anil K. <ani...@gs...> - 2021-09-17 15:50:05
Hello everyone,

I have a BAM file in which duplicates have already been marked with Picard. Running samtools flagstat on it produces the following:

    253552402 + 0 in total (QC-passed reads + QC-failed reads)
    132897348 + 0 secondary
    0 + 0 supplementary
    71809672 + 0 duplicates
    247864536 + 0 mapped (97.76% : N/A)
    120655054 + 0 paired in sequencing
    60327527 + 0 read1
    60327527 + 0 read2
    114967188 + 0 properly paired (95.29% : N/A)
    114967188 + 0 with itself and mate mapped
    0 + 0 singletons (0.00% : N/A)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

To determine the PCR duplication rate from these values, I see two options:

    1. PCR duplication = 4th row / 1st row = 71809672 / 253552402 = 0.28
    2. PCR duplication = 4th row / 9th row = 71809672 / 114967188 = 0.62

The second calculation gives a duplication rate very close to the one reported in Picard's *.est_lib_complex_metrics.txt report, which makes sense to me. However, I would like to understand whether the first calculation has any meaning, or whether it is an entirely wrong way of determining PCR duplication. Please advise.

Thanks!
Anil
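
P.S. For reference, the two candidate ratios can be reproduced with a short Python sketch. The counts are copied from the flagstat output above; the variable names are only illustrative, and the sketch just repeats the arithmetic without saying which denominator is the appropriate one.

```python
# Counts copied by hand from the flagstat output in this message
# (QC-passed column only; the QC-failed column is 0 on every row).
total_records = 253552402      # row 1: "in total"
duplicates = 71809672          # row 4: "duplicates"
properly_paired = 114967188    # row 9: "properly paired"

# Option 1: duplicates relative to every record in the BAM file.
rate_vs_total = duplicates / total_records

# Option 2: duplicates relative to the properly paired reads.
rate_vs_properly_paired = duplicates / properly_paired

print(f"duplicates / total records:  {rate_vs_total:.2f}")            # 0.28
print(f"duplicates / properly paired: {rate_vs_properly_paired:.2f}")  # 0.62
```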