From: Tom B. <tb...@um...> - 2012-05-15 14:00:25
Alec -

Will setting a separate read group for each lane successfully distinguish reads with identical read names, or does Natasja need to prefix each read name with something? I don't know how Picard behaves.

- thanks
- tom blackwell -

On Tue, 15 May 2012, Natasja Spring Ehlers wrote:

> Hi Tom,
>
> Data for one sample is generated across three lanes (aligned individually and then merged). I just checked my input files for that sample, and the IDs are found identical for two different reads from two different lanes. I have never experienced such problems with Illumina reads. This solid data is truly giving me a headache.
>
> Thank you very much for your fast reply!
>
> Best,
> Natasja
>
> On May 15, 2012, at 2:55 PM, Tom Blackwell wrote:
>
>> Natasja -
>>
>> This data set appears to contain two different read pairs with the same read identifier, 222_523_1850_F. Both pairs have a 50 bp read for the first end and a 35 bp read for the second end -- but the two 50 bp reads are not the same and the two 35 bp reads are not the same. Look at the "CS:Z:" optional tags to see the original color space reads themselves. The two first end reads are different and the two second end reads are also different. Could this file contain sequence reads merged from multiple instrument runs? If so, one could assign read group identifiers to distinguish the reads from different sequencing instrument runs. I'm not familiar with Novoalign, so can't tell you how to do this.
>>
>> - tom blackwell -
>>
>> On Tue, 15 May 2012, Natasja Spring Ehlers wrote:
>>
>>> Hi all,
>>>
>>> I have aligned paired end color space reads with novoalignCS. Now I want to mark duplicate reads using picard. It works well for some files, but for other files I get the error message:
>>>
>>> FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
>>> Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 9: B12401:222_523_1850_F
>>>     at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
>>>     at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
>>>     at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
>>>     at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
>>>     at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
>>>     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
>>>     at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)
>>>
>>> I have grepped the specified read in the corresponding input bam file, and I still do not see what the problem is here:
>>>
>>> 222_523_1850_F 97 2 165311663 70 50M 10 104666196 0 ACCCCGAAAANNAACACAGTCTTCTCACAGATCAGGCCCTTAATTAATGG QQ---,^___#"QUQ@NQQ\WWRUZ(%KGNNOYMPMFOINRU]]N@@@G7 PG:Z:novoalignCSMPI AS:i:120 UQ:i:120 NM:i:3 MD:Z:5T4G0G38 CS:Z:T31030120002120111121220222011223212030020303030310 CQ:Z:!@2@/@/?@@@@=<6@2/@2@=;=6@;/662=2><2?/882=6@>@/2/26 CM:i:4
>>> 222_523_1850_F 113 5 25554230 70 50M 10 62880846 0 GGCAAAGATGGTCAATGGGGNNTTACTTCTCCTAGACACATGTGAAGAGA 0M^--QQZZ_NL]_^MN_WV##ZY\--[[_**[[_NN_]]*)^ZPU__ZX PG:Z:novoalignCSMPI AS:i:113 UQ:i:113 NM:i:2 MD:Z:20A0A28 CS:Z:T02222021123111122321222011303220001301210132203130 CQ:Z:!>;@@@6;@?2@>@@/@@<@2@@<@/@==>;?8@@/?@@>/@@;@2@/@?/ CM:i:5
>>> 222_523_1850_F 177 10 62880846 68 35M 5 25554230 0 GCGGCCTAGTGTTTCANTGGAAGACAATACATGAG 0=@QQQYQV^[QPZTD"$Y^--NNZZ^^__\ROW] PG:Z:novoalignCSMPI AS:i:45 UQ:i:45 NM:i:1 MD:Z:16A18 CS:Z:T22213113301122120120120011123203033 CQ:Z:!@>:6=@@@?@;@/@/@?;605@;6<@?8:@2@2// CM:i:2
>>> 222_523_1850_F 145 10 104666196 42 31M4S 2 165311663 0 ATCCCCACCTCAAGTCAACTTCACTCANNCCGNNA =[_______^PMSV^ZZ_______WF=""@==""@ PG:Z:novoalignCSMPI AS:i:71 UQ:i:71 NM:i:2 MD:Z:27C0C2 CS:Z:T03303002112211202101212012201100023 CQ:Z:!2/////22//8@@@@@@@@;@?8<2?@@@@@@@@< CM:i:2
>>>
>>> Any help would be most appreciated!
>>>
>>> Best,
>>> Natasja
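
For anyone hitting the same exception, here is a minimal Python sketch of the two ideas raised in this thread: spotting read names that carry more than one primary record for the same end of a pair, and the "prefix each read name" workaround applied per lane before merging. It works on SAM text lines (for example the output of `samtools view -h`). The function names `find_name_collisions` and `prefix_read_names` and the lane prefixes are hypothetical, made up for illustration. The thread does not settle whether separate read groups alone are enough for Picard, so this only illustrates the prefixing route and is not a statement of how Picard keys its reads.

```python
from collections import Counter

# SAM FLAG bits (from the SAM specification).
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_name_collisions(sam_lines):
    """Return sorted read names that have more than one primary record
    for the same end of a pair (hypothetical helper, not part of Picard)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary records are counted
        if flag & FIRST_IN_PAIR:
            end = 1
        elif flag & SECOND_IN_PAIR:
            end = 2
        else:
            end = 0  # unpaired read
        counts[(name, end)] += 1
    return sorted({name for (name, _end), n in counts.items() if n > 1})


def prefix_read_names(sam_lines, prefix):
    """Yield SAM lines with `prefix` prepended to each read name (QNAME).
    Header and blank lines pass through unchanged. Assumed workaround:
    run once per lane, before merging the lanes."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            yield line
        else:
            # QNAME is the first column, so prepending to the line renames the read.
            yield prefix + line
```

Run on the four records quoted above, `find_name_collisions` would report `222_523_1850_F`, since flags 97 and 113 both mark a first end and flags 177 and 145 both mark a second end. Prefixing each lane's records with a distinct label before merging removes the collision, at the cost of changing the read names stored in the BAM.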