Re: [Samtools-help] Picard error - mark duplicates


Alec  -

Will setting a separate read group for each lane be enough
for Picard to distinguish reads with identical read names, or
does Natasja need to prefix each read name with a lane-specific
string?  I don't know how Picard handles this.
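If renaming turns out to be necessary, one option is to rewrite the QNAME field of each lane's SAM stream before merging. This is a minimal, hypothetical helper, not part of Picard or samtools; the lane tag (e.g. `L1`) and the `_` separator are arbitrary choices. Header lines pass through untouched, and both mates of a pair get the same prefix, so pairing within a lane is preserved while identical names from different lanes become distinct.

```python
"""Hypothetical helper: prefix every QNAME in a SAM text stream with a lane tag."""
import sys


def prefix_qnames(sam_lines, lane_tag, sep="_"):
    """Yield SAM lines with QNAME rewritten as <lane_tag><sep><QNAME>.

    Header lines (starting with '@') are yielded unchanged. Both mates carry
    the same QNAME, so they receive the same prefix and stay paired.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = f"{lane_tag}{sep}{fields[0]}"  # column 0 is QNAME
        yield "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python prefix_qnames.py L1 < lane1.sam > lane1.renamed.sam
    sys.stdout.writelines(prefix_qnames(sys.stdin, sys.argv[1]))
```

Assuming the script is saved as `prefix_qnames.py`, it could sit in a pipe such as `samtools view -h lane1.bam | python prefix_qnames.py L1 | samtools view -bS - > lane1.renamed.bam`, run once per lane with a different tag before the lanes are merged. Renaming has the advantage of not depending on how Picard keys its pair map.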

 	 	 	 	 	-  thanks  -  tom blackwell  -

On Tue, 15 May 2012, Natasja Spring Ehlers wrote:

> Hi Tom,
>
> The data for one sample were generated across three lanes, which were aligned individually and then merged. I just checked my input files for that sample, and two different reads from two different lanes do have identical IDs. I have never run into this problem with Illumina reads. This SOLiD data is truly giving me a headache.
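Tom's read-group suggestion amounts to tagging each lane's alignments with its own `RG:Z:` value, plus a matching `@RG` header line, before the lanes are merged. Below is a minimal sketch over SAM text, assuming the input has no existing `@RG` line with the same ID; the ID and sample name (`lane1`, `S1`) are hypothetical values. Picard also ships an AddOrReplaceReadGroups tool for this job; the sketch only makes the change explicit. Whether MarkDuplicates then treats identical names in different read groups as distinct is the open question in this thread.

```python
"""Sketch: give one lane's SAM stream its own read group before merging."""
import sys


def add_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines with an @RG header line and an RG:Z tag on every record.

    Assumes no @RG line with this ID already exists. Any RG:Z tag already on
    a record is dropped and replaced by the new one.
    """
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}\n"
    header_done = False
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # existing header lines pass through unchanged
            continue
        if not header_done:
            # Emit the new @RG line at the end of the header, just before
            # the first alignment record.
            yield rg_header
            header_done = True
        fields = line.rstrip("\n").split("\t")
        # Columns 0-10 are the mandatory SAM fields; optional tags follow.
        mandatory, tags = fields[:11], fields[11:]
        tags = [t for t in tags if not t.startswith("RG:Z:")]
        yield "\t".join(mandatory + tags + [f"RG:Z:{rg_id}"]) + "\n"
    if not header_done:
        yield rg_header  # header-only input still gets the @RG line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_read_group.py lane1 S1 < lane1.sam > lane1.rg.sam
    sys.stdout.writelines(add_read_group(sys.stdin, sys.argv[1], sys.argv[2]))
```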
>
> Thank you very much for your fast reply!
>
>
> Best,
> Natasja
>
>
>
> On May 15, 2012, at 2:55 PM, Tom Blackwell wrote:
>
>> Natasja  -
>>
>> This data set appears to contain two different read pairs with the same read identifier, 222_523_1850_F.  Both pairs have a 50 bp read for the first end and a 35 bp read for the second end, but the two 50 bp reads are not the same and neither are the two 35 bp reads.  Look at the "CS:Z:" optional tags to see the original color-space reads themselves: the two first-end reads differ, and so do the two second-end reads.  Could this file contain sequence reads merged from multiple instrument runs?  If so, one could assign read group identifiers to distinguish the reads from different sequencing runs.  I'm not familiar with Novoalign, so I can't tell you how to do this there.
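The check described above, comparing the `CS:Z:` tags of records that share a read name, can be run over a whole file instead of one grepped name. This sketch reads SAM text (for example the output of `samtools view`), groups the color-space strings by read name and by end (FLAG bit 0x40 for the first end, 0x80 for the second), and reports names whose first or second end carries more than one distinct `CS:Z:` string. It assumes records carry a `CS:Z:` tag, as the novoalignCS records in this thread do; records without one are skipped. The function name `find_reused_names` is hypothetical.

```python
"""Sketch: find read names reused by more than one read pair in SAM text."""
from collections import defaultdict


def find_reused_names(sam_lines):
    """Return {qname: {"first": set, "second": set}} of CS:Z strings for names
    whose first or second end shows more than one distinct color-space read."""
    seen = defaultdict(lambda: {"first": set(), "second": set()})
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # Optional tags start at column 11; "CS:Z:" is 5 characters long.
        cs = next((f[5:] for f in fields[11:] if f.startswith("CS:Z:")), None)
        if cs is None:
            continue
        if flag & 0x40:
            seen[qname]["first"].add(cs)
        elif flag & 0x80:
            seen[qname]["second"].add(cs)
    # Secondary alignments of the same read repeat the same CS string, so the
    # sets only grow past one when genuinely different reads share a name.
    return {
        q: ends
        for q, ends in seen.items()
        if len(ends["first"]) > 1 or len(ends["second"]) > 1
    }
```

Feeding it `sys.stdin` from `samtools view merged.bam` should list candidate colliding names across the whole file, whereas Picard stops at the first one it meets. It keeps every name in memory, so a large BAM will need a correspondingly large amount of RAM.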
>>
>> 	 	 	 	 	 	-  tom blackwell  -
>>
>> On Tue, 15 May 2012, Natasja Spring Ehlers wrote:
>>
>>>
>>> Hi all,
>>>
>>> I have aligned paired-end color-space reads with novoalignCS. Now I want to mark duplicate reads using Picard. It works well for some files, but for others I get the following error message:
>>>
>>>
>>> FAQ:  http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
>>> Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  9: B12401:222_523_1850_F
>>>      at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
>>>      at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
>>>      at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
>>>      at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
>>>      at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
>>>      at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
>>>      at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)
>>>
>>>
>>> I have grepped for the specified read in the corresponding input BAM file, and I still do not see what the problem is:
>>>
>>> 222_523_1850_F	97	2	165311663	70	50M	10	104666196	0	ACCCCGAAAANNAACACAGTCTTCTCACAGATCAGGCCCTTAATTAATGG	QQ---,^___#"QUQ@NQQ\WWRUZ(%KGNNOYMPMFOINRU]]N@@@G7	PG:Z:novoalignCSMPI	AS:i:120	UQ:i:120	NM:i:3	MD:Z:5T4G0G38	CS:Z:T31030120002120111121220222011223212030020303030310	CQ:Z:!@2@/@/?@@@@=<6@2/@2@=;=6@;/662=2><2?/882=6@>@/2/26	CM:i:4
>>> 222_523_1850_F	113	5	25554230	70	50M	10	62880846	0	GGCAAAGATGGTCAATGGGGNNTTACTTCTCCTAGACACATGTGAAGAGA	0M^--QQZZ_NL]_^MN_WV##ZY\--[[_**[[_NN_]]*)^ZPU__ZX	PG:Z:novoalignCSMPI	AS:i:113	UQ:i:113	NM:i:2	MD:Z:20A0A28	CS:Z:T02222021123111122321222011303220001301210132203130	CQ:Z:!>;@@@6;@?2@>@@/@@<@2@@<@/@==>;?8@@/?@@>/@@;@2@/@?/	CM:i:5
>>> 222_523_1850_F	177	10	62880846	68	35M	5	25554230	0	GCGGCCTAGTGTTTCANTGGAAGACAATACATGAG	0=@QQQYQV^[QPZTD"$Y^--NNZZ^^__\ROW]	PG:Z:novoalignCSMPI	AS:i:45	UQ:i:45	NM:i:1	MD:Z:16A18	CS:Z:T22213113301122120120120011123203033	CQ:Z:!@>:6=@@@?@;@/@/@?;605@;6<@?8:@2@2//	CM:i:2
>>> 222_523_1850_F	145	10	104666196	42	31M4S	2	165311663	0	ATCCCCACCTCAAGTCAACTTCACTCANNCCGNNA	=[_______^PMSV^ZZ_______WF=""@==""@	PG:Z:novoalignCSMPI	AS:i:71	UQ:i:71	NM:i:2	MD:Z:27C0C2	CS:Z:T03303002112211202101212012201100023	CQ:Z:!2/////22//8@@@@@@@@;@?8<2?@@@@@@@@<	CM:i:2
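Decoding the FLAG column of these four records is consistent with Tom's reading: two records are flagged first-in-pair (97, 113) and two second-in-pair (177, 145), and matching each first-in-pair record's RNEXT/PNEXT to another record's RNAME/POS splits them into two separate pairs. The bit labels below are shorthand for the flag bits in the SAM format specification, and the tuples are copied from the records above; `decode_flag` and `mate_pairs` are illustrative names.

```python
"""Sketch: decode the SAM FLAG values and pair up the four records above."""

# Shorthand labels for the SAM specification's FLAG bits (subset).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
}


def decode_flag(flag):
    """List the labels of all bits set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# (RNAME, POS, FLAG, RNEXT, PNEXT) from the four 222_523_1850_F records.
records = [
    ("2", 165311663, 97, "10", 104666196),
    ("5", 25554230, 113, "10", 62880846),
    ("10", 62880846, 177, "5", 25554230),
    ("10", 104666196, 145, "2", 165311663),
]


def mate_pairs(recs):
    """Match each first-in-pair record to the record at its RNEXT/PNEXT."""
    by_pos = {(r[0], r[1]): r for r in recs}
    return [(r, by_pos.get((r[3], r[4]))) for r in recs if r[2] & 0x40]


for flag in (97, 113, 177, 145):
    print(flag, decode_flag(flag))
# 97 ['paired', 'mate_reverse', 'first_in_pair']
# 113 ['paired', 'reverse', 'mate_reverse', 'first_in_pair']
# 177 ['paired', 'reverse', 'mate_reverse', 'second_in_pair']
# 145 ['paired', 'reverse', 'second_in_pair']

for first, mate in mate_pairs(records):
    print(first[:2], "<->", mate[:2])
# ('2', 165311663) <-> ('10', 104666196)
# ('5', 25554230) <-> ('10', 62880846)
```

Neither pair has the proper-pair bit (0x2) set, which fits both pairs having their ends on different chromosomes.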
>>>
>>>
>>> Any help would be most appreciated!
>>>
>>> Best,
>>> Natasja
>
>