From: Alec W. <al...@br...> - 2010-02-25 01:29:43
Hi Klaudia,

Although the SAM spec does allow the info about which end is which to be
omitted, we (Picard development team) feel that this is bad practice and
we don't want to encourage it. Can you write a perl or python script to
arbitrarily assign ends to be first or second of pair?

-Alec

Klaudia Walter wrote:
> Hi all,
>
> I found the following flags paired up 17 with 33 and 19 with 35, which
> do not contain the information whether they are the first or the
> second mate, if I understand that correctly.
>
> 1st Example:
>
> SRR003669.14280418 17 1 1024770 78 51M =
> 1025080 259
> TTTGGTCTGTTGTTCTAAGAATCGGAGAGAGAGAGGTTAAAATCTCCGACT
> :;7::;9<98=:=0:=@A>=>26:9B=A<B?A5B:<?;A???:=994;6:C
> RG:Z:SRR003669 MF:i:4 Aq:i:53 NM:i:3 UQ:i:72 H0:i:1
> H1:i:0
>
> SRR003669.14280418 33 1 1025080 53 51M =
> 1024770 -259
> TGGTCTATTGTTCTAAGAATCGGAGAGAGAGAGGTTAAAATCTCCAACTAT
> C99==@=??8>@;A;?=9@><6=1=9>;=<<8>=9@40A9A8>@6:>9?48
> RG:Z:SRR003669 MF:i:4 Aq:i:53 NM:i:1 UQ:i:29 H0:i:0
> H1:i:1
>
>
> 2nd Example:
>
> SRR003667.10102516 35 1 1102933 99 51M =
> 1103095 213
> GTCAGTACTTTAGAGGATCCCCTTCCCCAGCAGGAATCCTGGGTGCTGAGG
> 3';-;+<35,;==;./<@8/;542864:901>*/5736))4*-)).*-/2)
> RG:Z:SRR003667 MF:i:18 Aq:i:57 NM:i:0 UQ:i:0 H0:i:1
> H1:i:0
>
> SRR003667.10102516 19 1 1103095 99 51M =
> 1102933 -213
> GGGGAGGGGTCTCAGGGCTCCTGACTTCTTCCATTCTTGCCCAGCCCACCC
> 40*595,+4><;<>69,?96=@0;<@=<><;<A>:><><<;>A;::<;96@
> RG:Z:SRR003667 MF:i:18 Aq:i:57 NM:i:0 UQ:i:0 H0:i:1
> H1:i:0
>
> I am not sure in which circumstances these flags are set. As a
> solution for the SamToFastq tool, could not the mate with the smaller
> chromosomal position be allocated as the first mate and the other one
> as the second mate?
>
> Thanks,
> Klaudia
>
>
> On 23 Feb 2010, at 09:57, John Marshall wrote:
>
>> On 22 Feb 2010, at 22:56, Alec Wysoker wrote:
>>> It looks like there is something strange with your input SAM file. It
>>> appears that it does actually contain paired reads, but there are two
>>> reads with the same name that are either both marked as being the first
>>> of the pair or both marked as being the second of the pair.
>>
>> I wonder whether Klaudia's input file contains non-primary
>> alignments. I guess tools like SamToFastq need to allow for reads
>> appearing in more than one SAM alignment record -- hopefully this
>> could be as simple as ignoring non-primary records, though the hint
>> about split hits in the flag field description suggests that might
>> not be quite good enough.
>>
>> John
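
For reference, the kind of script suggested at the top of this message could look roughly like the Python sketch below. It is not part of Picard, and the function name `assign_ends` and the constant names are made up for illustration. It assumes tab-separated SAM text and that each paired read name appears in exactly two records, so non-primary alignments (as John mentions) would need extra handling. The first record seen for a read name gets 0x40 (first of pair) and the next one gets 0x80 (second of pair); records that already carry either bit are left alone.

```python
# SAM FLAG bits used here (values from the SAM spec).
PAIRED = 0x1
FIRST_OF_PAIR = 0x40
SECOND_OF_PAIR = 0x80


def assign_ends(lines):
    """Yield SAM lines, adding 0x40 or 0x80 to paired records that have neither.

    The choice is arbitrary: whichever record of a read name comes first in
    the input becomes first of pair, and its mate becomes second of pair.
    """
    pending = set()  # read names whose first record has been tagged already
    for line in lines:
        # Pass header lines and blank lines through untouched.
        if line.startswith("@") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & PAIRED and not flag & (FIRST_OF_PAIR | SECOND_OF_PAIR):
            if name in pending:
                flag |= SECOND_OF_PAIR
                pending.discard(name)
            else:
                flag |= FIRST_OF_PAIR
                pending.add(name)
            fields[1] = str(flag)
        yield "\t".join(fields) + "\n"
```

To use it as a filter, something like `sys.stdout.writelines(assign_ends(sys.stdin))` (after `import sys`) should be enough. With the flags from the examples quoted above, 17 and 33 would become 81 and 161, and 35 and 19 would become 99 and 147.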