From: Peter C. <p.j...@go...> - 2013-02-22 10:27:16
|
On Fri, Feb 22, 2013 at 9:32 AM, James Bonfield <jk...@sa...> wrote: > On Thu, Feb 21, 2013 at 06:02:44PM +0000, Peter Cock wrote: >> Why is there a problem in the spec? >> >> If you have an unmapped read, it cannot be marked as reverse >> complemented (under the current SAM/BAM FLAG definition), >> therefore it must be stored in the original direction. > > Ah but is this always the case? When typical real life usage and a specification disagree... trouble. > BWA concatenates all references together first and then does the > alignments. Sometimes reads align across the boundary of one ref and > the next. It can reverse complement them at this point, setting > mapping quality and generating CIGAR strings. Later on the algorithm > notices this and marks the read as unmapped, but leaving all other > fields intact. > > This was pointed out a long while ago and the response was IIRC to > modify the spec to simply state that all those fields are undefined > when a read is marked as unmapped. > > It's trivially to verify that BWA really does this by looking at > quality strings: > > samtools view /tmp/6714_6#1.bam| perl -lane 'print $F[10] if (($F[1]&0x04) && ($F[1]&0x10))' > > vs > > samtools view /tmp/6714_6#1.bam| perl -lane 'print $F[10] if (($F[1]&0x04) && !($F[1]&0x10))' > > The profile is clear to see one is complemented and one is not. > > Now maybe this is a bwa bug (it's been reported as such before I > think), but the fact is we do already have 1000s of BAM files in the > wild where we need to use the 0x10 flag when 0x4 is set. So I suggest > we just update the spec to match reality. > I would side with this being a bug - in fact BWA not clearing the other fields like the RNAME is also the cause of much confusion, and worthy of fixing. However you're probably right that the shear number of existing BAM files with this use of 0x10 flag despite 0x4 not being set would favour legitimising this in the specification. On Fri, Feb 22, 2013 at 9:37 AM, James Bonfield <jk...@sa...> wrote: > On Thu, Feb 21, 2013 at 06:02:44PM +0000, Peter Cock wrote: >> If you have an unmapped read, it cannot be marked as reverse >> complemented (under the current SAM/BAM FLAG definition), >> therefore it must be stored in the original direction. > > One more thing I forgot to add. Your statement above is *also* at odds > with the spec. It cannot be marked as reverse complemented, but it > also cannot be marked as the forward strand either. Ie the spec states > explicitly that we cannot tell whether the read is complemented or not. We're using the RC field in two ways - in the context of a mapped read it tells us which strand is used AND if the SEQ/QUAL are reversed. For an unmapped read, the reference strand is irrelevant, but the directionality of the SEQ/QUAL is still important. As you note, the fact that the 0x10 flag has been used for unmapped reads where the SEQ/QUAL fields have been reverse despite 0x4 not being set is an issue. I would think a 'SAM/BAM cleanup' tool could fix this (clear 0x4 and reverse complement the SEQ/QUAL), and perhaps BAM -> CRAM -> BAM can do that? Regards, Peter |