From: John M. <jm...@sa...> - 2012-04-26 12:57:07
On 25 Apr 2012, at 13:05, Peter Cock wrote:
> I recently diagnosed a problem in samtools view when converting from
> BAM to SAM where a read had unusually high quality scores:
> http://seqanswers.com/forums/showthread.php?t=19470

We've recently encountered some SOLiD BAM files with similarly unrestricted base qualities, which caused various minor pain with various tools.

> Should the SAM/BAM specification be clarified to make it explicit
> that the PHRED base quality scores are also restricted to 0 to 93
> inclusive?

It used to be explicit in the old specification; e.g. 0.1.2-draft (20090820) had

    QUAL  [!-~]+|\*  [0,93]  query QUALity; ASCII-33 gives the Phred base quality

The current one is a little less obvious (and discusses the possibility of * separately):

    QUAL  String  [!-~]+  ASCII of Phred-scaled base QUALity+33

In both cases, this text is in the Alignment section, but it's pretty obvious that the BAM Format section has to be read with respect to that section.

Nonetheless, the only reason not to clarify this would be if doing so constituted tightening up and rejecting things that were previously valid, rather than merely clarifying. Since it used to be more explicit and is currently implicitly stated, that reason surely doesn't apply.

> Also, I think this range should be checked in samtools to avoid
> (as currently the case) producing non-printable characters or
> otherwise invalid SAM output by ignoring out of range scores
> (as in the thread on SEQanswers).

That I'm less sure about. What exactly do you propose? Clamping out-of-range base qualities at 93 / ~? With or without a warning? With a million warnings, one for each out-of-range base quality encountered?

    John

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
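
For illustration only, the clamping option raised above might look roughly like the following Python sketch. This is not samtools code: the names `encode_qual`, `convert`, `MAX_PHRED` and `PHRED_OFFSET` are hypothetical. It assumes qualities arrive as non-negative integers (as in BAM's one byte per base), and it picks the "single summary warning" option rather than one warning per base.

```python
import sys

PHRED_OFFSET = 33   # QUAL is ASCII of Phred-scaled base quality + 33
MAX_PHRED = 93      # '~' is ASCII 126, and 126 - 33 = 93


def encode_qual(quals):
    """Return (QUAL string, number of clamped values) for one read.

    Hypothetical helper: values above MAX_PHRED are clamped to '~'.
    An empty quality list is written as '*', as SAM allows.
    """
    if not quals:
        return "*", 0
    clamped = 0
    chars = []
    for q in quals:
        if q > MAX_PHRED:
            q = MAX_PHRED
            clamped += 1
        chars.append(chr(q + PHRED_OFFSET))
    return "".join(chars), clamped


def convert(reads):
    """Encode many reads, emitting at most one warning for the whole run."""
    total_clamped = 0
    out = []
    for quals in reads:
        text, clamped = encode_qual(quals)
        total_clamped += clamped
        out.append(text)
    if total_clamped:
        print(
            f"warning: clamped {total_clamped} base qualities above {MAX_PHRED}",
            file=sys.stderr,
        )
    return out
```

Counting clamped values per read and reporting once at the end avoids the "million warnings" problem while still telling the user that the output no longer matches the input exactly.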