From: Heng Li <lh...@sa...> - 2009-12-14 16:07:38
In the probabilistic framework, the first measurement is the probability
of the data given the alignment, while the second is the posterior
probability of the alignment given the data. Mapping quality belongs to
the second category, and the AS/NM/UQ/PQ tags to the first. In read
mapping, we are mostly interested in the posterior and only occasionally
look at the NM/UQ tags. That is why mapping quality is a mandatory field
while the others are tags.

I entirely agree that mapping uniqueness is not clearly defined in
general, especially for long reads; the only way to define uniqueness
clearly is through mapping quality. In SAM, we encourage aligners to
compute mapping quality. See also this FAQ entry:

http://sourceforge.net/apps/mediawiki/samtools/index.php?title=SAM_FAQ#Why_mapping_quality.3F

Heng

On Mon, Dec 14, 2009 at 09:18:38AM -0500, Alec Wysoker wrote:
> Hi Folks,
>
> Just to clarify the direction I think this discussion is taking...
>
> I find the term "unique" misleading, since virtually any alignment
> could be non-unique if the alignment stringency were loose enough.
> Rather, it sounds like there are two somewhat independent measures
> of alignment goodness that people want: 1) how well a read matches
> the best alignment; and 2) the relative goodness of the best
> alignment to the next best alignment. Do we need to store both
> these scores in SAM2?
>
> -Alec
>
> Benjamin Berman wrote:
> >It's true that it might be a bit aligner-specific, but that does not
> >negate its primary use case, which would be to determine simply
> >whether or not *any* other strong matches exist. You could try to
> >give it a more concrete interpretation (any matches with a 5% or more
> >probability of being correct, any matches within an edit distance of
> >N from the best match), but it would be difficult or more likely
> >impossible to get aligners to comply with this. In practice, any
> >aligner capable of returning multiple hits has to have some cutoff,
> >and this would be the same cutoff used for the proposed "numHits"
> >field.
> >
> >ben.
> >
> >
> >On Dec 11, 2009, at 2:03 PM, Paul Anderson wrote:
> >
> >>On Fri, Dec 11, 2009 at 3:13 PM, David Rio <dri...@gm...> wrote:
> >>
> >>>I would like to suggest an extra tag in the SAM spec to differentiate
> >>>uniquely mapped reads
> >>>from "best score" reads. What I would suggest is adding an extra tag
> >>>that is either 0 for
> >>>uniquely mapped reads or # where # is the number of other hits for that read.
> >>>
> >>>What do you guys think?
> >>I think it is misleading, since depending on the aligner, or the
> >>settings of the aligner, you are going to get different answers.
> >>
> >>In many index based aligners, (e.g. KARMA), if you use a smaller
> >>index, you will tend to get more possible matches.
> >>
> >>That said, KARMA will write the number of locations it evaluated for a
> >>single ended read, but like Goncalo suggested, I think the quality
> >>score is really what you're looking for.
> >>
> >>The mapping score, if aligners are doing a good job computing it,
> >>should be moderately comparable across aligners, since it has a
> >>mathematical basis in reality.
> >>
> >>Paul
> >>
> >>------------------------------------------------------------------------------
> >>Return on Information:
> >>Google Enterprise Search pays you back
> >>Get the facts.
> >>http://p.sf.net/sfu/google-dev2dev
> >>_______________________________________________
> >>Samtools-devel mailing list
> >>Sam...@li...
> >>https://lists.sourceforge.net/lists/listinfo/samtools-devel

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
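
To make the likelihood/posterior distinction in the reply above concrete, here is a minimal sketch of how a posterior-based mapping quality could be derived from per-candidate alignment likelihoods. It is an illustration under stated assumptions, not how any particular aligner computes MAPQ: the function name `mapping_quality`, the uniform prior over candidate locations, the assumption that the candidate list is exhaustive, and the cap of 60 are all hypothetical choices made for this example. The only part taken from the SAM convention is that MAPQ is the Phred-scaled probability that the mapping position is wrong.

```python
import math


def mapping_quality(log10_likelihoods, cap=60):
    """Phred-scaled posterior error of the best candidate alignment.

    log10_likelihoods: log10 P(read | alignment) for each candidate location.
    Assumes a uniform prior over candidates (hypothetical simplification).
    """
    if not log10_likelihoods:
        raise ValueError("need at least one candidate alignment")

    best_index = max(range(len(log10_likelihoods)),
                     key=lambda i: log10_likelihoods[i])
    best = log10_likelihoods[best_index]

    # Likelihood of every other candidate relative to the best one.
    others = sum(10 ** (l - best)
                 for i, l in enumerate(log10_likelihoods)
                 if i != best_index)

    # Posterior probability that the best candidate is NOT the true location.
    p_wrong = others / (1.0 + others)
    if p_wrong <= 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


# Both reads have the same best-hit likelihood (the "first measurement"),
# but very different posteriors (the "second measurement").
print(mapping_quality([-2.0]))        # 60: no competing location
print(mapping_quality([-2.0, -2.0]))  # 3: two equally good locations
print(mapping_quality([-2.0, -5.0]))  # 30: runner-up is 1000x less likely
```

The first two calls show why a "number of hits" tag and a mapping quality answer different questions: the best alignment fits the read equally well in both cases, and only the posterior reflects the competing location.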