
SNV QUAL calculation

2014-04-17
2018-05-04
  • jessica preston

    jessica preston - 2014-04-17

    Hello,

    Do you mind explaining how QUAL is calculated by LoFreq? I am using the new version LoFreq-star.

    Another question: does LoFreq-star have an option to filter SNVs by base quality, i.e. like --ignore-bases in the previous version?

    Thanks much!!

     
    • Andreas Wilm

      Andreas Wilm - 2014-04-21

      Hi Jessica,

      sorry for the late reply. I was on leave over Easter without internet access.

      LoFreq's SNV qualities are Phred-scaled p-values, which describe how
      likely it is that a reported SNV is a false positive, i.e. not actually
      an SNV. Basically, LoFreq models SNVs as a coin-tossing experiment, where
      the error probability changes at each coin toss (i.e. at each base in a
      pileup column). As sources of errors, it takes base qualities, mapping
      qualities etc. into account. LoFreq will only report SNVs with a
      p-value smaller than 5% (i.e. a quality of 20) after multiple testing
      correction. Please also refer to the LoFreq manuscript
      (http://www.ncbi.nlm.nih.gov/pubmed/23066108) for more details.
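
The coin-tossing model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LoFreq's actual code: the function name `poisson_binomial_tail` and the example error probabilities are made up here, and the multiple testing correction is left out. It computes the probability of seeing at least k errors when each base has its own error probability.

```python
def poisson_binomial_tail(error_probs, k):
    """Return P(X >= k), where X is the number of erroneous bases among
    independent bases with the given per-base error probabilities."""
    # dist[j] = probability of exactly j errors among the bases seen so far
    dist = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            nxt[j] += prob * (1.0 - p)   # this base was called correctly
            nxt[j + 1] += prob * p       # this base is an error
        dist = nxt
    return sum(dist[k:])


# Hypothetical pileup column: 10 bases, each with its own error probability
probs = [0.001, 0.01, 0.001, 0.05, 0.001, 0.001, 0.01, 0.001, 0.001, 0.001]

# p-value for observing 3 or more non-reference bases purely by error
p_value = poisson_binomial_tail(probs, 3)
```

A small p-value means the observed number of non-reference bases is hard to explain by sequencing and mapping errors alone.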

      Two 'unusual' values are possible:

      • Dot: LoFreq has the notion of consensus variants, which are
        positions where more than 50% of bases differ from the reference. In
        such cases LoFreq cannot calculate a probability using its model,
        so the corresponding quality is set to 'not available', which is
        the dot character in VCF format.

      • 2147483647: This corresponds to a p-value close to zero, i.e. a
        highly significant SNV. The reason is this: to prevent taking the log
        of zero, older versions of LoFreq (before version 2.0 RC1) set the Phred
        score to the maximum integer (2147483647) if the corresponding p-value
        was almost zero (<DBL_EPSILON).
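
When reading LoFreq output in a script, the two special values above need their own handling. The following is a minimal sketch, assuming the QUAL column has already been split out of a VCF line as a string. The helper name `parse_qual` and the choice to map the maximum-integer value to infinity are assumptions made for this example, not LoFreq behaviour.

```python
import math

INT_MAX = 2147483647  # value written by older LoFreq versions for p-values near zero


def parse_qual(field):
    """Convert a VCF QUAL string to a float, or None if not available."""
    if field == ".":
        return None        # consensus variant: no quality was computed
    qual = float(field)
    if qual == INT_MAX:
        return math.inf    # treat as "p-value effectively zero"
    return qual
```

Returning `None` for the dot keeps consensus variants distinguishable from low-quality calls in downstream filtering.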

      Regarding your filtering question: yes, LoFreq version >2 can also
      ignore bases below a certain base quality threshold. Have a look at
      the 'Base-call quality' section in the usage generated by simply
      calling 'lofreq call'. The easiest would be to use:

      -q | --min-bq INT Skip any base with baseQ smaller than INT [6]

      Andreas

      --
      Andreas Wilm
      andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC

       
      • Andreas Wilm

        Andreas Wilm - 2014-04-21

        Small correction: I incorrectly said "LoFreq will only report SNVs
        with a p-value smaller than 5% (i.e. a quality of 20)". However, the
        Phred-value corresponding to 5% is 13, not 20.
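
The correction is easy to check, since a Phred score is -10 times the base-10 logarithm of the p-value. A short sketch (the function names are chosen for this example only):

```python
import math


def pvalue_to_phred(p):
    """Phred-scale a p-value: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def phred_to_pvalue(q):
    """Inverse conversion: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


q_for_5_percent = pvalue_to_phred(0.05)   # about 13
p_for_q20 = phred_to_pvalue(20)           # about 0.01, i.e. 1%
```

So a quality of 20 corresponds to a p-value of 1%, while the 5% threshold corresponds to a quality of about 13.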

        Andreas

         
  • Steve

    Steve - 2018-05-03

    "As sources of errors, it takes base-qualities, mapping
    qualities etc into account."

    Thanks for this. However I was wondering if there was a more thorough explanation of each of the values that are used in calculation of the 'QUAL' score values that are output in the VCF? I did not see it covered in the publication (maybe I missed it?) and wasn't able to figure out what was going on in the source code.

     

    Last edit: Steve 2018-05-03
    • Andreas Wilm

      Andreas Wilm - 2018-05-04

      Hi Steve,

      Sure. The basics are explained in the NAR paper (Wilm, 2012): we compute a
      Poisson-binomial distribution, taking the error probabilities at each pileup
      site into consideration, and derive a p-value from that. Error probabilities
      were originally just converted base qualities (because that's what they
      are). In later LoFreq versions we merged base alignment, mapping and base
      quality into one error probability per base. The logic goes like this:
      either the read is misaligned (mapping quality), or if not, the base might
      be misaligned, or if neither is true, the base itself might be
      wrong, i.e.
      P_m + (1-P_m)*P_a + (1-P_m)*(1-P_a)*P_b,
      where P_m is the mapping error probability,
      P_a is the base alignment error probability (BAQ) and
      P_b is the base error probability.
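
As a rough illustration of the merged error probability, here is a small Python sketch. It is not taken from the LoFreq source: the function names and the example Phred values (mapping quality 20, BAQ 30, base quality 30) are hypothetical.

```python
def phred_to_prob(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10.0 ** (-q / 10.0)


def merged_error_prob(p_m, p_a, p_b):
    """Combine mapping (p_m), base alignment (p_a) and base-call (p_b)
    error probabilities into one per-base error probability."""
    return p_m + (1.0 - p_m) * p_a + (1.0 - p_m) * (1.0 - p_a) * p_b


# Hypothetical base: MAPQ 20, BAQ 30, base quality 30
p_m = phred_to_prob(20)
p_a = phred_to_prob(30)
p_b = phred_to_prob(30)
p_err = merged_error_prob(p_m, p_a, p_b)
```

The same value can be written as 1 - (1-P_m)*(1-P_a)*(1-P_b), i.e. one minus the probability that none of the three error sources applies. One such merged probability per base would then feed into the Poisson-binomial calculation.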

      Hope this makes sense,
      Andreas


      SNV QUAL calculation
      https://sourceforge.net/p/lofreq/discussion/general/thread/7b713493/?limit=25#3495


      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/lofreq/discussion/general/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

      --
      Andreas Wilm
      andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC

       
