LoFreq / Discussion / General Discussion: SNV QUAL calculation

jessica preston - 2014-04-17

Hello,

Do you mind explaining how QUAL is calculated by LoFreq? I am using the new version LoFreq-star.

Another question: does LoFreq-star have an option to filter SNVs by base quality, i.e. something like --ignore-bases in the previous version?

Thanks much!!

- Andreas Wilm - 2014-04-21
  
  Hi Jessica,
  
  sorry for the late reply. I was on leave over Easter without internet access.
  
  LoFreq's SNV qualities are Phred-scaled p-values, which describe how
  likely it is that a reported SNV is a false positive, i.e. not actually
  an SNV. Basically, LoFreq models SNVs as a coin-tossing experiment in
  which the error probability changes at each coin toss (i.e. at each base
  in a pileup column). As sources of errors, it takes base-qualities, mapping
  qualities etc into account. LoFreq will only report SNVs with a
  p-value smaller than 5% (i.e. a quality of 20) after multiple testing
  correction. Please also refer to the LoFreq manuscript
  (http://www.ncbi.nlm.nih.gov/pubmed/23066108) for more details.
  
  Two 'unusual' values are possible:
  
  Dot: LoFreq has the notion of consensus variants, which are
  positions where more than 50% of bases differ from the reference. In
  such cases LoFreq cannot calculate a probability using its model,
  which is why the corresponding quality is set to 'not available',
  represented by the dot character in VCF format.
  
  2147483647: This corresponds to a p-value close to zero, i.e. a
  highly significant SNV. The reason is this: to prevent taking the log
  of zero, older versions of LoFreq (<version 2.0 RC1) set the Phred
  score to the maximum integer (2147483647) if the corresponding p-value
  was almost zero (<DBL_EPSILON).
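The capping behaviour described for older versions can be pictured with a minimal Python sketch. The function name `phred_capped` and the rounding to an integer are assumptions made for illustration; only the threshold (DBL_EPSILON) and the sentinel value 2147483647 come from the explanation above.

```python
import math
import sys

INT_MAX = 2147483647  # the sentinel value quoted above (max 32-bit signed int)

def phred_capped(p):
    """Phred-scale p, returning INT_MAX when p is too close to zero.

    Hypothetical helper: mirrors the described behaviour, not LoFreq's code.
    """
    # sys.float_info.epsilon is Python's counterpart of C's DBL_EPSILON
    if p < sys.float_info.epsilon:
        return INT_MAX
    return int(round(-10.0 * math.log10(p)))

print(phred_capped(0.001))  # 30
print(phred_capped(0.0))    # 2147483647
```

The guard avoids evaluating log10(0), which is undefined.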
  
  Regarding your filtering question: yes, LoFreq version >2 can also
  ignore bases below a certain base quality threshold. Have a look at
  the 'Base-call quality' section in the usage generated by simply
  calling 'lofreq call'. The easiest would be to use:
  
  -q | --min-bq INT Skip any base with baseQ smaller than INT [6]
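To picture what such a cutoff does to a pileup column, here is a small Python sketch. It is an illustration only: the function `filter_by_base_quality` and the list-of-tuples pileup representation are hypothetical, and only the rule (skip bases with baseQ smaller than the threshold, default 6) is taken from the option text above.

```python
def filter_by_base_quality(pileup, min_bq=6):
    """Keep only (base, base_quality) pairs with quality >= min_bq.

    Hypothetical helper; the pileup layout is an assumption for this sketch.
    """
    return [(base, bq) for base, bq in pileup if bq >= min_bq]

column = [("A", 30), ("C", 4), ("A", 12), ("G", 5)]
print(filter_by_base_quality(column))             # [('A', 30), ('A', 12)]
print(filter_by_base_quality(column, min_bq=20))  # [('A', 30)]
```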
  
  Andreas
  
  - Andreas Wilm - 2014-04-21
    
    Small correction: I incorrectly said "LoFreq will only report SNVs
    with a p-value smaller than 5% (i.e. a quality of 20)". However, the
    Phred-value corresponding to 5% is 13, not 20.
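The arithmetic behind this correction is the standard Phred conversion, Q = -10 * log10(p). A minimal Python check follows; the function names are chosen for this sketch and are not part of LoFreq.

```python
import math

def phred_from_pvalue(p):
    """Phred-scale a probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)

def pvalue_from_phred(q):
    """Inverse conversion: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

print(round(phred_from_pvalue(0.05)))  # 13
print(pvalue_from_phred(20))           # about 0.01, i.e. 1%, not 5%
```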
    
    Andreas
    

Steve - 2018-05-03

"As sources of errors, it takes base-qualities, mapping
qualities etc into account."

Thanks for this. However, I was wondering whether there is a more thorough explanation of each of the values used in calculating the 'QUAL' scores that are output in the VCF. I did not see it covered in the publication (maybe I missed it?) and wasn't able to figure out what was going on in the source code.

Last edit: Steve 2018-05-03

- Andreas Wilm - 2018-05-04
  
  Hi Steve,
  
  sure. The basics are explained in the NAR paper (Wilm, 2012): we compute a
  Poisson-binomial distribution, taking the error probabilities at each pileup
  site into consideration, and derive a p-value from that. Error probabilities
  were originally just converted base qualities (because that's what they
  are). In later LoFreq versions we merged base alignment, mapping and base
  quality into one error probability per base. The logic goes like this:
  either the read is misaligned (mapping quality), or if not, the base might
  be misaligned, or if neither is the case, the base itself might be
  wrong, i.e.

    P_m + (1 - P_m) * P_a + (1 - P_m) * (1 - P_a) * P_b

  where
    P_m is the mapping error probability,
    P_a is the base alignment error probability (BAQ), and
    P_b is the base error probability.
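As a quick numeric illustration of the merged error probability, the Python sketch below evaluates the expression for one base. The function names and the example quality values are made up for this sketch; it also assumes that each quality is a Phred score converted with p = 10^(-Q/10).

```python
def phred_to_prob(q):
    """Convert a Phred-scaled quality into an error probability."""
    return 10.0 ** (-q / 10.0)

def merged_error_prob(p_m, p_a, p_b):
    """P_m + (1 - P_m) * P_a + (1 - P_m) * (1 - P_a) * P_b."""
    return p_m + (1.0 - p_m) * p_a + (1.0 - p_m) * (1.0 - p_a) * p_b

# Example values (assumed): mapping quality 20, BAQ 30, base quality 20
p_m, p_a, p_b = phred_to_prob(20), phred_to_prob(30), phred_to_prob(20)
print(merged_error_prob(p_m, p_a, p_b))  # roughly 0.0209
```

Algebraically the expression equals 1 - (1 - P_m) * (1 - P_a) * (1 - P_b), i.e. the probability that at least one of the three error sources applies.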
  
  Hope this makes sense,
  Andreas
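The Poisson-binomial step in the reply above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not LoFreq's implementation: it assumes the p-value is the probability of seeing at least the observed number of variant bases through errors alone, it uses a plain O(n^2) dynamic program, and it leaves out the multiple testing correction. The function names are hypothetical.

```python
def poisson_binomial_pmf(error_probs):
    """Distribution of the number of errors among independent bases,
    where base i is wrong with probability error_probs[i]."""
    pmf = [1.0]  # with zero bases, zero errors is certain
    for p in error_probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this base is correct
            nxt[k + 1] += mass * p      # this base is an error
        pmf = nxt
    return pmf

def snv_pvalue(error_probs, num_variant_bases):
    """P(at least num_variant_bases errors), assumed here as the p-value."""
    pmf = poisson_binomial_pmf(error_probs)
    return sum(pmf[num_variant_bases:])

print(snv_pvalue([0.5, 0.5], 1))  # 0.75

# 100 bases with error probability 0.001 each, 5 of them non-reference
print(snv_pvalue([0.001] * 100, 5))
```

With equal error probabilities the distribution reduces to an ordinary binomial; the per-base probabilities are what let differing base, alignment and mapping qualities enter the test.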
  
