Quality Filtering

2014-06-12
2014-06-17
  • jessica preston

    jessica preston - 2014-06-12

    Hello again,

    I have been using lofreq-star to call variants on very high quality reads (Base Quality>60) generated with the program SeqPrep. I want to filter my SNPs based on base quality, as we've discussed before. I have used the lofreq call --min-bq 60 --min-altbq 60 options successfully on standard data and it works great. However, when I run these options on SeqPrep data, it sometimes (not always) acts strangely, and outputs far fewer SNPs when I use a low quality threshold (Q20) than when I use a high quality threshold (Q70). It seems like LoFreq may be thinking I am using phred64 when I am in fact using phred33. Is there a way to specify that I am using phred33 when calling variants with these options?

    Thanks a lot!
    -Jessica
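
For readers unfamiliar with the two encodings mentioned in the question: Phred+33 and Phred+64 differ only in the ASCII offset subtracted from each quality character (older Illumina 1.3 to 1.7 pipelines wrote Phred+64, while Sanger and current Illumina output use Phred+33). The sketch below is only an illustration of that offset; the quality string and the helper name `decode_qualities` are made up for the example and are not part of LoFreq or SeqPrep.

```python
from typing import List


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to Phred scores using the given offset."""
    return [ord(ch) - offset for ch in qual]


# Hypothetical quality string: ']' is ASCII 93, 'I' is ASCII 73, '5' is ASCII 53.
qual = "]I5"

as_phred33 = decode_qualities(qual, offset=33)  # read as Phred+33
as_phred64 = decode_qualities(qual, offset=64)  # same characters read as Phred+64

print(as_phred33)  # [60, 40, 20]
print(as_phred64)  # [29, 9, -11]
```

Reading Phred+33 data with a +64 offset shifts every score down by 31, and negative values are a strong hint that the wrong offset was assumed.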

     
    • Andreas Wilm

      Andreas Wilm - 2014-06-13

      Hi Jessica,

      if you keep low quality bases, then LoFreq will make fewer
      predictions, since lower quality means higher error rate and therefore
      higher chance that a seen variation is an error and not a true SNV.
      So the behavior you've observed is actually as expected (that's one
      reason why changing those thresholds isn't really recommended; LoFreq
      should in theory be able to deal with things in its default mode).
      It's highly unlikely that there's a quality encoding problem, so only
      for the sake of completeness: yes, there is a way to tell LoFreq to
      assume Phred33 scaled scores in your BAM file: use option -I or
      --illumina-1.3 (see lofreq call --help).

      Andreas
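
To make the reasoning in the reply above concrete, here is a minimal sketch of why keeping low quality bases can make a caller more conservative. It converts Phred scores to error probabilities and computes the chance of seeing at least k erroneous bases in a column, treating each base as an independent error event (a Poisson-binomial tail). This is a simplified illustration, not LoFreq's implementation; the read counts, the function names, and the assumption that the extra low quality bases add no further variant observations are all hypothetical.

```python
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_at_least_k_errors(error_probs: List[float], k: int) -> float:
    """P(X >= k), where X counts errors among independent bases."""
    # dist[j] = probability of exactly j errors among the bases seen so far
    dist = [1.0]
    for p in error_probs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - p)   # this base is correct
            new[j + 1] += pj * p     # this base is an error
        dist = new
    return sum(dist[k:])


# Hypothetical column: 3 variant bases observed in total.
high_only = [phred_to_error_prob(60)] * 100        # 100 bases at Q60
mixed = high_only + [phred_to_error_prob(20)] * 100  # plus 100 bases at Q20

p_high_only = prob_at_least_k_errors(high_only, 3)
p_mixed = prob_at_least_k_errors(mixed, 3)

print(p_high_only)  # on the order of 1e-13: 3 errors are very unlikely
print(p_mixed)      # about 0.08: 3 errors are quite plausible
```

With only Q60 bases, three mismatches are almost impossible to explain as sequencing errors, so the site looks like a real variant. Once Q20 bases are allowed into the column, the same three mismatches are easily explained by error, which is consistent with fewer calls at the lower threshold.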


       
      • Andreas Wilm

        Andreas Wilm - 2014-06-17

        Jessica, the latest version of LoFreq actually removes the need for
        any thresholding (and settings of default qualities). It's not 100%
        tested but I'm happy to share it with you before we make this publicly
        available. Feel free to email me (wilma@gis.a-star.edu.sg) if you're
        interested.

        Andreas

         
