Base quality filter does not seem to work

gmy
2015-04-08
2015-04-16
  • gmy

    gmy - 2015-04-08

    Hi,
    I want to filter out bases with quality below 20, so I run: "lofreq call -f ref.fasta -q 20 -Q 20 -m 20 -C 10 -d 400 in.bam -o out.vcf".
    However, LoFreq does not seem to filter these bases. For example, the position "gi|57116681|ref|NC_000962.2| 104830 C 23 AAAAAAAAAAAAAAaAAAAAAA^IA 1439;935534,=8=4;53235!" has about 12 bases with quality below 20, but LoFreq did not filter them. I extracted the pileup for this position with "samtools mpileup -q 20 -Q 20 -s -f ref.fasta in.bam".

    I am confused about this problem.

    Thanks!
    gmy
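
The base-quality string in the pileup line quoted above can be checked by hand. The following is a minimal sketch, assuming the qualities are Phred+33 ASCII (the encoding samtools mpileup writes); the function name `decode_phred33` is made up for illustration and is not part of samtools or LoFreq.

```python
# Decode the base-quality column of the pileup line quoted in the post.
# Assumption: qualities are Phred+33 ASCII, as written by samtools mpileup.
PILEUP_QUALS = "1439;935534,=8=4;53235!"
MIN_BQ = 20  # the cutoff passed via -q/-Q in the post


def decode_phred33(qual_string):
    """Return the Phred score for each character of a Phred+33 string."""
    return [ord(ch) - 33 for ch in qual_string]


quals = decode_phred33(PILEUP_QUALS)
low = [q for q in quals if q < MIN_BQ]
print(f"depth={len(quals)} below_Q{MIN_BQ}={len(low)}")
```

Under that assumption, the string decodes to 23 qualities, 12 of which are below 20, which agrees with the depth of 23 and the "about 12 bases" mentioned in the post.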

     
    • Andreas Wilm

      Andreas Wilm - 2015-04-08

      Hi gmy,

      I would strongly encourage you to stick to default parameters in
      LoFreq, unless of course you have good reason to change them.
      Especially newer versions of LoFreq got rid of most filtering and
      tolerate even noisy data, so excessive filtering is not only
      unnecessary, but might even introduce biases.

      Now to your actual problem: your command line syntax looks correct
      (even though I would question those settings :), and if you are sure
      those bases are taken into account when computing SNVs, this would be a
      bug. But how do you actually know LoFreq didn't filter those bases?
      You are showing an example pileup output, but just that is not enough
      to tell whether those bases got filtered. Could you provide the
      corresponding vcf output?

      Thanks,
      Andreas

       
      • gmy

        gmy - 2015-04-15

        Hi Andreas,
        Sorry for the late reply.
        The corresponding LoFreq output, using the command I mentioned last time, is:
        gi|57116681|ref|NC_000962.2| 104830 . C A . PASS DP=23;AF=1.000000;SB=0;DP4=0,0,22,1;CONSVAR
        So what's the reason?

        Thanks!
        gmy

         
        • Andreas Wilm

          Andreas Wilm - 2015-04-15

          Hi gmy,

          That is indeed a bit strange. Which exact LoFreq version are you using?

          Would you mind sharing the reads spanning that specific region with
          me via PM, so that I can reproduce the problem and check what's going
          on? You can generate a BAM file containing only that region with:
          samtools view -b YOUR.BAM 'gi|57116681|ref|NC_000962.2|:104830-104830' > 104830.bam
          Please use andreas.wilm@gmail.com or wilma@gis.a-star.edu.sg

          Thanks,
          Andreas

           
          • Andreas Wilm

            Andreas Wilm - 2015-04-16

            Sorry, I know what's happening: the filters only affect the
            actual SNV calling, the computed p-value, and the numerator of the
            AF computation. All other fields stay the same, i.e. you will still
            get the same coverage, DP4, etc. This was done to avoid introducing
            filtering biases. For example, filtering might remove the majority of
            reference bases because they are just below the cutoff, which
            artificially increases your variant AF. (It was also kind of necessary
            in the past to handle consensus variants properly, but that has since
            been addressed differently.)

            Summed up: I think the 'base filter' as it is implemented right now is
            not intuitive for the user, because it's not filtering per se; you
            could call it 'masking during variant calling'. I have logged this as a
            GitHub issue (https://github.com/CSB5/lofreq/issues/14).

            Thanks,
            Andreas
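
To make the bias described above concrete, here is a toy sketch. It is not LoFreq's code. It only assumes the reading given in the post: low-quality bases are excluded from the AF numerator while the reported coverage stays unfiltered. The column data and the function names `af_hard_filter` and `af_masked` are hypothetical.

```python
# Toy illustration (not LoFreq's code): how the choice of denominator
# changes the allele frequency when a quality cutoff is applied.
MIN_BQ = 20

# Hypothetical (base, quality) pairs for one column; the reference base is C.
# Most reference bases sit just below the cutoff.
column = [("C", 19)] * 6 + [("C", 30)] * 2 + [("A", 30)] * 2


def af_hard_filter(column, alt, min_bq):
    """Drop low-quality bases from both numerator and denominator."""
    kept = [base for base, qual in column if qual >= min_bq]
    return sum(base == alt for base in kept) / len(kept)


def af_masked(column, alt, min_bq):
    """Count only passing alt bases, but divide by the unfiltered depth."""
    alt_pass = sum(base == alt and qual >= min_bq for base, qual in column)
    return alt_pass / len(column)


print(af_hard_filter(column, "A", MIN_BQ))  # 2 alt bases out of 4 kept bases
print(af_masked(column, "A", MIN_BQ))       # 2 alt bases out of 10 total bases
```

In this made-up column, hard filtering removes six of the eight reference bases and reports an AF of 0.5, while the masked variant keeps the full depth of 10 and reports 0.2.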

             
