
Strand Bias

2014-11-20
2022-03-03
  • jessica preston

    jessica preston - 2014-11-20

    Hi,

    I am running LoFreq on data that has been processed with SeqPrep, which generates a consensus sequence by merging overlapping paired-end reads. The SNPs called from these merged reads seem to have strand bias because SeqPrep assigned each merged read as either forward or reverse, but not both. I do still see a decent number of SNPs with SB=0 in my SeqPrep data.
    I have 2 questions:
    1. Am I correct in thinking that SNPs labelled SB=0 are considered not biased? Based on the reads they still seem biased (e.g. DP4=0,7,0,3000).
    2. I am hoping to increase sensitivity. Is there a way to disable the strand-bias calculation when calling SNPs? To increase sensitivity, I have already tried the -J and -B options. I have also increased -s to 0.1, which didn't seem to change anything.

    Thanks!
    -Jessica

     
    • Andreas Wilm

      Andreas Wilm - 2014-11-21

      Hi Jessica,

      the strand-bias test checks whether the proportion of reference bases on
      the forward and reverse strands differs from the proportion of alternate
      bases on the forward and reverse strands (using Fisher's exact test). If
      most reference bases are on one strand and most alternate bases are on
      the same strand, then the test reports no bias.

      Regarding question 1: Yes, SB=0 means the test finds no bias at all,
      and that's correct for your example (see above).
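The test described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not LoFreq's own code: the function names are hypothetical, the DP4 order is taken to be ref-forward, ref-reverse, alt-forward, alt-reverse, and truncating the Phred-scaled p-value to an integer is an assumption chosen because it reproduces the SB values quoted in this thread.

```python
import math


def fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for the 2x2 table of DP4 counts."""
    row_ref = ref_fwd + ref_rev   # all reference bases
    col_fwd = ref_fwd + alt_fwd   # all forward-strand bases
    col_rev = ref_rev + alt_rev   # all reverse-strand bases
    total = col_fwd + col_rev

    def weight(x):
        # Unnormalised hypergeometric probability of x reference bases on
        # the forward strand, with all row and column totals held fixed.
        return math.comb(col_fwd, x) * math.comb(col_rev, row_ref - x)

    lo = max(0, row_ref - col_rev)
    hi = min(row_ref, col_fwd)
    observed = weight(ref_fwd)
    # Sum every possible table that is at most as likely as the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / math.comb(total, row_ref)


def strand_bias_phred(dp4):
    """Phred-scaled p-value, truncated to an integer (truncation is an assumption)."""
    p = fisher_two_sided(*dp4)
    if p <= 0:
        return None  # p underflowed to 0.0; no finite Phred value
    return int(-10 * math.log10(p))
```

With DP4=0,7,0,3000 there is only one possible table (no forward-strand bases at all), so the p-value is 1 and `strand_bias_phred((0, 7, 0, 3000))` gives 0. For DP4=10,2,0,3 and DP4=17,1,2,1, which appear further down in this thread, the sketch gives 16 and 5.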

      And regarding 2: You can disable the strand-bias test, but how exactly
      depends on the version of LoFreq you are using. If you're using
      version 2, then 'lofreq call --no-default-filter' should do. Note,
      though, that this also means low-coverage positions are no longer
      removed, which you would then have to do manually (by using
      'lofreq filter --no-defaults --cov-min 10').
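To make concrete what the manual coverage step amounts to, here is a small Python stand-in that keeps only records whose INFO field reports DP of at least 10. It is a sketch for illustration, not a replacement for 'lofreq filter': the function names and the example records in the usage note are hypothetical, and it assumes tab-separated VCF lines with INFO in the eighth column.

```python
def info_to_dict(info):
    """Parse a VCF INFO string such as 'DP=77;AF=0.06;SB=0;DP4=72,0,5,0'."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")  # flag-only keys get an empty value
        fields[key] = value
    return fields


def filter_min_coverage(vcf_lines, cov_min=10):
    """Yield header lines unchanged and records whose INFO DP >= cov_min."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        info = info_to_dict(line.rstrip("\n").split("\t")[7])
        if int(info.get("DP") or 0) >= cov_min:
            yield line
```

For example, given one made-up record with DP=77 and another with DP=5, `list(filter_min_coverage(lines))` keeps the header and the DP=77 record only.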

      Let me know if this didn't work as expected.

      Thanks,
      Andreas


       
      • jessica preston

        jessica preston - 2014-11-21

        OK great, thanks!

         
  • Kiril Dimitrov

    Kiril Dimitrov - 2017-08-01

    Hello,

    we have analyzed some viral genomes where the strand bias has been estimated as zero. In these results, we have noticed that SB=0 whenever the count of alternate bases on either the forward or the reverse strand is zero. Is it the case that, whenever one of the alternate-strand counts is zero, SB will be 0 (implying no bias) even though the DP4 data suggest there actually is bias? Should such results perhaps not be considered at all? And then there is the last example, where the reference count on the reverse strand is 2 while the alternate count on the reverse strand is 3. Is this the reason for the high SB value?

    Thanks!

    DP=77;AF=0.064935;SB=0;DP4=72,0,5,0
    DP=77;AF=0.038961;SB=0;DP4=74,0,3,0
    DP=80;AF=0.037500;SB=0;DP4=76,1,3,0
    DP=79;AF=0.037975;SB=0;DP4=72,3,3,0
    DP=65;AF=0.046154;SB=0;DP4=58,4,3,0
    DP=17;AF=0.117647;SB=0;DP4=14,1,2,0
    DP=19;AF=0.105263;SB=0;DP4=16,1,2,0
    DP=14;AF=0.142857;SB=0;DP4=9,3,2,0
    DP=13;AF=0.153846;SB=0;DP4=11,0,2,0
    DP=15;AF=0.133333;SB=0;DP4=2,11,0,2
    DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
    DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
    DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
    DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
    DP=32;AF=0.093750;SB=0;DP4=14,15,2,1
    DP=21;AF=0.142857;SB=5;DP4=17,1,2,1
    DP=15;AF=0.133333;SB=16;DP4=10,2,0,3

     
    • Andreas Wilm

      Andreas Wilm - 2017-08-03

      Hello,

      strand bias is defined as in samtools: the reference and alternate base
      counts on the forward and reverse strands are used as input for Fisher's
      exact test. This quantifies how far the strand distributions of the
      reference and alternate counts differ, i.e. you'll get high SB values
      (low p-values) if you have lots of reference bases on one strand and
      lots of alternate bases on the other. It does not, however, test whether
      both reference and alternate bases are mainly on the same strand. I hope
      this explanation makes sense.
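Since SB does not capture the case where reference and alternate bases all sit on the same strand, that has to be checked separately from DP4. The following Python sketch is one way to do it; it is not part of LoFreq, the function and key names are hypothetical, and it assumes the DP4 order ref-forward, ref-reverse, alt-forward, alt-reverse.

```python
def parse_dp4(info):
    """Extract (ref_fwd, ref_rev, alt_fwd, alt_rev) from a VCF INFO string."""
    for item in info.split(";"):
        if item.startswith("DP4="):
            return tuple(int(n) for n in item[4:].split(","))
    raise ValueError("no DP4 field in: " + info)


def strand_summary(info):
    """Summarise strand usage independently of the SB value."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = parse_dp4(info)
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return {
        # Share of all bases (reference and alternate) on the forward strand.
        "fwd_fraction": (ref_fwd + alt_fwd) / total,
        # True when the alternate base is seen on exactly one strand.
        "alt_one_strand": (alt_fwd == 0) != (alt_rev == 0),
    }
```

Applied to `DP=77;AF=0.064935;SB=0;DP4=72,0,5,0` from the listing above, this reports a forward fraction of 1.0 with the alternate base on one strand only, even though SB=0. For `DP4=14,15,2,1` the forward fraction is 0.5 and the alternate base is seen on both strands.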

      Best,
      Andreas


       
  • Steve

    Steve - 2018-05-03

    I have another question about the SB values in the .vcf output. It is my understanding that these values are Phred quality scores, which are usually in the range 0 - 50. However, I am getting many with values of 500 - 1900. Is this expected? And if SB=0 means no strand bias, does this mean that these regions are extremely strand biased?

    Also, in this thread you state:

    2147483647: This corresponds to a p-value close to zero, i.e. a
    highly significant SNV.

    What is the meaning of 2147483647 in the SB value in VCF output? I have a lot of these as well.

     

    Last edit: Steve 2018-05-03
    • Andreas Wilm

      Andreas Wilm - 2018-05-04

      Hi Steve,

      the strand-bias p-value is turned into a Phred quality, whose upper bound
      depends on the precision of the float. In practice it can get much higher
      than 1900. The fact that you see Phred values <60 in other programs is
      simply because they are mostly arbitrarily capped there.
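The conversion itself is Q = -10 * log10(p), so an SB of 1900 corresponds to a p-value of about 1e-190. The short Python sketch below illustrates how the float's range bounds the score, assuming the p-value is held in an IEEE 754 double; which float type LoFreq actually uses is not stated here. Separately, 2147483647 is the largest 32-bit signed integer; that it stands in for a p-value too small to represent is an assumption, consistent with the quote in the question.

```python
import math
import sys


def phred(p_value):
    """Phred-scale a p-value: Q = -10 * log10(p)."""
    return -10 * math.log10(p_value)


print(round(phred(0.001)))               # 30
print(round(phred(1e-190)))              # 1900
print(round(phred(sys.float_info.min)))  # 3077, from the smallest normal double

INT32_MAX = 2**31 - 1                    # 2147483647
```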

      Andreas


       
  • Luca Mologni

    Luca Mologni - 2022-03-03

    Hi all,
    I am analyzing amplicon deep-seq data, looking for rare variants. While I've always run LoFreqFilter with the default strand-bias filtering (multiple testing - FDR), I would now like to filter on a specific SB threshold. Can you advise on a reasonable value? Should I consider only variants with SB=0 as true variants?
    For example, how do you see this: DP=12219;AF=0.009657;SB=5;DP4=822,11273,11,107

    thanks
    Luca

     
