LoFreq / Discussion / General Discussion: Strand Bias

jessica preston - 2014-11-20

Hi,

I am running LoFreq on data that has been run through the program SeqPrep, which generates a consensus sequence by merging overlapping paired-end reads. The SNPs called from these merged reads seem to have strand bias because each merged read was assigned as either forward or reverse by SeqPrep, but not both. I do still see a decent number of SNPs with SB=0 in my SeqPrep data.
I have 2 questions:
1. Am I correct in thinking that SNPs labelled as SB=0 are considered not biased? Based on the reads they still seem biased (e.g. DP4=0,7,0,3000).
2. I am hoping to increase sensitivity. Is there a way to disable the strand-bias calculation when calling SNPs? To increase sensitivity, I have already tried the -J and -B options. I have also increased -s to 0.1, which didn't seem to change anything.

Thanks!
-Jessica

- Andreas Wilm - 2014-11-21
  
  Hi Jessica,
  
  the strand-bias test checks whether the proportion of reference bases on
  the forward and reverse strand differs from the proportion of alternate
  bases on the forward and reverse strand (using Fisher's exact test). If
  most reference bases are on one strand and most alternate bases are on
  the same strand, then the test reports no bias.
  
  Regarding question 1: Yes, SB=0 means the test found no bias at all, and
  that's correct for your example (see above).
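
To make the test concrete, here is a minimal Python sketch of a two-sided Fisher's exact test on the four DP4 counts (ref-forward, ref-reverse, alt-forward, alt-reverse), with the p-value converted to a Phred score. The function names are made up for illustration, and truncating the score to an integer is an assumption; this is not LoFreq's actual code.

```python
import math


def fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for a 2x2 strand table."""
    n_ref = ref_fwd + ref_rev
    n_alt = alt_fwd + alt_rev
    n_fwd = ref_fwd + alt_fwd

    # Unnormalised hypergeometric weight of a table with x reference bases
    # on the forward strand (all margins fixed). Exact integer arithmetic.
    def weight(x):
        return math.comb(n_ref, x) * math.comb(n_alt, n_fwd - x)

    lo = max(0, n_fwd - n_alt)
    hi = min(n_ref, n_fwd)
    weights = [weight(x) for x in range(lo, hi + 1)]
    observed = weight(ref_fwd)
    # Sum over the tables that are at most as likely as the observed one.
    extreme = sum(w for w in weights if w <= observed)
    return extreme / sum(weights)


def sb_phred(dp4):
    """Phred-scaled strand-bias p-value, truncated to int (assumption)."""
    p = fisher_two_sided(*dp4)
    if p == 0.0:  # p underflowed to zero; the Phred score is unbounded
        return None
    return int(-10 * math.log10(p))


# DP4 from the question: every read, reference or alternate, is on the
# reverse strand, so the two proportions are identical.
print(sb_phred((0, 7, 0, 3000)))  # → 0
```

With all reads on one strand there is only one possible table for the given margins, so the p-value is 1 and the Phred score is 0, regardless of how one-sided the coverage looks.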
  
  Regarding question 2: You can disable the strand-bias test, but how
  exactly depends on the version of LoFreq you are using. If you're using
  version 2, then 'lofreq call --no-default-filter' should do. Note,
  though, that this also means low-coverage positions won't be removed,
  which you would then have to do manually (by using 'lofreq filter
  --no-defaults --cov-min 10').
  
  Let me know if this didn't work as expected.
  
  Thanks,
  Andreas
  
  - jessica preston - 2014-11-21
    
    OK great, thanks!
    

Kiril Dimitrov - 2017-08-01

Hello,

we have analyzed some viral genomes where the strand bias has been estimated as zero. In these results, we have noticed that SB=0 whenever the alternate base has zero reads on either the forward or the reverse strand. Is it the case that SB will usually be zero (implying no bias) when one of the alternate strand counts is zero, even though the DP4 data suggest there actually is bias? And should such results perhaps not be considered at all? Then there is the last example, where the consensus (reference) base has 2 reads on the reverse strand while the alternate base has 3. Is this the reason for the high SB value?

Thanks!

DP=77;AF=0.064935;SB=0;DP4=72,0,5,0
DP=77;AF=0.038961;SB=0;DP4=74,0,3,0
DP=80;AF=0.037500;SB=0;DP4=76,1,3,0
DP=79;AF=0.037975;SB=0;DP4=72,3,3,0
DP=65;AF=0.046154;SB=0;DP4=58,4,3,0
DP=17;AF=0.117647;SB=0;DP4=14,1,2,0
DP=19;AF=0.105263;SB=0;DP4=16,1,2,0
DP=14;AF=0.142857;SB=0;DP4=9,3,2,0
DP=13;AF=0.153846;SB=0;DP4=11,0,2,0
DP=15;AF=0.133333;SB=0;DP4=2,11,0,2
DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
DP=32;AF=0.093750;SB=0;DP4=14,15,2,1
DP=21;AF=0.142857;SB=5;DP4=17,1,2,1
DP=15;AF=0.133333;SB=16;DP4=10,2,0,3

- Andreas Wilm - 2017-08-03
  
  Hello,
  
  strand bias is defined as in samtools: reference and alternate base counts
  on the forward and reverse strand are used as input for Fisher's exact
  test. This tries to quantify how far the reference and alternate counts
  on the forward and reverse strand differ, i.e. you'll get high SB values
  (low p-values) if you have lots of reference bases on one strand and lots
  of alternate bases on the other. It does not, however, test whether both
  reference and alternate bases are mainly on the same strand. I hope this
  explanation makes sense.
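
As a complement, here is a small Python sketch of the kind of check the Fisher test does not perform: it parses DP4 from an INFO string and reports what fraction of the reference and of the alternate reads lie on the forward strand. The helper names are hypothetical and this is not part of LoFreq.

```python
def parse_dp4(info):
    """Extract (ref_fwd, ref_rev, alt_fwd, alt_rev) from a VCF INFO string."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return tuple(int(v) for v in fields["DP4"].split(","))


def forward_fractions(info):
    """Fraction of reference and of alternate reads on the forward strand."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = parse_dp4(info)
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    # None marks an allele with no reads at all (fraction undefined).
    ref_frac = ref_fwd / ref_total if ref_total else None
    alt_frac = alt_fwd / alt_total if alt_total else None
    return ref_frac, alt_frac


# Both alleles sit entirely on the forward strand: the proportions agree,
# so SB=0 even though the coverage itself is one-sided.
print(forward_fractions("DP=77;AF=0.064935;SB=0;DP4=72,0,5,0"))  # → (1.0, 1.0)

# Reference mostly forward, alternate entirely reverse: the proportions
# differ, which is what the strand-bias test picks up (SB=16 here).
print(forward_fractions("DP=15;AF=0.133333;SB=16;DP4=10,2,0,3"))
```

If one-sided coverage is itself a concern, a check along these lines would have to be applied separately from the SB value.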
  
  Best,
  Andreas
  

Steve - 2018-05-03

I have another question about the SB values in the .vcf output. It is my understanding that these values are Phred quality scores, which are usually in the range of 0 - 50. However, I am getting many with values of 500 - 1900. Is this expected? And if SB=0 means no strand bias, does this mean that these regions are extremely strand biased?

Also, in this thread you state:

2147483647: This corresponds to a p-value close to zero, i.e. a
highly significant SNV.

What is the meaning of 2147483647 in the SB value in VCF output? I have a lot of these as well.

Last edit: Steve 2018-05-03

- Andreas Wilm - 2018-05-04
  
  Hi Steve,
  
  the strand-bias p-value is turned into a Phred quality, whose upper bound
  depends on the precision of the float. In practice it can get much higher
  than 1900. The fact that you see Phred values <60 in other programs is
  simply because they are capped there, mostly arbitrarily.
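
For reference, the conversion between a p-value and a Phred score is Q = -10 * log10(p). The Python sketch below shows what the reported range implies. Mapping p = 0 to the largest 32-bit signed integer (2147483647) is only an assumption about where such a value could come from, not confirmed LoFreq behaviour, and the function names are illustrative.

```python
import math

INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed integer


def pvalue_to_phred(p):
    """Phred-scale a p-value; p == 0 is mapped to INT32_MAX (assumption)."""
    if p <= 0.0:
        return INT32_MAX
    return int(-10 * math.log10(p))


def phred_to_pvalue(q):
    """Inverse conversion: the p-value that a Phred score stands for."""
    return 10 ** (-q / 10)


print(phred_to_pvalue(0))     # → 1.0
print(phred_to_pvalue(1900))  # a p-value of roughly 1e-190
print(pvalue_to_phred(0.0))   # → 2147483647
```

If the p-value were held in a double-precision float, the smallest positive value (about 5e-324) would correspond to a Phred score a little above 3200, which fits the remark that scores can go well beyond 1900.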
  
  Andreas
  

Luca Mologni - 2022-03-03

Hi all,
I am analyzing amplicon deep-seq data, looking for rare variants. While I've always run LoFreqFilter with default strand-bias filtering (multiple-testing correction, FDR), I would now like to filter on a specific SB threshold. Can you advise on a reasonable value? Should I consider only variants with SB=0 to be true variants?
For example, how do you see this: DP=12219;AF=0.009657;SB=5;DP4=822,11273,11,107

thanks
Luca
