Hi,
I am running LoFreq on data that has been run through the program SeqPrep, which generates a consensus sequence by merging overlapping paired-end reads. The SNPs called from these merged reads seem to have strand bias because they were all assigned as forward or reverse by SeqPrep, but not both. I do still see a decent number of SNPs with SB=0 in my SeqPrep data.
I have 2 questions:
1. Am I correct in thinking that SNPs labelled as SB=0 are considered not biased? Based on the reads they still seem biased (e.g. DP4=0,7,0,3000).
2. I am hoping to increase sensitivity. Is there a way to disable the strand-bias calculation when calling SNPs? To increase sensitivity, I have already tried the -J and -B options. I have also increased -s to 0.1, which didn't seem to change anything.
Thanks!
-Jessica
Hi Jessica,
The strand-bias test checks whether the proportion of reference bases
on the forward and reverse strands differs from the proportion of
alternate bases on the forward and reverse strands (using Fisher's
exact test). If most reference bases are on one strand and most
alternate bases are on the same strand, then there's no bias.
Regarding question 1: Yes, SB=0 means there's absolutely no bias, and
that's correct (see above).
Regarding question 2: You can disable the strand-bias tests, but how
exactly depends on the version of LoFreq you are using. If you're
using version 2, then 'lofreq call --no-default-filter' should do.
Note, though, that this also won't remove low-coverage positions,
which you would then have to do manually (by using
'lofreq filter --no-defaults --cov-min 10').
Let me know if this didn't work as expected.
Thanks,
Andreas
On 21 November 2014 04:30, jessica preston jpreston555@users.sf.net wrote:
--
Andreas Wilm
andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
OK great, thanks!
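To make the answer to question 1 concrete, here is a minimal sketch (plain Python, not LoFreq code) that looks at the DP4 value from the question. It assumes the usual DP4 order of ref-forward, ref-reverse, alt-forward, alt-reverse; the function name `strand_fractions` is made up for this illustration.

```python
def strand_fractions(dp4):
    """Return (ref_forward_fraction, alt_forward_fraction) for a DP4 string.

    Assumed DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse.
    A fraction is None when that allele has no supporting reads.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    ref_frac = ref_fwd / ref_total if ref_total else None
    alt_frac = alt_fwd / alt_total if alt_total else None
    return ref_frac, alt_frac


# DP4 value quoted in the question above
print(strand_fractions("0,7,0,3000"))
```

Both fractions come out as 0.0: all reference reads and all alternate reads sit on the reverse strand. The two alleles have the same strand distribution, so a test that compares them has nothing to flag, even though the coverage itself is entirely one-sided.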
Hello,
We have analyzed some viral genomes where the strand bias has been estimated as zero. In these results, we have noticed that whenever the alternate base count is zero on either the forward or the reverse strand, SB=0. Is it the case that whenever one of the alternate strand counts is zero, SB will be 0 (implying no bias), even though the DP4 data suggest there actually is bias? Should such results perhaps not be considered at all? And then there is the last example, where the consensus (reference) reverse count is 2 while the alternate reverse count is 3. Is this the reason for the high SB value?
Thanks!
DP=77;AF=0.064935;SB=0;DP4=72,0,5,0
DP=77;AF=0.038961;SB=0;DP4=74,0,3,0
DP=80;AF=0.037500;SB=0;DP4=76,1,3,0
DP=79;AF=0.037975;SB=0;DP4=72,3,3,0
DP=65;AF=0.046154;SB=0;DP4=58,4,3,0
DP=17;AF=0.117647;SB=0;DP4=14,1,2,0
DP=19;AF=0.105263;SB=0;DP4=16,1,2,0
DP=14;AF=0.142857;SB=0;DP4=9,3,2,0
DP=13;AF=0.153846;SB=0;DP4=11,0,2,0
DP=15;AF=0.133333;SB=0;DP4=2,11,0,2
DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
DP=13;AF=0.153846;SB=0;DP4=10,1,2,0
DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
DP=19;AF=0.105263;SB=0;DP4=14,3,2,0
DP=32;AF=0.093750;SB=0;DP4=14,15,2,1
DP=21;AF=0.142857;SB=5;DP4=17,1,2,1
DP=15;AF=0.133333;SB=16;DP4=10,2,0,3
Hello,
Strand bias is defined as in samtools: reference and alternate base
counts on the forward and reverse strands are used as input for
Fisher's exact test. This tries to quantify how far the reference and
alternate counts on the forward and reverse strands differ, i.e. you'll
get high SB values (low p-values) if you have lots of reference bases
on one strand and lots of alternate bases on the other. It does not,
however, test whether both reference and alternate bases are mainly on
the same strand. I hope this explanation makes sense.
Best,
Andreas
On 2 August 2017 at 01:54, Kiril Dimitrov dimitrovkiril@users.sf.net
wrote:
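The description above can be checked against the DP4 values from the question with a small sketch. This is not LoFreq's implementation: it is a plain two-sided Fisher's exact test written with the standard library, followed by a phred conversion. Truncating the phred value to an integer is an assumption, and the function names are invented for this example.

```python
from math import comb, log10


def fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for a 2x2 strand table."""
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    # Possible values of the ref-forward cell when all margins are fixed.
    lowest = max(0, fwd_total - alt_total)
    highest = min(ref_total, fwd_total)

    def weight(k):
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return comb(ref_total, k) * comb(alt_total, fwd_total - k)

    observed = weight(ref_fwd)
    # Sum over all tables that are no more likely than the observed one.
    as_extreme = sum(
        w for w in map(weight, range(lowest, highest + 1)) if w <= observed
    )
    return as_extreme / comb(ref_total + alt_total, fwd_total)


def phred_sb(dp4):
    """Phred-scale the p-value; truncation to int is an assumption."""
    p = fisher_two_sided(*(int(x) for x in dp4.split(",")))
    return int(-10 * log10(p)) if p > 0 else None


for dp4 in ("72,0,5,0", "14,15,2,1", "17,1,2,1", "10,2,0,3"):
    print(dp4, phred_sb(dp4))
```

Under these assumptions the four rows give 0, 0, 5 and 16, the same SB values listed in the question. For DP4=72,0,5,0 only one table is possible with those margins, so the p-value is 1 and SB is 0; for DP4=10,2,0,3 the reference reads are mostly forward while all alternate reads are reverse, which is what pushes SB up.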
I have another question about the SB score values from the .vcf output. It is my understanding that these values are Phred quality scores, which are usually in the range of 0 - 50. However, I am getting many with values of 500 - 1900. Is this expected? And if SB=0 means no strand bias, does this mean that these regions are extremely strand biased?
Also, in this thread you state:
What is the meaning of 2147483647 in the SB value in VCF output? I have a lot of these as well.
Last edit: Steve 2018-05-03
Hi Steve,
the strand-bias p-value is turned into a phred quality, whose upper bound
depends on the precision of the float. In practice it can get much higher
than 1900. The fact that you see phred values <60 in other programs is
simply because they are mostly arbitrarily capped there.
Andreas
On 4 May 2018 at 03:50, Steve stevekm@users.sourceforge.net wrote:
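As a rough illustration of that range, the sketch below applies the standard phred formula, -10 * log10(p), to a few p-values. It assumes double-precision floats; which float type and rounding LoFreq uses internally is not stated in the thread.

```python
import sys
from math import log10


def to_phred(p):
    """Standard phred scaling of a probability (p must be > 0)."""
    return -10 * log10(p)


print(to_phred(0.05))                # a modest p-value, phred around 13
print(to_phred(1e-190))              # about 1900, like the values in the question
print(to_phred(sys.float_info.min))  # smallest normal double, phred above 3000
print(2**31 - 1)                     # 2147483647, largest signed 32-bit integer
```

The last line is only an observation: the value 2147483647 asked about above equals the largest signed 32-bit integer. Whether LoFreq writes it as a sentinel when the p-value underflows to zero is an assumption, not something confirmed in this thread.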
Hi all,
I am analyzing amplicon deep-seq data, going for rare variants. While I've always run LoFreqFilter with default strand-bias filtering (multiple testing correction, FDR), I would now like to filter on a specific SB threshold. Can you advise on a reasonable value? Should I consider only variants with SB=0 as true variants?
For example, how do you see this: DP=12219;AF=0.009657;SB=5;DP4=822,11273,11,107
thanks
Luca
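Since SB is described earlier in the thread as a phred-scaled p-value, an SB value can be translated back into a p-value, which may make a fixed threshold easier to reason about. The sketch below does this for the INFO string in the post above. The helper names and the cutoff `ALPHA` are hypothetical placeholders for illustration, not a recommended setting.

```python
from math import log10


def parse_info(info):
    """Split a VCF INFO string of KEY=VALUE pairs into a dict of strings."""
    return dict(item.split("=", 1) for item in info.split(";"))


def sb_to_pvalue(sb):
    """Invert the phred scaling: p = 10 ** (-SB / 10)."""
    return 10 ** (-sb / 10)


info = parse_info("DP=12219;AF=0.009657;SB=5;DP4=822,11273,11,107")
sb = int(info["SB"])
print(sb, sb_to_pvalue(sb))  # SB=5 corresponds to a p-value of roughly 0.32

ALPHA = 0.001                     # hypothetical p-value cutoff, chosen only for illustration
sb_cutoff = -10 * log10(ALPHA)    # the same cutoff expressed on the SB scale (about 30)
print(sb <= sb_cutoff)            # True: this site would not be flagged at that cutoff
```

Read this way, SB=5 is a p-value of about 0.3, so the strand distributions of reference and alternate reads at this site are not significantly different by this test.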