The FAQ page for LoFreq says:
"You usually don't. Predicted variants are already filtered using default
parameters (which include coverage, strand-bias, snv-quality etc)."
However, I do not see any details about what these default filtering
parameters are. Is there a description anywhere? When I try to run lofreq
filter --verbose, the only output I get is:
Setting default SB filtering method to FDR
Setting default minimum coverage to 10
What other criteria are being used to filter variants?
Hi Steve,
Not sure why the actual quality filtering is not mentioned there. Let me
look into this. Anyway, the main filtering step works on the variant
qualities (which are converted p-values) and is by default based on
Bonferroni correction and a significance threshold of 0.01.
Best,
Andreas
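To make the quality filter described above concrete, here is a minimal sketch of a Bonferroni-corrected test on a Phred-scaled variant quality. It assumes the quality is -10*log10(p); the function names and the num_tests parameter are hypothetical, and the thread does not say how LoFreq chooses the number of tests, so treat this as an illustration of the idea rather than LoFreq's implementation.

```python
import math


def phred_to_pvalue(quality: float) -> float:
    """Convert a Phred-scaled quality back to a p-value (assumes Q = -10*log10(p))."""
    return 10.0 ** (-quality / 10.0)


def bonferroni_min_quality(alpha: float = 0.01, num_tests: int = 1) -> float:
    """Phred quality corresponding to the Bonferroni-corrected threshold alpha / num_tests.

    num_tests is a hypothetical parameter: the thread does not state how the
    number of tests is determined.
    """
    if not 0.0 < alpha <= 1.0 or num_tests < 1:
        raise ValueError("alpha must be in (0, 1] and num_tests >= 1")
    return -10.0 * math.log10(alpha / num_tests)


def passes_quality_filter(quality: float, alpha: float = 0.01, num_tests: int = 1) -> bool:
    """Keep a variant only if its p-value is below the corrected threshold."""
    return phred_to_pvalue(quality) < alpha / num_tests


if __name__ == "__main__":
    # Threshold for alpha = 0.01 spread over 1000 hypothetical tests.
    print(bonferroni_min_quality(0.01, 1000))
    print(passes_quality_filter(60, alpha=0.01, num_tests=1000))
```

The point of the sketch is that the required quality rises with the number of tests, so the same variant can pass in a small target region and fail in a large one.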
On 4 May 2018 at 07:12, Steve stevekm@users.sourceforge.net wrote:
--
Andreas Wilm
andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
Thanks Andreas.
I was looking in the source code and saw here:
https://github.com/CSB5/lofreq/blob/master/src/lofreq/lofreq_filter.c#L1093
Does this mean that the default Strand Bias filter is at a p-value of 0.001
(cfg.sb_filter.alpha = 0.001)?
As per my other post, I am getting a lot of variants with SB values of >500,
so does this mean that strand bias is not being filtered by default?
Hi Steve,
yes, these variants are not filtered, even though if you just look at the
pvalue/quality, they should be. The reason is that strand-bias is a messy
beast and we use some hacks:
No one really knows why it happens (AFAIK). In viral amplicon data (for
which LoFreq was originally designed) we often saw cases where, simply due
to the ultra high coverage, you'd get very high SB values (i.e. very
significant p-values) even though nothing seems wrong with these variants
if you were to evaluate them by eye (plenty of coverage for ref and alt and
forward and reverse strand; but skewed obviously, otherwise you wouldn't
get a high SB value). So we introduced a compound filter. See the section
on strand bias when you run lofreq filter --help:
"Note, variants are only filtered if their SB pvalue is below the threshold
AND 85% of variant bases are on one strand (toggled with
--sb-no-compound)."
This is under-documented and not ideal, but we need to define defaults. I
guess a newer LoFreq version would use presets to define a set of calling
and filtering parameters based on input type. You might want to experiment
with running lofreq call --no-default-filter and then trying different
parameters in a subsequent lofreq filter run.
Andreas
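The compound strand-bias rule quoted above can be sketched as follows. This is a simplified illustration, not LoFreq's code: it assumes SB is a Phred-scaled p-value, that the alt forward/reverse counts come from something like the DP4 field, that "85%" means at least 85%, and it compares the raw p-value against alpha without the FDR correction that the verbose output reports as the default. The function and parameter names are hypothetical.

```python
def phred_to_pvalue(phred: float) -> float:
    """Convert a Phred-scaled value to a p-value (assumes phred = -10*log10(p))."""
    return 10.0 ** (-phred / 10.0)


def compound_sb_filter(
    sb_phred: float,
    alt_fwd: int,
    alt_rev: int,
    alpha: float = 0.001,          # value seen in lofreq_filter.c per the thread
    strand_fraction: float = 0.85,  # "85% of variant bases are on one strand"
) -> bool:
    """Return True if the variant would be removed by the compound rule.

    Both conditions must hold: the strand-bias p-value is below alpha AND
    at least strand_fraction of the variant bases sit on a single strand.
    No multiple-testing correction is applied here (a simplification).
    """
    alt_total = alt_fwd + alt_rev
    if alt_total == 0:
        return False  # no variant bases, nothing to judge
    significant = phred_to_pvalue(sb_phred) < alpha
    one_sided = max(alt_fwd, alt_rev) / alt_total >= strand_fraction
    return significant and one_sided


if __name__ == "__main__":
    # High SB but alt reads on both strands (60/40 split): kept.
    print(compound_sb_filter(500, alt_fwd=600, alt_rev=400))
    # High SB and 95% of alt reads on one strand: filtered.
    print(compound_sb_filter(500, alt_fwd=950, alt_rev=50))
```

Under these assumptions, a variant with SB > 500 survives as long as its alt reads are spread over both strands, which matches the behaviour described in the reply.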
On 4 May 2018 at 23:03, Steve stevekm@users.sourceforge.net wrote: