Hi,
I want to filter out bases with quality below 20, so I use a command like this: "lofreq call -f ref.fasta -q 20 -Q 20 -m 20 -C 10 -d 400 in.bam -o out.vcf".
But it does not seem to filter such bases. For example, I noticed a position like this: "gi|57116681|ref|NC_000962.2| 104830 C 23 AAAAAAAAAAAAAAaAAAAAAA^IA 1439;935534,=8=4;53235!". This position has about 12 bases with quality below 20, but lofreq didn't filter them. I extracted this position's info with the command "samtools mpileup -q 20 -Q 20 -s -f ref.fasta in.bam".
I am confused by this problem.
Thanks!
gmy
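The count of low-quality bases quoted above can be checked by decoding the quality column of the pileup line. The following is a minimal Python sketch; it assumes the last column shown is the base-quality string in Phred+33 encoding (the usual samtools default), and the function name is made up for illustration.

```python
def phred33_qualities(qual_string):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


# Quality column copied from the pileup line for position 104830
# (assumed to be Phred+33 base qualities).
quals = phred33_qualities("1439;935534,=8=4;53235!")

# Bases that a strict "Q >= 20" cutoff would reject.
below_cutoff = [q for q in quals if q < 20]

print(len(quals))         # 23 bases, matching the depth column
print(len(below_cutoff))  # 12 bases below Q20
```

Under that assumption the figure of 12 bases below Q20 holds for this position.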
I would strongly encourage you to stick to default parameters in
LoFreq, unless of course you have good reason to change them.
Newer versions of LoFreq in particular got rid of most filtering and
tolerate even noisy data, so excessive filtering is not only
unnecessary, but might even introduce biases.
Now to your actual problem: Your command line syntax looks correct
(even though I would question those settings :) and if you are sure
those bases are taken into account when computing SNVs this would be a
bug. But how do you actually know LoFreq didn't filter those bases?
You are showing an example pileup output, but just that is not enough
to tell whether those bases got filtered. Could you provide the
corresponding vcf output?
Hi, Andreas
Sorry for the late reply.
The corresponding output of lofreq is:
gi|57116681|ref|NC_000962.2| 104830 . C A . PASS DP=23;AF=1.000000;SB=0;DP4=0,0,22,1;CONSVAR
with the command I mentioned last time.
So what's the reason?
Thanks!
gmy
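To read the record above field by field, the INFO column can be split into key/value pairs. This is a small Python sketch, not part of LoFreq; the helper name is hypothetical, and the DP4 order (ref-forward, ref-reverse, alt-forward, alt-reverse) is the usual VCF convention and is assumed here.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


info = parse_info("DP=23;AF=1.000000;SB=0;DP4=0,0,22,1;CONSVAR")

# Assumed DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse.
ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))

print(int(info["DP"]))                        # 23
print(ref_fwd + ref_rev + alt_fwd + alt_rev)  # 23
print(info["CONSVAR"])                        # True
```

The DP4 counts add up to the same 23 reported as DP and as the pileup depth, so every base at this position is still counted in those fields.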
That is indeed a bit strange. Which exact LoFreq version are you using?
Would you mind sharing the reads spanning that specific region with
me via PM, so that I can reproduce the problem and check what's going
on? You can generate a BAM file containing only that region
with:
samtools view -b YOUR.BAM 'gi|57116681|ref|NC_000962.2|:104830-104830' > 104830.bam
Please use andreas.wilm@gmail.com or wilma@gis.a-star.edu.sg
Sorry, I know what's happening: the filters only affect the
actual SNV calling, the computed p-value, and the numerator of the
AF computation. All other fields stay the same, i.e. you will still
get the same coverage, DP4 etc. This was to prevent introducing
filtering biases. For example, you might remove the majority of
reference bases because they are just below the cutoff, which would
artificially increase your variant AF. (It was also kind of necessary
in the past to handle consensus variants properly, but that has
since been addressed differently.)
To sum up: I think the 'base filter' as it is implemented right now is
not intuitive for the user, because it is not filtering per se; you
could call it 'masking during variant calls'. I have logged this as a
GitHub issue (https://github.com/CSB5/lofreq/issues/14).
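The bias described above can be shown with a toy calculation. The Python sketch below is only one reading of the explanation, not LoFreq's actual code; the function names and the example column are invented for illustration.

```python
def allele_freq_hard_filter(bases, min_bq):
    """AF when low-quality bases are dropped from numerator AND denominator."""
    kept = [(is_alt, q) for is_alt, q in bases if q >= min_bq]
    return sum(is_alt for is_alt, _ in kept) / len(kept)


def allele_freq_masked(bases, min_bq):
    """AF when low-quality bases are masked only in the numerator;
    the denominator stays the full, unfiltered depth."""
    passing_alt = sum(1 for is_alt, q in bases if is_alt and q >= min_bq)
    return passing_alt / len(bases)


# Toy column: 10 reference bases (8 of them at Q19, just under the cutoff)
# and 2 variant bases at Q30. Each entry is (is_alt, base_quality).
column = [(False, 19)] * 8 + [(False, 30)] * 2 + [(True, 30)] * 2

print(allele_freq_hard_filter(column, 20))       # 0.5
print(round(allele_freq_masked(column, 20), 3))  # 0.167
```

With hard filtering, the eight reference bases at Q19 disappear from the denominator and the variant appears at 50%; with masking, the depth stays at 12 and the variant stays at about 17%.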