I have been using lofreq-star to call variants on very high quality reads (base quality > 60) generated with the program SeqPrep. I want to filter my SNPs based on base quality, as we've discussed before. I have used the lofreq call --min-bq 60 --min-altbq 60 options successfully on standard data, and they work great. However, when I run these options on SeqPrep data, LoFreq sometimes (not always) behaves strangely and outputs far fewer SNPs when I use a low quality threshold (Q20) than when I use a high quality threshold (Q70). It seems like LoFreq may think I am using phred64 when I am in fact using phred33. Is there a way to specify that I am using phred33 when calling variants with these options?
Thanks a lot!
-Jessica
Hi Jessica,
If you keep low quality bases, LoFreq will make fewer predictions:
lower quality means a higher error rate and therefore a higher chance
that an observed variation is an error and not a true SNV.
So the behavior you've observed is actually expected (that's one
reason why changing those thresholds isn't really recommended; LoFreq
should in theory be able to deal with things in its default mode).
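To make that reasoning concrete, here is a rough sketch in Python. It is not LoFreq's actual code: the function names and the example column (200 bases at Q60, 800 at Q20, 5 variant bases) are made up for illustration, and it simplifies by treating every sequencing error as producing the variant base. It converts Phred scores to error probabilities and asks how likely it is to see at least 5 errors by chance.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def prob_at_least_k_errors(error_probs, k):
    """P(at least k of the bases are sequencing errors), where each base
    has its own error probability (a Poisson-binomial tail)."""
    # dist[j] = probability that exactly j of the bases seen so far are errors
    dist = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)   # this base is correct
            nxt[j + 1] += mass * p     # this base is an error
        dist = nxt
    return sum(dist[k:])


# Hypothetical column in which 5 variant bases were observed.
high_only = [phred_to_error_prob(60)] * 200             # only Q60 bases kept
with_low = high_only + [phred_to_error_prob(20)] * 800  # Q20 bases kept as well

p_high = prob_at_least_k_errors(high_only, 5)
p_mixed = prob_at_least_k_errors(with_low, 5)
print(f"Q60 bases only:  P(>=5 errors) = {p_high:.3g}")
print(f"Q60 + Q20 bases: P(>=5 errors) = {p_mixed:.3g}")
```

With only the Q60 bases, five errors by chance are extremely unlikely, so the site looks like a real variant. Once the Q20 bases are kept, several errors are expected anyway and the same five variant bases no longer stand out.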
It's highly unlikely that there's a quality encoding problem, so only
for the sake of completeness: yes, there is a way to tell LoFreq which
encoding to assume for the scores in your BAM file. Phred33 is what it
assumes by default; option -I or --illumina-1.3 makes it assume
Illumina 1.3 style (Phred64) scores instead (see lofreq call --help).
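If you want to rule out an encoding mix-up yourself, a quick check of the quality strings in the FASTQ files is one option. The sketch below is only a heuristic with hypothetical helper names (decode_quals, guess_offset), not anything shipped with LoFreq or SeqPrep. It relies on the fact that characters below ASCII 59 cannot occur in Phred64 or Solexa data, and it deliberately gives no answer when every character is high, because very high quality Phred33 reads look the same as Phred64 reads.

```python
def decode_quals(qual_string, offset=33):
    """Decode an ASCII quality string into Phred scores for a given offset."""
    return [ord(c) - offset for c in qual_string]


def guess_offset(qual_strings):
    """Return 33 if the data can only be Phred33, else None (ambiguous)."""
    lowest = min(ord(c) for s in qual_strings for c in s)
    if lowest < 59:
        # Phred64 starts at '@' (64) and Solexa at ';' (59), so anything
        # lower can only come from Phred33 data.
        return 33
    # Either Phred64 or uniformly high quality Phred33: cannot tell apart.
    return None


print(decode_quals("II5"))            # decoded as Phred33
print(decode_quals("hh", offset=64))  # decoded as Phred64
print(decode_quals("hh", offset=33))  # same characters, wrong offset: shifted by 31
print(guess_offset(["II5", "IIII"]))
print(guess_offset(["hhhh"]))
```

Decoding the same characters with the wrong offset shifts every score by 31, which is the kind of shift that would make a Q20 or Q70 threshold behave unexpectedly.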
Andreas
On 13 June 2014 06:54, jessica preston jpreston555@users.sf.net wrote:
--
Andreas Wilm
andreas.wilm@gmail.com | mail@andreas-wilm.com | 0x7C68FBCC
Jessica, the latest version of LoFreq actually removes the need for
any thresholding (and for setting default qualities). It's not 100%
tested, but I'm happy to share it with you before we make it publicly
available. Feel free to email me (wilma@gis.a-star.edu.sg) if you're
interested.
Andreas
On 13 June 2014 11:50, Andreas Wilm onde@users.sf.net wrote: