|
From: Federico A. <fa...@sa...> - 2016-06-17 07:04:58
|
Thank you Colin, Yes, that’s the reason for the discrepancy! It seems realignment and BQ recalibration may not be a good choice in this case. The upstream indel that give rise to the discrepancies is supported by several reads and corresponds to a known SNP (rs11481705) It is said that realignment for the computation of base alignment quality (BAQ) reduces the number of false positives. If I have used a very stringent alignment approach and also realigned with GATK, is this feature of samtools still recommended? Regards, Federico > On 17 Jun 2016, at 01:55, Colin Hercus <co...@no...> wrote: > > Hi Federico, > > I expect this is related to BAQ calculations > > 2. Base alignment quality (BAQ) computation is turned on by default. > > BAQ is a phred-like score representing the probability that a read base is mis-aligned; it lowers the base quality score of mismatches that are near indels. This is to help rule out false positive SNP calls due to alignment artifacts near small indels. There have been recent suggestions, however, that BAQ may be too strict and cause real SNPs to be missed. Several users of the VarScan variant caller <http://varscan.sourceforge.net/>have reported that its read counts disagree with what is seen in IGV, or somatic mutations were missed <http://sourceforge.net/projects/varscan/forums/forum/1073559/topic/5061782> when mpileup was used instead of pileup. These issues are almost always due to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen in IGV, but it’s below VarScan’s default base quality threshold. You can disable BAQ with the -B parameter, or perform a more sensitive BAQ calculation with -E. I’ve heard that the latter option will be turned on by default in the next version of SAMtools. > > Try option --no-BAQ on the pileup with a reference. > > Colin > > On 17 June 2016 at 06:22, Federico Abascal <fa...@sa... <mailto:fa...@sa...>> wrote: > Hi, > > I’ve found something very strange with samtools mpileup, looks like a bug. > > If I provide the reference fasta I get completely different results than if I don’t. > > Look at this example: > ./samtools-1.3.1/samtools mpileup -r 20:36052981 -f human_g1k_v37.fasta file.bam > 20 36052981 A 2 ,G >1 > 20 36052982 G 2 ,. >1 > 20 36052983 G 2 ,$. <3 > 20 36052984 A 9 .,..,,.,. <B7C77779 > 20 36052985 G 9 .,..,,.,. <B7C77779 > 20 36052986 A 9 .,..$,,.,. <F7=77779 > 20 36052987 A 8 GgGggGgG </777779 > 20 36052988 G 8 .,.,,.,. <F777779 > 20 36052989 G 8 .,.,,.,. <F777779 > 20 36052990 A 8 .,.,,.,. FFKKKKKK > 20 36052991 G 8 .,.,,.,. FFKKKKKK > 20 36052992 C 8 .,.,,.,. FFKKKKKK > 20 36052993 T 8 .,.,,.,. FFKKKKKK > 20 36052994 G 8 .,.,,.,. FFKKKKKK > 20 36052995 A 8 .,.,,.,. BFKKKKKK > 20 36052996 G 8 .,.$,,.,. BF>KKKKK > > Now look what we get if we do not provide a reference: > ./samtools-1.3.1/samtools mpileup -r 20:36052981 file.bam > 20 36052981 N 11 aaGgGGggGgG BBFFKKKKKKK > 20 36052982 N 11 g$gGgGGggGgG BBFFKKKKKKK > 20 36052983 N 10 g$GgGGggGgG <FFKKKKKKK > 20 36052984 N 9 AaAAaaAaA FBKKKKKKK > 20 36052985 N 9 GgGGggGgG FBKKKKKKK > 20 36052986 N 9 AaAA$aaAaA FFKKKKKKK > 20 36052987 N 8 GgGggGgG F/KKKKKK > 20 36052988 N 8 GgGggGgG FFKKKKKK > 20 36052989 N 8 GgGggGgG FFKKKKKK > 20 36052990 N 8 AaAaaAaA FFKKKKKK > 20 36052991 N 8 GgGggGgG FFKKKKKK > 20 36052992 N 8 CcCccCcC FFKKKKKK > 20 36052993 N 8 TtTttTtT FFKKKKKK > 20 36052994 N 8 GgGggGgG FFKKKKKK > 20 36052995 N 8 AaAaaAaA BFKKKKKK > 20 36052996 N 8 GgG$ggGgG BFKKKKKK > > For site 36052981, if I provide the reference only two bases are reported, whereas without the reference 11 bases are found. > Note that quality bases also change. > > I checked with IGV that the second case is the correct one (see snapshot). Interestingly, all but two of the reads that overlap 36052981 have an indel three bases before, so I guess the bug may be related to this. > > I am attaching a very small bam of the region. If it helps diagnosing the problem I can send more information for other “strange” cases. > > Regards, > Federico > > > > > > > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic > patterns at an interface-level. Reveals which users, apps, and protocols are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity planning > reports. http://sdm.link/zohomanageengine <http://sdm.link/zohomanageengine> > _______________________________________________ > Samtools-help mailing list > Sam...@li... <mailto:Sam...@li...> > https://lists.sourceforge.net/lists/listinfo/samtools-help <https://lists.sourceforge.net/lists/listinfo/samtools-help> > > |