From: Colin H. <co...@no...> - 2016-07-27 01:37:00
|
Hi Annie, As a way to avoid this problem we usually use a fasta file with no IUPAC ambiguous codes (other than N) as reference for variant calling. We'll take our standard reference without IUPAC codes, add IUPAC codes based on dbSNP and then map reads to this with novoalign. In the variant calling stage we use the original reference without the IUPAC codes. Kind Regards, Colin Colin Hercus, *Novocraft Technologies Sdn Bhd* <http://www.novocraft.com/> C-23A-05, 3 Two Square, Jalan 19/1, Section 19, 46300 Petaling Jaya, Selangor Darul Ehsan, Malaysia Tel: +6016 2482 668 Fax: +603 7960 0540 www.novocraft.com On 27 July 2016 at 00:04, John Marshall <jm...@sa...> wrote: > On 22 Jul 2016, at 21:49, Annie Cowell <ann...@gm...> wrote: > > samtools mpileup -C50 -Bug -t AD -Q10 -f my.fasta sample.bam | bcftools > call -mv -Ov > my.vcf > > > > However, I am getting unusual bases for a couple of indels in the > reference field, such as W (as below) and R. > > > > AAKM01000065 8883 . GGW G 228 . > INDEL;IDV=40;IMF=0.888889;DP=45;VDB=0.880501;SGB=-0.692976;MQSB=0.114162;MQ0F=0;AC=2;AN=2;DP4=0,0,8,18;MQ=49 > GT:PL:AD 1/1:255,78,0:0,26 > > As Thomas guessed, these are indeed ambiguity codes coming from the > reference genome. From http://www.ncbi.nlm.nih.gov/nuccore/AAKM01000065 > we can see the GGW bases at location 8883 onwards: > > 8881 aaggwtataa agaacgcata tatgcctttt gtcccctgcc cgcgtctgga > tactcgctac > > The issue of IUPAC ambiguity codes in VCF REF fields came up a while ago > [1], and the consensus expressed in the VCF v4.3 spec is that they > shouldn't appear -- instead being simplified to a non-ambiguous base, in > this case GGA. So this is an mpileup and/or bcftools bug, and apparently > it should be suppressing ambiguity codes by default. > > John > > [1] https://github.com/samtools/hts-specs/issues/54 > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > patterns at an interface-level. Reveals which users, apps, and protocols > are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning > reports.http://sdm.link/zohodev2dev > _______________________________________________ > Samtools-help mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-help > |