| 
     
      
      
      From: Jody P. <jod...@gm...> - 2018-05-02 23:54:49
      
     
   | 
Hi James, Thanks for your help. I have tried your suggestions however it still appears to give the same results. I am in the process of uploading the data to the ENA but while I wait I have put together another example using very similar data. The example finds a very large variant at position 3329088 however by looking at the raw data I find very little support for it. I have created a gist so you can replicate it if you would like. https://gist.github.com/jodyphelan/e721112d91dcaaa67cd79217ef2af24f Thanks, Jody On Tue, May 1, 2018 at 2:02 PM, James Bonfield <jk...@sa...> wrote: > On Tue, May 01, 2018 at 10:00:49AM +0100, Jody Phelan wrote: > > I'm runinig into some trouble when truing to use the 'samtools mpileup | > > bcftools call' combination. > > It seems to incorrectly be calling very long indels where it looks like > > there is not support. For example: > > > > 'samtools mpileup -f ref.fa sample.bam -r Chromosome:198940-198940' > > produces: > > > > Chromosome 198940 C 37 .,.-1A.-1A...-1A.-1A.-1A.,-1a. > > -1A.,-1a,-1a.,.-1A,,.-1A,,+1a,+1a,.-1A,,-1a.,-1a,-1a,,-1a,-1a.-1A,-1a. > > 06?5/<:8>/4<?3691465<3354.08:23:11@3C > > > > This looks fine... however when I try to call the genotype with bcftools > I > > get a very long indel. > > 'samtools mpileup -ugf ref.fa sample.bam -r Chromosome:198940-198940 -t > AD > > | bcftools call -mv' produces: > > > > Chromosome 198940 . > > CAAAAGACGTCATCGACGTACGGTCGGTGACCACCGAGATCAACACGTTGTTCCAGACGC > TCACCTCGATCGCCGA > > C 225 . > > INDEL;IDV=32;IMF=0.5;DP=62;VDB=0.145385;SGB=-0.662043; > MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,7,2;MQ=60 > > GT:PL:AD 1/1:255,27,0:0,9 > > Firstly we'd recommend using bcftools mpileup now instead of samtools > mpileup. The samtools vcf output is deprecated and liable to vanish > at some point. Do you still get this problem with bcftools? > > Secondly, for indel calling you'll probably want to adjust IDV and IMF > thresholds. These can be done as hard filters on calling (I think > it's -m and -F flags to bcftools mpileup) or later in post-processing > of the VCF file via a filter. Eg bcftools view -e "IDV < 3 || IMF < 0.03". > (Adding e.g. DP > 60 to filter overdepth - say 1.8x your mean depth - > can also be a good way of removing false variants caused by > copy-number variations or collapsed repeats.) > > Finally, don't give bcftools call such a small region as I'm not sure > it's tuned well for that. Do you get the same problem with say: > > bcftools mpileup -f ref.fa sample.bam -r Chromosome:198800-199200 \| > bcftools call -vm -r Chromosome:198940-198940 - > > Ie the input to bcftools call is a larger region, but the caller > itself is then restricted back down again? > > Sadly though I don't know what is causing this apparent 1bp indel to > be called as a very long one. We'd need to see more data to debug > this. > > James > > -- > James Bonfield (jk...@sa...) > The Sanger Institute, Hinxton, Cambs, CB10 1SA > > > -- > The Wellcome Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > -- Jody Phelan 64 Queen Alexandra Mansions,Hastings St London, WC1H 9DR, UK Tel: +447478609078 Email: jod...@gm...  |