From: Heng Li <lh...@sa...> - 2010-11-09 19:13:11
Thank you very much for this. I believe I have fixed the bug. Please try the latest revision, r804, and let me know if it still segfaults.

Best,

Heng

On Nov 9, 2010, at 11:35 AM, Josep Ignasi Lucas Lledo wrote:

> Heng,
>
> Just for your information, I'm getting an error when I use more than one
> sample. Specifically, if the SM tags of the RG headers are not identical
> across input bam files, I get something like this:
>
>   samtools mpileup -gf ref.fa z1.bam z2.bam > z3.bam
>
>   [mpileup] 2 samples in 2 input files
>   <mpileup> Set max per-sample depth to 4000
>   *** glibc detected *** free(): invalid next size (fast): 0x00000000005c0d50 ***
>   Aborted
>
> However, if the SM tags are identical, it works, because it considers
> all input files as a unique sample, even if their ID tags are different
> in the RG header. That was unexpected, but I guess it makes sense.
>
> BTW, the -d option does not affect the second line of the standard
> error. I'm using the samtools downloaded and compiled yesterday, from
> the link that you sent. The 0.1.9 version works well.
>
> I'm interested in using the best mpileup available for multiple
> samples. I hope you can fix this, or let me know if I'm doing
> anything wrong.
>
> thanks
>
> On Mon, 8 Nov 2010 at 17:06 -0500, Heng Li wrote:
>> Hi all,
>>
>> I have refined the pileup's indel caller and reimplemented it in mpileup. The method is essentially the same, but details are handled more carefully. Minor improvements include:
>>
>> 1. Considering mapping quality.
>> 2. Modeling homopolymer errors.
>> 3. More careful local realignment.
>> 4. Considering error dependency.
>> 5. BCF/VCF output.
>> 6. Multi-sample indel calling.
>> 7. Indicating the uncertainty of indel positions in VCF output.
>>
>> On simulated data, the mpileup caller has better specificity than the old pileup indel caller. However, although the mpileup indel caller considers most of the important factors in indel calling, it models these factors in a heuristic way. Sophisticated indel callers such as Dindel probably do better. In addition, indel calling on real data is much harder than on simulated data. More evaluations are needed to confirm the practical performance of the new caller. That is why I am sending this email to ask for your help.
>>
>> With the reimplementation of the indel caller, mpileup becomes more powerful and more general than pileup, and pileup will be deprecated in the long run. I will keep the pileup command for backward compatibility, but users are encouraged to move to the samtools mpileup/bcftools pipeline.
>>
>> Heng
>>
>>
>> PS:
>>
>> To try the new indel caller:
>>
>>   svn co https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/samtools
>>   cd samtools; make
>>   cd examples
>>   ../samtools faidx ex1.fa
>>   ../samtools mpileup -gf ex1.fa ex1.bam | ../bcftools/bcftools view -vc -
>>
>> The output is:
>>
>>   ##fileformat=VCFv4.0
>>   #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ex1.bam
>>   seq1 288 . A ACATAG 99 . INDEL;AF1=0.500;AFE=0.500;DP4=7,4,9,4;MQ=59;PV4=1,0.31,1,0.063 PL 255,0,255
>>   seq1 548 . C A 99 . AF1=0.500;AFE=0.500;DP4=11,8,4,13;MQ=60;PV4=0.049,0.085,1,1 PL 163,0,192
>>   seq1 1294 . A G 99 . AF1=0.500;AFE=0.500;DP4=13,6,7,11;MQ=58;PV4=0.1,0.13,1,0.28 PL 175,0,178
>>   seq2 156 . AA AAGA 64 . INDEL;AF1=1.000;AFE=0.996;DP4=0,0,8,0;MQ=60 PL 97,24,0
>>   seq2 505 . A G 99 . AF1=0.500;AFE=0.500;DP4=13,11,9,14;MQ=60;PV4=0.39,0.29,0.16,1 PL 188,0,198
>>   seq2 783 . A AAAAT 99 . INDEL;AF1=0.500;AFE=0.500;DP4=2,7,18,17;MQ=57;PV4=0.15,1,0.093,1 PL 255,0,77
>>   seq2 784 . CAATT CAATTAATT 99 . INDEL;AF1=1.000;AFE=1.000;DP4=0,0,20,22;MQ=57 PL 255,126,0
>>   seq2 1344 . A C 99 . AF1=0.500;AFE=0.500;DP4=5,9,3,9;MQ=59;PV4=0.68,0.09,0.06,1 PL 136,0,172
>>
>> Notably, seq2:783 and seq2:784 correspond to the same indel. One still needs to apply a filter (e.g. by calling "../bcftools/vcfutils.pl varFilter") to rule out the weaker call at seq2:783. Also note that the 4bp insertion at seq2:784 can actually be placed after any position from 784 to 788. The old pileup indel caller does not output this useful information.

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
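Editor's sketch: the remark that seq2:783 and seq2:784 describe the same indel can be illustrated with a few lines of Python that flag INDEL records whose reference spans touch or nearly touch. This is only an illustration under stated assumptions, not what `vcfutils.pl varFilter` does. The names `Call`, `parse_vcf_line`, `nearby_indel_pairs`, and the `window` parameter are hypothetical, and the records are copied from the output quoted above.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    is_indel: bool


def parse_vcf_line(line: str) -> Call:
    """Parse the first eight whitespace-separated columns of a VCF data line."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.split()[:8]
    # INDEL appears as a bare flag in the INFO column of the quoted output.
    return Call(chrom, int(pos), ref, alt, "INDEL" in info.split(";"))


def nearby_indel_pairs(calls: List[Call], window: int = 1) -> List[Tuple[Call, Call]]:
    """Return consecutive indel calls on one sequence whose spans lie within `window` bp.

    Assumption: only neighbours in sorted order are compared, which is enough
    for a small illustration but not a general overlap search.
    """
    indels = sorted((c for c in calls if c.is_indel), key=lambda c: (c.chrom, c.pos))
    pairs = []
    for a, b in zip(indels, indels[1:]):
        a_end = a.pos + len(a.ref) - 1  # last reference base covered by call a
        if a.chrom == b.chrom and b.pos - a_end <= window:
            pairs.append((a, b))
    return pairs


RECORDS = [
    "seq1 288 . A ACATAG 99 . INDEL;AF1=0.500;AFE=0.500;DP4=7,4,9,4;MQ=59;PV4=1,0.31,1,0.063 PL 255,0,255",
    "seq2 156 . AA AAGA 64 . INDEL;AF1=1.000;AFE=0.996;DP4=0,0,8,0;MQ=60 PL 97,24,0",
    "seq2 505 . A G 99 . AF1=0.500;AFE=0.500;DP4=13,11,9,14;MQ=60;PV4=0.39,0.29,0.16,1 PL 188,0,198",
    "seq2 783 . A AAAAT 99 . INDEL;AF1=0.500;AFE=0.500;DP4=2,7,18,17;MQ=57;PV4=0.15,1,0.093,1 PL 255,0,77",
    "seq2 784 . CAATT CAATTAATT 99 . INDEL;AF1=1.000;AFE=1.000;DP4=0,0,20,22;MQ=57 PL 255,126,0",
]

calls = [parse_vcf_line(r) for r in RECORDS]
pairs = nearby_indel_pairs(calls)
for a, b in pairs:
    print(f"{a.chrom}:{a.pos} and {b.chrom}:{b.pos} may describe the same indel")
```

On these records only the seq2:783 / seq2:784 pair is flagged. Deciding which of the two calls to keep is left to a real filter such as the varFilter step mentioned in the announcement.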