From: Heng Li <lh...@sa...> - 2010-10-13 14:08:09
|
I seldom send email to samtools-announce, but I think it is worth it this time. Samtools now calculates a per-base alignment quality (BAQ) which directly measures the probability of a read base (not the entire read, so different from mapping quality) being wrongly aligned. It is designed to resolve false SNPs caused by misalignment, especially around indels and in low-complexity regions. Application to data from the 1000 genomes project has confirmed its effectiveness. The theory is in http://lh3lh3.users.sourceforge.net/download/samtools.pdf. The new feature is in SVN only. To try it: svn co https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/samtools cd samtools; make; cd examples; ../samtools faidx ex1.fa ../samtools calmd -ur ex1.bam ex1.fa 2>/dev/null |../samtools pileup -vcf ex1.fa - Samtools will call 4 SNPs which are all correct. In contrast, if we run: ../samtools pileup -vcf ex1.fa ex1.bam Samtools will call 11 SNPs. All the 7 false SNPs are around indels. Although in this example they can be filtered out by "samtools.pl varFilter", there are cases when BAQ provides additional power. Especially for ungapped aligner, invoking BAQ (via calmd -r) is *essential* to reliable SNP calling because the indel filter will not be available. A similar strategy has been implemented in stampy (http://www.well.ox.ac.uk/project-stampy), although I was not aware of that when I started to implement BAQ in samtools. The model behind probably differs, too. Heng -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Ashish S. <as...@st...> - 2010-10-14 06:44:47
|
Hi Heng, Sorry if the question sounds too basic. With reference to the theory doc, the S state in the profile HMM wasn't clear to me especially the part where P(^|S) = P($|S) = 1 combined with the fact that there is a non zero transition probability from state S to match and insert states. Unless of course this HMM represents generation of infinite number of reads and not just one read. Even still the HMM will need to emit two symbols $ followed by ^ when in state S. Am I missing something ? Thanks and regards, - Ashish To: sam...@li..., "Samtools-help" <samtools- he...@li...> I seldom send email to samtools-announce, but I think it is worth it this time. Samtools now calculates a per-base alignment quality (BAQ) which directly measures the probability of a read base (not the entire read, so different from mapping quality) being wrongly aligned. It is designed to resolve false SNPs caused by misalignment, especially around indels and in low-complexity regions. Application to data from the 1000 genomes project has confirmed its effectiveness. The theory is in http://lh3lh3.users.sourceforge.net/download/samtools.pdf. The new feature is in SVN only. To try it: svn co https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/samtools cd samtools; make; cd examples; ../samtools faidx ex1.fa ../samtools calmd -ur ex1.bam ex1.fa 2>/dev/null |../samtools pileup -vcf ex1.fa - Samtools will call 4 SNPs which are all correct. In contrast, if we run: ../samtools pileup -vcf ex1.fa ex1.bam Samtools will call 11 SNPs. All the 7 false SNPs are around indels. Although in this example they can be filtered out by "samtools.pl varFilter", there are cases when BAQ provides additional power. Especially for ungapped aligner, invoking BAQ (via calmd -r) is *essential* to reliable SNP calling because the indel filter will not be available. A similar strategy has been implemented in stampy (http://www.well.ox.ac.uk/project-stampy), although I was not aware of that when I started to implement BAQ in samtools. The model behind probably differs, too. Heng |