Re: [maq-help] SNP filtering and calling
From: Heng Li <lh...@gm...> - 2007-12-04 21:51:43
Hello Olivier,

Maq calls genotypes by calculating the posterior probability of each genotype and calling the one that maximizes the posterior. It does not explicitly require a minimum frequency or a minimum number of reads supporting an allele.

By "filters" in my last email, I meant the rules maq uses to discard some poor SNP calls. These rules are applied after an initial set of SNPs is called with the model I briefly described above.

I think you can usually trust maq's filtered SNP calls (i.e. "maq cns2snp" + "maq.pl SNPfilter"). Currently it shows good performance on both simulated and large-scale real data. At the same time, if you are not satisfied with maq's accuracy, you may also develop your own method based on the "pileup" output, which is designed for users who want to have their own consensus genotype callers. I am always happy to add a new method if it outperforms maq's current model.

Many thanks,

Heng

On 1 Dec 2007, at 02:39, Olivier Harismendy wrote:

> Thanks for your answer; however, I am not sure it clears up
> everything. My question was also: when do you call a SNP? What is
> the minimum frequency of an alternate base to be considered a true
> variant: 0.1? 0.2?
>
> Is there a way to play with the coverage for SNP calling, or do I
> have to parse the pileup output to do that?
>
> Best,
>
> Olivier
>
> On Nov 28, 2007, at 1:36 AM, Heng Li wrote:
>
>> Dear Olivier,
>>
>> Thanks for the feedback. maq-0.6.2 and maq-0.6.1 use different
>> rules in filtering SNPs. In maq-0.6.2, the following rules are
>> used to get cns.final.snp:
>>
>> "After mapping and consensus base calling, we filtered the SNPs
>> based on four rules: [i') discard SNPs in 3bp-flanking regions
>> beside potential indels;] i) discard SNPs covered by three or
>> fewer reads; ii) discard SNPs covered by no read with a mapping
>> quality higher than 40; iii) in any 10bp window, if there are 3
>> or more SNPs, discard them all; and iv) discard SNPs with quality
>> smaller than 40."
>>
>> In some cases, you can reduce the threshold on the quality of SNP
>> (rule iv), but the number of false positives may increase a
>> little. For single-end reads, you may also try:
>>
>>   maq.pl SNPfilter -a cns.snp > cns.filtered.snp
>>
>> This command invokes an alternative filter whose aim is similar
>> to, but slightly different from, that of rule ii).
>>
>> As for the last column in *.snp, you can ignore it. Even I do
>> not use it any longer. Maybe I should remove that column some day.
>>
>> Hope this helps.
>>
>> Cheers,
>>
>> Heng
>>
>> On 27 Nov 2007, at 22:20, Olivier Harismendy wrote:
>>
>>> Hi,
>>>
>>> I have a question about the cns.final.snp output file from
>>> maq.pl easyrun.
>>>
>>> What are the filters applied to the SNPs (depth/repeats/
>>> quality)? Is there a way to play with the parameters?
>>>
>>> In the output, the last column (quality difference between strong
>>> and weak allele) doesn't make sense to me, as the values go from
>>> -124 to 124. Can you explain a little?
>>>
>>> Thanks for your answer.
>>>
>>> Olivier
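
For anyone who wants to experiment with the thresholds outside maq, rules i) to iv) quoted in this thread for maq-0.6.2 can be sketched roughly as below. This is only an illustration, not maq's own code. The `Snp` record, its field names, and both function names are hypothetical, and the sketch works on records already in memory rather than parsing a real *.snp file. It also assumes that rule iii) is evaluated on the unfiltered call set and that a "10bp window" means the first and last of three SNPs are fewer than 10 positions apart; the thread does not state either point. Rule i') is left out because it needs indel positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    # Hypothetical record; field names are illustrative, not maq's column names.
    chrom: str
    pos: int       # position on the reference
    depth: int     # number of reads covering the site
    max_mapq: int  # highest mapping quality among the covering reads
    quality: int   # Phred-like quality of the SNP call


def dense_window_positions(snps, window=10, max_snps=3):
    """Return (chrom, pos) for every SNP lying in a `window`-bp span
    that holds `max_snps` or more SNPs (rule iii)."""
    by_chrom = {}
    for s in snps:
        by_chrom.setdefault(s.chrom, []).append(s.pos)

    flagged = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Any run of `max_snps` consecutive SNPs spanning fewer than
        # `window` positions marks all of its members for removal.
        for i in range(len(positions) - max_snps + 1):
            j = i + max_snps - 1
            if positions[j] - positions[i] < window:
                flagged.update((chrom, p) for p in positions[i:j + 1])
    return flagged


def filter_snps(snps, max_low_depth=3, mapq_cutoff=40, min_quality=40):
    """Keep only the SNPs that pass rules i) to iv)."""
    dense = dense_window_positions(snps)  # assumption: uses all raw calls
    kept = []
    for s in snps:
        if s.depth <= max_low_depth:     # rule i: three or fewer reads
            continue
        if s.max_mapq <= mapq_cutoff:    # rule ii: no read with mapQ > 40
            continue
        if (s.chrom, s.pos) in dense:    # rule iii: 3+ SNPs in a 10bp window
            continue
        if s.quality < min_quality:      # rule iv: quality smaller than 40
            continue
        kept.append(s)
    return kept
```

Lowering `min_quality` corresponds to relaxing rule iv) as suggested in the thread, at the cost of a few more false positives.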