Re: [svtoolkit-help] RedundancyAnnotator
From: Bob H. <han...@br...> - 2017-10-03 12:59:28
Hi Thomas,

The RedundancyAnnotator is designed to work with VCFs generated from multiple tools (it was originally developed to help merge SVs from multiple callers as part of the 1000 Genomes project). It does require consistent annotations (e.g. all input files have genotype likelihoods), and the quality of the results will depend on how well calibrated the likelihoods are. I haven't used it with lumpy, so your mileage may vary.

There are a couple of different approaches/modes you can use. The defaults are designed more for multiple calling methods on the same samples, which is what it sounds like you have.

The tool compares all "nearby" variants pairwise, where "nearby" is usually determined by degree of overlap (the default is 50%, I believe, but you can change this). For the pairwise comparison, the default mode calculates the likelihood that any sample is more likely to have a different genotype at the two variants than the same genotype, basically on-diagonal vs. off-diagonal (again, you can adjust the threshold). If no sample has a sufficiently confident difference in genotype, the two variants are deemed redundant. In that case, we want to filter one of the two redundant variants, and this is done by setting a filter on the variant with the smallest posterior genotype likelihoods (the least confident genotype calls). The method attempts to compute a stable dominance order so that, if there are multiple overlapping calls, the minimal set is removed.

The default settings tend to produce rather "light" filtering, erring on the side of leaving calls unfiltered and only filtering those that are confidently quite similar. This may be appropriate for an association study, where you don't mind a few extra tests. If you are trying to create a reference map, you may get better results by turning up the thresholds. You can evaluate this, for example, by looking at how many overlapping calls remain and their degree of overlap.
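
To make the default pairwise comparison described above more concrete, here is a rough Python sketch. It is only an illustration, not the svtoolkit implementation: the function names, the use of reciprocal overlap, the assumption that the two calls are independent given the posteriors, the 0.5 threshold, and the "sum of best posteriors" confidence score are all assumptions chosen for the example.

```python
from typing import Dict, Sequence

# Hypothetical input shape: sample name -> posterior P(genotype) for (0/0, 0/1, 1/1).
Posteriors = Dict[str, Sequence[float]]


def reciprocal_overlap(start_a: int, end_a: int, start_b: int, end_b: int) -> float:
    """Shared length divided by the longer interval (half-open coordinates)."""
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return shared / max(end_a - start_a, end_b - start_b)


def prob_different(post_a: Sequence[float], post_b: Sequence[float]) -> float:
    """Off-diagonal mass of the joint genotype table for one sample.

    Assumes the two calls are independent, so the on-diagonal (same genotype)
    mass is the sum of the products of matching posteriors.
    """
    same = sum(pa * pb for pa, pb in zip(post_a, post_b))
    return 1.0 - same


def is_redundant(a: Posteriors, b: Posteriors, max_prob_different: float = 0.5) -> bool:
    """True if no shared sample is confidently different between the two variants.

    The 0.5 default is a placeholder, not the tool's setting. With no shared
    samples this returns True vacuously, so it only makes sense when both
    variants were genotyped on the same samples.
    """
    shared_samples = a.keys() & b.keys()
    return all(prob_different(a[s], b[s]) <= max_prob_different for s in shared_samples)


def less_confident(name_a: str, a: Posteriors, name_b: str, b: Posteriors) -> str:
    """Name of the variant with the weaker genotype calls (the one to filter)."""
    def confidence(variant: Posteriors) -> float:
        # Sum of each sample's best posterior; ties go to name_b.
        return sum(max(p) for p in variant.values())

    return name_a if confidence(a) < confidence(b) else name_b
```

In this sketch, two calls would first be checked with reciprocal_overlap against the overlap cutoff, then with is_redundant, and less_confident would pick which of a redundant pair to mark as filtered. It does not attempt the stable dominance ordering across more than two overlapping calls.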
If you turn the thresholds high enough, you can force the output to have no overlapping variants.

As an aside, the other application we use this for is to combine calls across disjoint sets of samples. In this case, we need more aggressive merging, so we change the settings to ignore the genotype likelihoods in the pairwise comparisons, use just the hard genotype calls, and set a threshold on the allowable number of genotype discordances.

-Bob

On 10/3/17 8:09 AM, Thomas Faraut wrote:
> Dear GenomeSTRIP team,
>
> We successfully used genomeSTRIP to detect medium to large deletions
> in goats.
> For smaller deletions, we use another variant detection tool (lumpy) but
> would like to be able to use the RedundancyAnnotator from the svtoolkit
> to detect duplicate calls.
>
> Is it possible to use the RedundancyAnnotator with a VCF file provided by
> another SV genotyping tool, provided that the genotype likelihoods are
> available?
> Or is this redundancy score calculation described in one of the
> genomeSTRIP papers?
>
> Thank you in advance for your help.
>
> Best regards,
> Thomas Faraut