Re: [svtoolkit-help] Filter criteria for SVs
From: Bob H. <han...@br...> - 2011-09-05 21:54:10
First, I should warn you that I haven't done much work with discovery on targeted sequencing. Your results may vary depending on how even your coverage is across the target region.

Filtering good calls from the candidates is still somewhat of an art, and it depends on whether you want more sensitivity or more specificity. The default set of filters in the queue scripts is based on the 1000 Genomes pilot.

The metrics most often used for filtering are the following:

DEPTHRATIO / DEPTHPVALUE
The depth ratio is the mean read depth for samples with observed aberrant read pairs divided by the mean read depth for the other samples. It should ideally be 0.5 or below for real deletions. If you plot this metric it should be bimodal, and you can select a threshold from the data. The depth p-value indicates whether there was sufficient depth information for the depth ratio to be reliable.
Default 1kg pilot filters:
DEPTHPVALUE < 0.01 and
DEPTHRATIO < 0.63 (or DEPTHRATIO < 0.8 if MEMBPVALUE < 0.01)
The cutoff of 0.63 was chosen as the approximate midpoint of the bimodal distribution for depth ratio in the 1kg pilot data set. See below for MEMBPVALUE.

DEPTHCALLTHRESHOLD < 1
The not-very-well-named depth call threshold is the median normalized sequencing depth of samples with observed aberrant read pairs. A normalized depth of 1 in this case should be approximately copy number 2. Ideally this number would be 0.5 or below; this filter excludes regions of the genome with excessive coverage.

COHPVALUE > 0.01
The coherence metric (not a true p-value, despite the name) indicates whether the spacing of the aberrant read pairs is consistent with a single deletion breakpoint. Read pairs generated by mismapping, for example, tend to be more uniformly spaced.

MEMBPVALUE
This metric tests whether the deletion seems to be appearing more in some samples than others, taking into account uneven sequencing.
Lower values are better, but unless you have a lot of samples it can be hard to find a good absolute cutoff for this metric. For the 1kg pilot, we used it to "boost" some samples with a marginal depth ratio between 0.63 and 0.8.

If you are trying to identify high-confidence calls, longer calls generally have a better depth signal and are thus of higher confidence (all other metrics being equal).

In the 1kg phase 1 data set, we also used a filter that required at least one sample to have two aberrant read pairs. This may be important if your sequencing is low coverage (e.g. 4x), but for higher sequencing depth I think it is not necessary.

I would start with the 1kg pilot filters as a guide, as they proved to be reasonable on the 1kg phase 1 data as well (see SVDiscoveryDefaultFilter in SVQScript.q).

In theory, it should be possible to calibrate your filters against existing gold standard data sets, if you have any.

Another useful thing to do is to prospectively genotype some of the sites (using Genome STRiP). Sometimes the genotyping results and metrics can help determine whether marginal calls are good or not, and this can help inform your discovery thresholds, although I would not recommend trying to do large-scale filtering via genotyping.

-Bob

On 9/5/11 4:56 PM, Hyun Ji Noh wrote:
> Hi,
>
> Now I have vcf files that are generated by the modified discovery.sh script. There's a huge amount of information in the vcf files, and there are even deletion calls that are not in my target region. So I'm wondering what your recommended criteria are for filtering high quality deletion calls?
>
> BW,
> Hyun Ji
>
> P.S. Thank you for your help on the mismatched read pair records error. Your advice fixed the problem!
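
For reference, the default 1kg pilot thresholds listed in the reply can be written down as a small check. This is only a sketch under stated assumptions, not the actual SVDiscoveryDefaultFilter from SVQScript.q: the function name passes_pilot_filters is hypothetical, the metrics are assumed to have already been extracted into a plain dict keyed by the metric names used in the email, and the thresholds are read as conditions a call must satisfy to be kept, with DEPTHPVALUE < 0.01 required in both depth-ratio branches. The example values are made up for illustration.

```python
from typing import Mapping


def passes_pilot_filters(m: Mapping[str, float]) -> bool:
    """Hypothetical helper: apply the 1kg pilot thresholds from the email.

    Assumes `m` maps the metric names used in the email to numeric values.
    How those values are obtained from the discovery output is not shown.
    """
    # Depth evidence: a strict depth-ratio cutoff, or a relaxed cutoff when
    # the membership p-value "boosts" a marginal call (assumed reading).
    depth_ratio_ok = m["DEPTHRATIO"] < 0.63 or (
        m["DEPTHRATIO"] < 0.8 and m["MEMBPVALUE"] < 0.01
    )
    depth_ok = m["DEPTHPVALUE"] < 0.01 and depth_ratio_ok

    # Exclude regions with excessive coverage.
    coverage_ok = m["DEPTHCALLTHRESHOLD"] < 1.0

    # Aberrant read pair spacing consistent with a single breakpoint.
    coherence_ok = m["COHPVALUE"] > 0.01

    return depth_ok and coverage_ok and coherence_ok


# Illustrative (made-up) metric values for three candidate calls.
clear_call = {
    "DEPTHRATIO": 0.45, "DEPTHPVALUE": 1e-5, "DEPTHCALLTHRESHOLD": 0.5,
    "COHPVALUE": 0.3, "MEMBPVALUE": 0.2,
}
boosted_call = dict(clear_call, DEPTHRATIO=0.7, MEMBPVALUE=0.001)
marginal_call = dict(clear_call, DEPTHRATIO=0.7)

for name, call in [("clear", clear_call), ("boosted", boosted_call),
                   ("marginal", marginal_call)]:
    print(name, passes_pilot_filters(call))
```

With these values, the marginal call (depth ratio 0.7) is kept only when its MEMBPVALUE is below 0.01, which mirrors the "boost" described for the pilot. The cutoffs themselves came from the 1kg pilot data, so on targeted sequencing they are a starting point to adjust after plotting your own depth ratio distribution.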