Re: [svtoolkit-help] Questions regarding the output
From: Bob H. <han...@br...> - 2012-08-06 11:44:12
On 8/6/12 12:26 AM, Jaemin Kim wrote:
> Hi, I've been asking quite a few questions about GenomeSTRiP and
> answers have been very helpful.
>
> I have two additional questions:
>
> 1. When you actually count the number of deletion sites in your paper,
> did you count the ones with "PASS" quality only?

Yes. There is a set of default filters applied if you use the standard SVDiscovery Q script. These may or may not be optimal for different data sets, and I would encourage you to look at the distributions of the various metrics.

The default filters weren't too bad when applied out-of-the-box to 1000G Phase 1, although that is similar low-coverage (~4x) data. In the larger data set, the default filters had an estimated false discovery rate of about 8%. We used two additional filters for Phase 1 to bring the estimated FDR down to about 3-4%. The most important was a filter that mostly removed rare events:

    GSNPAIRS/GSNSAMPLES > 1.1

For any low-coverage sequencing of several hundred samples or more, this filter is likely important. We estimated the FDR of sites with two supporting read pairs in two different samples at around 25%. Of lesser importance, but still helpful, we also removed any sites that were > 90% alpha satellite repeat (as called by RepeatMasker).

> 2. How did you calculate the deletion length? Did you use the GSCOORDS
> information (and take the subtraction between biggest and smallest
> coordinates)?

No. I use either INFO:END - POS or INFO:SVLEN. For Genome STRiP I think these are always the same. These are supposed to be the "most likely" start/end/length of the deletion. In addition, you can use INFO:CIPOS and INFO:CIEND, which are approximately 95% confidence intervals on POS and END respectively.

GSCOORDS is somewhat different: it is the inner/outer extent of the aberrantly spaced read pairs and is mostly used for internal bookkeeping. The outer coordinates will likely over-estimate the true extent of the deletion, and the inner coordinates will similarly under-estimate it.
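For illustration only, here is a minimal Python sketch (not part of Genome STRiP) that applies the two answers above to a single VCF data line. It keeps a site only if FILTER is PASS and GSNPAIRS/GSNSAMPLES > 1.1. This treats the ratio as a condition a site must meet to be kept, so a site with two pairs in two samples (ratio 1.0) is dropped. It takes the deletion length from INFO:SVLEN, falling back to INFO:END - POS, and turns CIPOS/CIEND into coordinate ranges. It assumes the standard tab-separated VCF layout (INFO in column 8) and that CIPOS/CIEND are "low,high" offset pairs as in the VCF specification. It also assumes SVLEN may be negative for deletions, as the VCF specification recommends, so the absolute value is used. The helper names and the example record are hypothetical.

```python
from typing import Dict, Optional, Tuple

Info = Dict[str, Optional[str]]


def parse_info(info_field: str) -> Info:
    """Split a VCF INFO column ("KEY=VALUE;FLAG;...") into a dict.

    Flag entries without a value (e.g. IMPRECISE) map to None.
    """
    info: Info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else None
    return info


def passes_rare_event_filter(info: Info, min_ratio: float = 1.1) -> bool:
    """Keep a site only if supporting read pairs per supporting sample > min_ratio.

    A site seen as 2 pairs in 2 samples has a ratio of 1.0 and is rejected.
    """
    n_pairs = int(info["GSNPAIRS"])
    n_samples = int(info["GSNSAMPLES"])
    if n_samples == 0:
        return False  # no supporting samples: nothing to keep
    return n_pairs / n_samples > min_ratio


def deletion_length(pos: int, info: Info) -> int:
    """Most likely deletion length: |SVLEN| if present, else END - POS."""
    svlen = info.get("SVLEN")
    if svlen not in (None, "."):
        return abs(int(svlen))  # assumption: SVLEN may be negative for deletions
    return int(info["END"]) - pos


def confidence_interval(center: int, info: Info, key: str) -> Tuple[int, int]:
    """Turn a CIPOS/CIEND offset pair ("low,high") into absolute coordinates."""
    raw = info.get(key)
    if raw in (None, "."):
        return (center, center)  # no interval reported
    low, high = (int(x) for x in raw.split(","))
    return (center + low, center + high)


def summarize_record(line: str) -> dict:
    """Summarize one tab-separated VCF data line for a deletion site."""
    fields = line.rstrip("\n").split("\t")
    pos = int(fields[1])
    filter_column = fields[6]
    info = parse_info(fields[7])
    end = int(info["END"])
    return {
        "pass": filter_column == "PASS" and passes_rare_event_filter(info),
        "length": deletion_length(pos, info),
        "pos_ci": confidence_interval(pos, info, "CIPOS"),
        "end_ci": confidence_interval(end, info, "CIEND"),
    }


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = ("chr1\t1000\tDEL_1\tN\t<DEL>\t.\tPASS\t"
              "END=2500;SVLEN=-1500;SVTYPE=DEL;CIPOS=-20,30;CIEND=-25,15;"
              "GSNPAIRS=12;GSNSAMPLES=5")
    print(summarize_record(record))
    # prints {'pass': True, 'length': 1500, 'pos_ci': (980, 1030), 'end_ci': (2475, 2515)}
```

For a real call set you would run `summarize_record` over each data line (lines not starting with `#`) and count the sites where `pass` is true. The alpha satellite filter is left out because it depends on a RepeatMasker annotation that is not shown here.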
-Bob

> Thanks for your kindness.
>
> Regards,
>
> Jaemin Kim