From: David N. <Dav...@hc...> - 2012-03-02 21:30:22
Oh the headaches..... DESeq is being very strict in calling variance outliers when analyzing multiple replica RNA-Seq data. So much so that with human patient datasets, many genes that are clearly differentially expressed (DE) are being dropped. This feature of DESeq can be disabled by setting sharingMode='fit-only' when estimating dispersions. Unfortunately, this leads to many false positives in the DE lists.

The best solution I've been able to concoct so far is to run DESeq with and without the outlier filtering, present both sets of p-values and FDRs, and then provide (by default) the smallest of the all-pair log2 ratios as the DE metric. This smallest log2 ratio is a worst-case estimate of DE. Thus, to get a starting list of DE genes to work with from a multiple replica RNA-Seq experiment, filter for BH_FDR_NoVarOutFilt values > 10 (10% FDR) and Log2Ratio values < -1 or > 1. If the gene's BH_FDR_VarOutFilt also passes the threshold, great.

These problems of over-aggressive outlier filtering may only apply to human patient samples, where the biological variation is quite large due to other confounding issues with each patient. Thanks to Don Delker and Curt Hagedorn for pointing out these particular issues.

-cheers, D
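
P.S. For anyone who wants to apply the filter described above to an exported results table, here is a rough Python sketch, not the actual implementation. It assumes the FDR columns are stored as -10*log10(FDR), so that a value of 10 corresponds to 10% FDR (which is how I read "> 10 (10% FDR)"), and it reads "smallest" all-pair log2 ratio as the one closest to zero. The dictionary layout, the `pairwise_log2_ratios` key, and both function names are made up for illustration.

```python
def worst_case_log2_ratio(pairwise_log2_ratios):
    """Return the all-pair log2 ratio closest to zero.

    Assumption: 'smallest' means smallest in magnitude, i.e. the most
    conservative estimate of differential expression.
    """
    return min(pairwise_log2_ratios, key=abs)


def starting_de_list(genes, min_score=10.0, min_abs_log2=1.0):
    """Select genes passing the unfiltered FDR and the log2 ratio thresholds.

    Assumption: FDR columns hold -10*log10(FDR), so 10.0 means 10% FDR.
    Each gene is a dict with hypothetical keys: name, pairwise_log2_ratios,
    BH_FDR_NoVarOutFilt, BH_FDR_VarOutFilt.
    """
    selected = []
    for gene in genes:
        log2_ratio = worst_case_log2_ratio(gene["pairwise_log2_ratios"])
        passes_fdr = gene["BH_FDR_NoVarOutFilt"] > min_score
        passes_ratio = abs(log2_ratio) > min_abs_log2
        if passes_fdr and passes_ratio:
            selected.append({
                "name": gene["name"],
                "Log2Ratio": log2_ratio,
                # Extra confidence if the outlier-filtered run agrees.
                "also_passes_outlier_filter":
                    gene["BH_FDR_VarOutFilt"] > min_score,
            })
    return selected
```

Genes where `also_passes_outlier_filter` is true are the ones that survive both DESeq runs; the rest pass only with the variance outlier filtering turned off and deserve a closer look.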