"Sterling approximation" should have been "Stirling’s approximation", but you get the point.

John Braisted
Senior Software Engineer
Pathogen Functional Genomics Resource Center (PFGRC)
J. Craig Venter Institute
9704 Medical Center Drive
Rockville, MD 20850

From: Braisted, John C.
Sent: Thursday, June 18, 2009 11:23 AM
To: 'mev-tm4-devel@lists.sourceforge.net'; mev; Mev Harvard
Subject: Committed NonparHypergeometricProbability.java to the trunk area of SF SVN

Hi Eleanor et al.,

I’ve committed this class with a small but important patch.  This class uses a Sterling approximation for large factorials when computing the Fisher Exact probability for a 2x2 contingency matrix.  NIAID and their group provided this code to support EASE.  The code is originally from Bell Labs.
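For readers unfamiliar with the technique, here is a minimal sketch of how Stirling's approximation can stand in for large factorials when computing the probability of a single 2x2 table. It is not the committed class. The class name, method names, the cutoff of 20, and the number of series terms are all assumptions made for illustration.

```java
final class StirlingSketch {
    // Hypothetical cutoff: below it, ln(n!) is summed exactly.
    private static final int EXACT_CUTOFF = 20;

    // ln(n!) using an exact sum for small n and the Stirling series otherwise.
    static double logFactorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n < EXACT_CUTOFF) {
            double sum = 0.0;
            for (int i = 2; i <= n; i++) {
                sum += Math.log(i);
            }
            return sum;
        }
        // ln n! ~ n ln n - n + 0.5 ln(2 pi n) + 1/(12 n) - 1/(360 n^3)
        double x = n;
        return x * Math.log(x) - x + 0.5 * Math.log(2.0 * Math.PI * x)
                + 1.0 / (12.0 * x) - 1.0 / (360.0 * x * x * x);
    }

    // Hypergeometric probability of the table [[a, b], [c, d]] with fixed margins,
    // computed in log space so large factorials never overflow.
    static double tableProbability(int a, int b, int c, int d) {
        double logP = logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(a + b + c + d)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d);
        return Math.exp(logP);
    }
}
```

Because the result is assembled from many floating-point terms, two tables that are mathematically equiprobable (such as a matrix and its transpose) can come out different in the last digit of the double.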

The Nonpar Fisher exact provides both a one-tailed and a two-tailed probability.  For the two-tailed part, the code iterates over matrices that are equal or less extreme than the observed one and sums the probabilities that are less than or the ‘same’ as that of the original matrix.  All computations are done in double precision.  The problem arises when, in considering the other tail of the probability, you encounter the transpose of the originally observed matrix.  Its approximated probability should be exactly the same as the original's, but it can differ in the very last digit.  I’m now just casting it to (float) to make the probability comparison so that this discrepancy (the last digit of a double-precision value) doesn’t affect the result.  There may be other ways to handle this, such as recognizing the transposed matrix corresponding to the original matrix and adding its probability regardless of the fact that it’s just a bit larger due to approximation error at the limits of machine precision.

Note that in many cases the transpose of the original matrix behaves and yields the same probability, but I have found instances where it does not.
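To illustrate the comparison issue described above, here is a minimal sketch of a two-tailed sum that compares probabilities after narrowing them to float. It is not the committed code. The class and method names are hypothetical, casting both sides of the comparison is an assumption, and the factorials are summed exactly here rather than approximated, to keep the example short.

```java
final class TwoTailedSketch {
    // ln(n!) by direct summation (the real class approximates large factorials).
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    // Hypergeometric probability of the table [[a, b], [c, d]].
    static double tableProbability(int a, int b, int c, int d) {
        double logP = logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(a + b + c + d)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d);
        return Math.exp(logP);
    }

    // Two-tailed probability: sum over all tables with the same margins whose
    // probability is no larger than the observed one.
    static double twoTailed(int a, int b, int c, int d) {
        int row1 = a + b;
        int row2 = c + d;
        int col1 = a + c;
        double observed = tableProbability(a, b, c, d);
        int lo = Math.max(0, col1 - row2);
        int hi = Math.min(row1, col1);
        double sum = 0.0;
        for (int x = lo; x <= hi; x++) {
            double p = tableProbability(x, row1 - x, col1 - x, row2 - (col1 - x));
            // Narrowing to float discards the noisy low-order digits, so a table
            // that is mathematically tied with the observed one is not dropped
            // because of a last-digit difference in the double value.
            if ((float) p <= (float) observed) {
                sum += p;
            }
        }
        return sum;
    }
}
```

For the table [[3, 1], [1, 3]] the exact two-tailed value is 34/70, which requires counting the mirrored table whose probability ties with the observed one. The float cast is a coarse tolerance (roughly seven significant digits); a comparison with an explicit relative tolerance would be another way to get the same effect.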

Here’s the code that has been committed:

Note that EASE doesn’t compute the two-tailed test, since we only care about over-representation in the cluster (one-sided).  Therefore, the FE in EASE doesn’t have the two-tailed test and doesn’t need this patch.

John
