Sterling approximation should have been Stirling’s approximation… but you get the point.

John Braisted

Senior Software Engineer

Pathogen Functional Genomics Resource Center (PFGRC)

J. Craig Venter Institute

9704 Medical Center Drive

Rockville, MD 20850

**From:** Braisted, John C.

**Sent:** Thursday, June 18, 2009 11:23 AM

**To:** 'mev-tm4-devel@lists.sourceforge.net'; mev; Mev Harvard

**Subject:** Committed NonparHypergeometricProbability.java to the trunk area of SF SVN

Hi Eleanor et al.,

I’ve committed this class with a small but important patch. This class uses a Sterling approximation for large factorials when computing the Fisher exact probability for a 2x2 contingency matrix. NIAID and their group provided this code to support EASE; the code originally comes from Bell Labs.
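For readers unfamiliar with the approach, here is a minimal Java sketch of how Stirling’s series can stand in for large factorials when evaluating the probability of a single 2x2 table with fixed margins, P = (a+b)!(c+d)!(a+c)!(b+d)! / (n! a! b! c! d!). The class and method names, the cutoff of 20 below which logs are summed exactly, and the number of series terms are all assumptions for illustration; this is not the committed NIAID/Bell Labs code.

```java
/**
 * Sketch only (hypothetical names): ln(n!) via Stirling's series, used to
 * evaluate the hypergeometric probability of a 2x2 table [[a, b], [c, d]].
 */
final class StirlingFisherSketch {

    // Assumed cutoff: below this, ln(n!) is summed exactly from logs.
    private static final int EXACT_CUTOFF = 20;

    /** ln(n!) for n >= 0. */
    static double logFactorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n < EXACT_CUTOFF) {
            double sum = 0.0;
            for (int k = 2; k <= n; k++) {
                sum += Math.log(k);
            }
            return sum;
        }
        double x = n;
        // Stirling's series: n ln n - n + ln(2 pi n)/2 + 1/(12n) - 1/(360 n^3)
        return x * Math.log(x) - x + 0.5 * Math.log(2.0 * Math.PI * x)
                + 1.0 / (12.0 * x) - 1.0 / (360.0 * x * x * x);
    }

    /** P = (a+b)!(c+d)!(a+c)!(b+d)! / (n! a! b! c! d!), computed in log space. */
    static double tableProbability(int a, int b, int c, int d) {
        int n = a + b + c + d;
        double logP = logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(n)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d);
        return Math.exp(logP);
    }
}
```

For example, `tableProbability(1, 1, 1, 1)` evaluates 2!^4 / (4! · 1) = 2/3. Working in log space keeps intermediate values small; the price is rounding in the last bits of each result.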

The Nonpar Fisher exact test provides both a one-tailed and a two-tailed probability. For the two-tailed part, the code iterates over matrices and sums the probabilities that are less than or the ‘same’ as that of the original matrix, i.e. the matrices that are as extreme as or more extreme than the observed one. All computations are done in double precision. The problem comes when you encounter the transpose of the originally observed matrix while considering the other tail of the distribution. Its approximated probability should be exactly the same as the original’s, but it differs in the very last digits. I’m now just casting both values to (float) for the probability comparison, so that this discrepancy (in the last digit of a double-precision value) doesn’t affect the result. There may be other ways to handle this, such as recognizing the transposed matrix that corresponds to the original matrix and adding its probability even though it comes out just a bit larger due to approximation error at the limits of machine precision.
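A hedged sketch of the two-tailed sum with the float-cast comparison described above. It uses exact log sums instead of Stirling’s series for brevity (any floating-point evaluation can pick up the same last-digit drift) and enumerates tables by the top-left cell with the margins held fixed. All names are hypothetical; this is not the committed class.

```java
/**
 * Sketch only (hypothetical names): two-tailed Fisher exact p-value for a
 * 2x2 table, comparing probabilities as floats to absorb last-digit noise.
 */
final class TwoTailedFisherSketch {

    /** ln(n!) by exact summation (kept simple; Stirling would also work). */
    static double logFactorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k);
        }
        return sum;
    }

    static double tableProbability(int a, int b, int c, int d) {
        return Math.exp(logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(a + b + c + d)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d));
    }

    /** Sums P(table) over all tables with the observed margins and P <= P(observed). */
    static double twoTailed(int a, int b, int c, int d) {
        int row1 = a + b, row2 = c + d, col1 = a + c;
        float observed = (float) tableProbability(a, b, c, d);
        double p = 0.0;
        // The top-left cell x determines the whole table once margins are fixed.
        for (int x = Math.max(0, col1 - row2); x <= Math.min(row1, col1); x++) {
            double px = tableProbability(x, row1 - x, col1 - x, row2 - col1 + x);
            // Float comparison: a tie that differs only in the last double digits usually still counts.
            if ((float) px <= observed) {
                p += px;
            }
        }
        return Math.min(1.0, p);
    }
}
```

With the table [[3, 1], [1, 3]], the table in the other tail, [[1, 3], [3, 1]], has exactly the same probability (16/70), so `twoTailed(3, 1, 1, 3)` should return 34/70 ≈ 0.486. Casting to float hides differences in the last few bits of a double but is not a guarantee: two doubles that straddle a float rounding boundary still cast to different floats. A relative tolerance is another option; R’s `fisher.test`, for instance, compares against the observed probability multiplied by 1 + 10^-7.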

Note that in many cases the transpose of the original matrix behaves correctly and gets exactly the same probability, but I have found instances where this fails.
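To illustrate the kind of failure described above (an analogy, not the failing PFGRC case): the same mathematical quantity evaluated in a different order can land on neighboring doubles, while casting to float maps both onto the same value. The class and method names are hypothetical.

```java
/** Illustration only: equal math, different evaluation order. */
final class LastDigitDriftSketch {

    // Same sum, grouped differently; IEEE 754 rounding makes them differ in the last bit.
    static double leftGrouped() {
        return (0.1 + 0.2) + 0.3;
    }

    static double rightGrouped() {
        return 0.1 + (0.2 + 0.3);
    }

    public static void main(String[] args) {
        double left = leftGrouped();
        double right = rightGrouped();
        System.out.println(left == right);                 // false
        System.out.println((float) left == (float) right); // true
    }
}
```

In the same way, two routes through the factorial arithmetic for matrices that should tie can differ by one unit in the last place.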

Here’s the code that has been committed:

Note that EASE neither needs nor computes the two-tailed test, since we only care about over-representation in the cluster (one-sided). Therefore, the Fisher exact (FE) test in EASE has no two-tailed variant and doesn’t need this patch.
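For comparison, a minimal sketch of a one-sided (upper-tail) Fisher exact p-value as it might be used for over-representation. It assumes the top-left cell counts cluster members with the annotation and sums over tables where that cell is at least as large as observed. In this formulation tables are selected by cell count rather than by comparing probabilities, so no tie-breaking comparison arises. Names are hypothetical.

```java
/**
 * Sketch only (hypothetical names): one-sided (upper-tail) Fisher exact
 * p-value, P(X >= a) with all margins fixed.
 */
final class OneTailedFisherSketch {

    static double logFactorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k);
        }
        return sum;
    }

    static double tableProbability(int a, int b, int c, int d) {
        return Math.exp(logFactorial(a + b) + logFactorial(c + d)
                + logFactorial(a + c) + logFactorial(b + d)
                - logFactorial(a + b + c + d)
                - logFactorial(a) - logFactorial(b)
                - logFactorial(c) - logFactorial(d));
    }

    /** Assumes a = cluster members with the annotation (over-representation tail). */
    static double upperTail(int a, int b, int c, int d) {
        int row1 = a + b, row2 = c + d, col1 = a + c;
        double p = 0.0;
        // Tables are selected by cell count, so no probability comparison is needed.
        for (int x = a; x <= Math.min(row1, col1); x++) {
            p += tableProbability(x, row1 - x, col1 - x, row2 - col1 + x);
        }
        return Math.min(1.0, p);
    }
}
```

For example, `upperTail(3, 1, 1, 3)` should give 17/70 ≈ 0.243.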

John
