The commons-math3 library has a lot of useful stats (and other) stuff in it. I've locally implemented the t-test and SignTest using the distribution.TDistribution and stat.inference.BinomialTest classes inside of Stat.java. The library allows other implementations as well.
There is still some uncertainty about what the tests should focus on. The t-test is currently two-tailed and the SignTest is a one-tailed (upper) test. They should probably both be one-tailed (is the treatment better than the baseline?), but perhaps just "yes, they're different" is OK too. There are also some open issues with the proper degrees of freedom to use.
I'm thinking of committing the changes anyway, since any galago eval using baseline and treatment arguments is otherwise going to throw an exception and break. But then maybe breakage is better than a not quite correct result.
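On the tails and degrees-of-freedom questions above, here is a minimal sketch of the paired t statistic. It assumes the baseline and treatment scores are paired per query, in which case the degrees of freedom would be n - 1. The class and method names are hypothetical and are not taken from Stat.java. It uses only the JDK; with commons-math3, a one-tailed (upper) p-value could then be read off as 1 - new TDistribution(df).cumulativeProbability(t).

```java
/** Illustrative sketch only; names are hypothetical, not Galago's. */
final class PairedTSketch {

    /**
     * Returns {t, df} for the paired differences treatment[i] - baseline[i].
     * Assumes both arrays hold one score per query, in the same query order.
     */
    static double[] pairedT(double[] baseline, double[] treatment) {
        int n = baseline.length;
        if (n != treatment.length || n < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with n >= 2");
        }

        // Mean of the per-query differences.
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += treatment[i] - baseline[i];
        }
        mean /= n;

        // Sample variance of the differences (divides by n - 1).
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = (treatment[i] - baseline[i]) - mean;
            sumSq += d * d;
        }
        double sd = Math.sqrt(sumSq / (n - 1));

        // If every difference is identical, sd is 0 and t is NaN or infinite.
        double t = mean / (sd / Math.sqrt(n));
        return new double[] {t, n - 1};
    }

    public static void main(String[] args) {
        double[] r = pairedT(new double[] {1, 2, 3, 4}, new double[] {2, 4, 3, 7});
        System.out.println("t = " + r[0] + ", df = " + r[1]);
    }
}
```

A positive t here means the treatment scored higher on average, so a one-tailed "is treatment better" test looks only at the upper tail, while a two-tailed test would double the smaller tail probability.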
Implemented directly via the Apache commons-math3 library.
Issues remain with the boosted Sign Test, where the boosting process somehow results in the number of improved samples being greater than the number of different samples.
Needs more work.
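For reference, a minimal sketch of a one-tailed (upper) sign test in which the improved count can never exceed the different count. It assumes "boost" is a multiplier applied to the baseline score before comparison; that is my assumption, not something confirmed from Stat.java, and all names here are hypothetical. Both counts are taken from the same boosted comparison, with ties dropped from both. The tail probability is computed with the JDK only; BinomialTest in commons-math3 could be used instead.

```java
/** Illustrative sketch only; names and boost semantics are assumptions. */
final class SignTestSketch {

    /**
     * Returns {improved, different}. A pair is "different" when the treatment
     * score is not equal to boost * baseline, and "improved" when it is larger.
     * Ties are excluded from both counts, so improved <= different always holds.
     */
    static int[] countPairs(double[] baseline, double[] treatment, double boost) {
        if (baseline.length != treatment.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int improved = 0;
        int different = 0;
        for (int i = 0; i < baseline.length; i++) {
            double boosted = boost * baseline[i];
            if (treatment[i] == boosted) {
                continue; // tie: counts toward neither total
            }
            different++;
            if (treatment[i] > boosted) {
                improved++;
            }
        }
        return new int[] {improved, different};
    }

    /**
     * Exact P(X >= k) for X ~ Binomial(n, 0.5), i.e. the upper-tail p-value.
     * Fine for modest n; Math.pow(0.5, n) underflows for very large n.
     */
    static double upperTail(int n, int k) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        double term = Math.pow(0.5, n); // P(X = 0)
        double sum = 0.0;
        for (int i = 0; i <= n; i++) {
            if (i >= k) {
                sum += term;
            }
            term = term * (n - i) / (i + 1); // P(X = i + 1)
        }
        return Math.min(1.0, sum);
    }

    public static void main(String[] args) {
        int[] c = countPairs(new double[] {0.2, 0.5, 0.4, 0.3},
                             new double[] {0.3, 0.5, 0.6, 0.1}, 1.0);
        System.out.println("improved = " + c[0] + ", different = " + c[1]
                + ", p = " + upperTail(c[1], c[0]));
    }
}
```

If the two counts were instead taken from different comparisons (one boosted, one not), improved could end up larger than different. That is only a guess at the kind of mismatch described above, not a diagnosis.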
Can you use http://commons.apache.org/proper/commons-math/userguide/special.html to address the pieces that we need to implement?