Kolmogorov-Smirnov wrong P value in version 0.3.7
Java library of statistical distribution
Brought to you by:
robbyjo
Hi, first I wanted to thank you for the quick reply and the version update regarding my question in Bug #19.
However, I tested your code in version 0.3.7 and I still have some problems with the p-values it calculates.
Attached are two vectors (which were created in R from the randomly generated vectors I created in Java), and the line of code I used.
The result I get from your package in the exact case is very different from what I get in R:
JDistLib: pvalue = 7.666669345840482E-5
R: pvalue = 0.004296
In older versions I don't seem to get these discrepancies.
Could you please check whether any changes have been made?
Thanks,
Gilad Wallach & Eran Avidan
Anonymous
I have additional information regarding the bug I found.
It seems the problem happens whenever the D statistic is larger than 1 (which is an illegal value).
This usually occurs when one of the data vectors is small (see example attachment "SmallDataBug").
For example in the attached dataset the result is:
JDistLib: statistic = 1.05, pvalue = 2.22e-16
R: statistic = 0.75, pvalue = 0.1428
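For reference, here is why a value above 1 is illegal. This is the standard definition of the two-sample statistic; the symbols F_n and G_m for the two empirical distribution functions are notation introduced here, not taken from the library:

```latex
D_{n,m} = \sup_{t} \left| F_n(t) - G_m(t) \right|,
\qquad 0 \le F_n(t) \le 1,\quad 0 \le G_m(t) \le 1
\;\Longrightarrow\; 0 \le D_{n,m} \le 1 .
```

So a reported statistic of 1.05 cannot come from the data alone and points to an arithmetic or indexing error.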
From what I remember, older versions didn't have this bug, so it might be something introduced in recent versions.
I hope the information may help you solve this bug. I'd be happy to help in any other way possible.
Thanks,
Gilad
Last edit: Gilad 2014-12-15
Thank you so much for the bug report. This bug is caused by integer division. Also, there is an off-by-one error due to my refactoring of the order function a while back. Apologies for the inaccuracies. It is now fixed in v0.3.8. All D statistics and p-values should now match R's results exactly.
Just to add: This bug was introduced in v0.3.6, with the fix of bug #18.
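To illustrate the kind of integer-division mistake mentioned above, here is a minimal sketch of a two-sample D statistic in Java. It is not JDistLib's code: the class name `KsStatisticSketch`, the method `ksStatistic`, and the sample data are all made up for illustration, and the attached vectors from this ticket are not reproduced.

```java
import java.util.Arrays;

// Hypothetical sketch, not JDistLib's implementation.
class KsStatisticSketch {

    /** Two-sample KS statistic: the largest gap between the two empirical CDFs. */
    static double ksStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int n = a.length, m = b.length;
        int i = 0, j = 0;   // counts of values <= current threshold in a and b
        double d = 0.0;
        while (i < n && j < m) {
            double t = Math.min(a[i], b[j]);
            while (i < n && a[i] <= t) i++;   // advance past ties in a
            while (j < m && b[j] <= t) j++;   // advance past ties in b
            // Cast to double BEFORE dividing. With int operands, i / n
            // truncates toward zero, so every partial count becomes 0.
            double gap = Math.abs((double) i / n - (double) j / m);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        // Made-up data, chosen so the result is easy to check by hand.
        double[] x = {1, 2, 3, 4};
        double[] y = {3.5, 5, 6, 7};
        System.out.println(ksStatistic(x, y));   // prints 0.75
        System.out.println(3 / 4);               // prints 0 (integer division)
    }
}
```

Because each empirical CDF value is a count divided by a sample size, the computed gap stays within [0, 1] as long as the division is done in floating point and the counts are not off by one.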
Thanks for your reply.
I will download 0.3.8 and hope to enjoy the great implementations.
Gilad Wallach & Eran Avidan