A sample test report is below, just for illustration. In it we can see MaxOft AD with a p-value close to 1.
Why reject the hypothesis that the generator is random when the p-value for a certain sample is close to 1? Doesn't that mean the sample was highly typical under the hypothesis? Why would such a sample give reason to reject the hypothesis that the generator is random?
2020-08-17
Many guides for interpreting real-world statistical test results in the field assume that the results come from crude chi-squared tests of small datasets that produce horrible p-value quality. In such cases, many p-value ranges end up being flatly impossible, or have a very different likelihood than an idealized model of p-values would suggest.
TestU01 (and many other PRNG testers), on the other hand, frequently produces p-values that are far closer to the idealized model. As such, much better ways of using them are possible.
Generally, any statistical test that produces a p-value has a very suspicious circumstance at a p-value of zero - say, every PRNG output being exactly the same number. Typically, a p-value of one is less suspicious, but still an extreme anomaly - say, all PRNG outputs occurring at exactly the same frequency as each other even over an extremely large dataset. That is much more reasonable than all duplicates, but still an absurdly unlikely occurrence.
In my experience, the most common reason to see p-values near 1 is single-cycle PRNGs that have used up a very large percentage of their entire cycle during the course of a single test.
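The "too uniform" extreme above is easy to demonstrate with an ordinary chi-squared goodness-of-fit test. This is a minimal sketch in plain Python (not TestU01 code; the bin count and sample size are arbitrary choices for illustration) showing that counts which are *exactly* equal yield a test statistic of 0 and hence a p-value of exactly 1:

```python
import math
import random

def chi2_sf_even_df(x, df):
    # Survival function P[X >= x] of a chi-square distribution with an
    # even number of degrees of freedom, via the closed form
    # Q(k, x/2) = exp(-x/2) * sum_{i<k} (x/2)^i / i!,  where k = df/2.
    assert df % 2 == 0, "closed form requires even df"
    k = df // 2
    h = x / 2.0
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(k))

def chi2_pvalue(counts):
    # Pearson chi-squared goodness-of-fit against equal expected counts.
    expected = sum(counts) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf_even_df(stat, len(counts) - 1)

bins, n = 11, 110_000          # 11 bins -> df = 10 (even)
rng = random.Random(1)

# Genuinely random counts: the p-value lands somewhere unremarkable.
random_counts = [0] * bins
for _ in range(n):
    random_counts[rng.randrange(bins)] += 1

# 'Too uniform' counts: every bin hit exactly equally often.
uniform_counts = [n // bins] * bins

print(chi2_pvalue(random_counts))
print(chi2_pvalue(uniform_counts))   # exactly 1.0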
TestU01 simply wants to draw your attention to anything statistically unlikely.
I typically run the same PRNG on anywhere from a few dozen to a few thousand different seeds and perform meta-analysis of the set of results.
In my experience, over hundreds of thousands of results, you should never see a p-value of less than 1.0e-08, or greater than 1 - 1.0e-08, from a good PRNG. If you do, a single occurrence (except perhaps, debatably, beyond roughly 1.0e-15) is still not noteworthy unless you can repeat the same result using different seeds.
I've recently constructed a 31-bit-state PRNG that passes SmallCrush using 3 concatenated 10-bit equidistributed output words (given that SmallCrush only looks at the high 30 bits), with both forward and reversed bits... so there is (usually) little chance or good excuse for bumping into the above recommended limits with a modern quality PRNG operating well within its period specification.
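The multi-seed meta-analysis described above amounts to checking that the per-seed p-values are themselves uniformly distributed. One common way to do that second-level check (not what TestU01 does internally; a plain-Python sketch using a one-sample Kolmogorov-Smirnov test with the asymptotic tail approximation) is:

```python
import math
import random

def ks_uniform_pvalue(pvals):
    # One-sample Kolmogorov-Smirnov test of the p-values against U(0,1),
    # using the asymptotic Kolmogorov tail sum with Stephens' small-sample
    # correction (adequate for a few hundred samples or more).
    xs = sorted(pvals)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                     for k in range(1, 101))
    return min(1.0, max(0.0, tail))

rng = random.Random(42)

# Stand-in for per-seed test results: a healthy PRNG yields U(0,1) p-values.
healthy = [rng.random() for _ in range(1000)]
# A defective setup: p-values piled up near 1 ('too uniform' results).
suspicious = [rng.random() ** 0.25 for _ in range(1000)]

print(ks_uniform_pvalue(healthy))      # unremarkable
print(ks_uniform_pvalue(suspicious))   # vanishingly small
```

A cluster of per-seed p-values piling up near either end is exactly the kind of repeatable anomaly the thresholds above are meant to catch.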
But a p-value close to 1 is not statistically unlikely. Quite the contrary: it is highly statistically likely. It is telling us that the extracted sample (or one more atypical than it) has almost 100% probability of occurring. In other words, there is nothing unusual about that at all. (Or am I reading this whole thing incorrectly?)
P-values in TestU01 (and PractRand) are non-standard, so 0.5 (on average) is most likely. Values very close to 1 are 'too uniform' (i.e. a potential indication of a 'low-discrepancy sequence' and/or a PRNG run near its full period).
From the TestU01 documentation:
"Classical statistical textbooks usually say that when applying a test of hypothesis, one must select beforehand a rejection area R whose probability under H0 equals the target test level (e.g., 0.05 or 0.01), and reject H0 if and only if Y ∈ R. This procedure might be appropriate when we have a fixed (often small) sample size, but we think it is not the best approach in the context of RNG testing. Indeed, when testing RNGs, the sample sizes are huge and can usually be increased at will. So instead of selecting a test level and a rejection area, we simply compute and report the p-value of the test, defined as

p = P[Y ≥ y | H0]

where y is the value taken by the test statistic Y. If Y has a continuous distribution, then p is a U(0, 1) random variable under H0. For certain tests, this p can be viewed as a measure of uniformity, in the sense that it will be close to 1 if the generator produces its values with excessive uniformity, and close to 0 in the opposite situation..."
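The quoted claim that p = P[Y ≥ y | H0] is U(0, 1) for a continuous statistic is easy to check numerically. A minimal plain-Python sketch (the choice of Y as the maximum of t uniforms is mine, loosely in the spirit of MaxOft, chosen so that P[Y ≥ y] = 1 − y^t is available in closed form):

```python
import random

rng = random.Random(7)
t, n = 8, 20_000

# Statistic Y = max of t uniforms; under H0, P[Y >= y] = 1 - y**t.
pvals = [1.0 - max(rng.random() for _ in range(t)) ** t for _ in range(n)]

# If p is U(0,1), its mean is ~0.5 and each decile holds ~10% of samples.
mean = sum(pvals) / n
lowest_decile = sum(p < 0.1 for p in pvals) / n
print(mean, lowest_decile)
```

So under H0 a p-value of, say, 0.97 is no more remarkable than one of 0.42; only values crowding the extremes across many runs are informative.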
Last edit: Cerian Knight 2020-08-11
Thank you for the info. Which documentation is that? I'm searching through the User's guide (detailed version), the User's guide (compact version), MyLib-C, and ProbDist, and I can't find this quote in any of them.
This makes sense. Thanks for the explanation.
TestU01: A C Library for Empirical Testing of Random Number Generators
Page 5
Thanks very much! I was able to locate it.