
Why does TestU01 reject p-values close to 1?

2020-08-09
2020-08-17
  • Wayne Harris

    Wayne Harris - 2020-08-09

    Below is a sample test report, just for illustration. In it we can see MaxOft AD with a p-value close to 1.

    Why reject the hypothesis that the generator is random when the p-value for a certain sample is close to 1? Doesn't that mean the sample was highly typical under the hypothesis? Why does such a sample give us reason to reject the hypothesis that the generator is random?

    ========= Summary results of SmallCrush =========
    
     Version:          TestU01 1.2.3
     Generator:        xorshift
     Number of statistics:  15
     Total CPU time:   00:00:10.71
     The following tests gave p-values outside [0.001, 0.9990]:
     (eps  means a value < 1.0e-300):
     (eps1 means a value < 1.0e-015):
    
           Test                          p-value
     ----------------------------------------------
      1  BirthdaySpacings                 eps
      2  Collision                        eps
      6  MaxOft                           eps
      6  MaxOft AD                      1 -  3.6e-06
      7  WeightDistrib                  9.1e-15
      8  MatrixRank                       eps
     10  RandomWalk1 H                    eps
     10  RandomWalk1 M                  3.3e-16
     10  RandomWalk1 J                  1.4e-15
     ----------------------------------------------
     All other tests were passed
    
     
    • - 2020-08-17

      Many guides for interpreting real-world statistical test results assume that the results come from crude chi-squared tests on small datasets, which produce poor-quality p-values. In such cases, many p-value ranges end up being flatly impossible, or occur with a very different likelihood than an idealized model of p-values would suggest.

      TestU01 (and many other PRNG testers), on the other hand, frequently produces p-values that come far closer to the idealized model, so they can be put to much better use.
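
      As a rough illustration (my own sketch, not anything from TestU01): a chi-squared test on a tiny dataset can only ever produce a handful of distinct p-values, so "close to 1" may be unreachable except exactly at 1, whereas large-sample tests fill the unit interval almost continuously. A minimal Python sketch, assuming scipy is available:

        # Sketch: p-values from a crude chi-squared test on a tiny dataset are
        # heavily discretized, unlike the near-continuous p-values of large tests.
        from scipy.stats import chisquare

        n = 10                     # ten coin flips, two cells (heads/tails)
        pvals = set()
        for heads in range(n + 1):
            p = chisquare([heads, n - heads]).pvalue   # expected 5/5 under H0
            pvals.add(round(p, 6))

        print(sorted(pvals))
        # Only 6 distinct p-values are possible here, so "p close to 1" can only
        # mean p == 1.0 exactly (the perfectly balanced 5/5 outcome), and many
        # textbook rejection regions simply cannot be hit.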

      Generally, for any statistical test that produces a p-value, a p-value of zero corresponds to a very suspicious circumstance - say, every PRNG output being exactly the same number. The circumstance corresponding to a p-value of one is typically less suspicious, but still an extreme anomaly - say, all PRNG outputs occurring at exactly the same frequency as each other even over an extremely large dataset. That is much more reasonable than all outputs being identical, but still an absurdly unlikely occurrence.
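
      A minimal sketch of those two extremes (again my own illustration, using an ordinary chi-squared frequency test, whose p-value is P[Y >= y], the same convention TestU01 reports):

        # Sketch of the two extremes described above, using a plain chi-squared
        # frequency test (p-value = P[Y >= y] under H0). Illustrative only.
        import numpy as np
        from scipy.stats import chisquare

        # Extreme 1: every "PRNG output" is exactly the same byte value.
        all_same = np.zeros(256, dtype=int)
        all_same[42] = 1_000_000                 # one cell holds every observation
        print(chisquare(all_same).pvalue)        # ~0: far too lumpy

        # Extreme 2: every byte value occurs at exactly the same frequency.
        perfectly_flat = np.full(256, 1_000_000 // 256)
        print(chisquare(perfectly_flat).pvalue)  # 1.0: statistic is 0, "too uniform"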

      In my experience, the most common reason to see p-values near 1 is a single-cycle PRNG that has used up a very large percentage of its entire cycle during the course of a single test.
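
      For example (a toy sketch with a 2^16-state LCG and classic constants, not a real TestU01 run): a generator cannot revisit states within one cycle, so drawing most of the cycle makes bin counts far more even than independent draws would allow, and the chi-squared p-value P[Y >= y] gets pushed up against 1:

        # Sketch: a toy full-period LCG (2^16 states; a = 69069, c = 12345 are
        # classic constants, used here only for illustration) run through ~90%
        # of its cycle. It visits each state at most once, so the high-byte
        # frequencies are much more even than independent sampling allows and
        # the frequency test's p-value P[Y >= y] lands very close to 1.
        import numpy as np
        from scipy.stats import chisquare

        def toy_lcg(seed, n, a=69069, c=12345, m=2**16):
            x, out = seed, []
            for _ in range(n):
                x = (a * x + c) % m          # full period m for these constants
                out.append(x)
            return np.array(out)

        draws = toy_lcg(seed=1, n=60_000)    # ~90% of the 65,536-state cycle
        counts = np.bincount(draws >> 8, minlength=256)   # high-byte histogram
        print(chisquare(counts).pvalue)      # very close to 1: "too uniform"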

       
      • Wayne Harris

        Wayne Harris - 2020-08-17

        This makes sense. Thanks for the explanation.

         
  • Cerian Knight

    Cerian Knight - 2020-08-11

    TestU01 simply wants to draw your attention to anything statistically unlikely.
    I typically run the same PRNG on anywhere from a few dozen to a few thousand different seeds and perform a meta-analysis of the set of results (a rough sketch of what I mean follows at the end of this post).
    In my experience, over hundreds of thousands of results, you should never see a p-value of less than 1.0e-08, or greater than 1 - 1.0e-08, from a good PRNG. If you do, a single occurrence (except perhaps, debatably, beyond 1.0e-15) is still not noteworthy unless you can repeat it using different seeds.
    I've recently constructed a 31-bit-state PRNG that passes SmallCrush using 3 concatenated 10-bit equidistributed output words (given that SmallCrush only looks at the high 30 bits), with both forward and reversed bits, so there is (usually) little chance of, or good excuse for, bumping into the recommended limits above with a modern quality PRNG operating well within its period specification.
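
    A rough sketch of that meta-analysis (run_battery is a hypothetical stand-in for however you drive TestU01 and collect its p-values, not a real API; here it just simulates ideal uniform p-values so the sketch runs on its own):

      # Sketch of the meta-analysis: run the same battery over many seeds,
      # pool the reported p-values, and check that the pool itself looks U(0,1).
      import numpy as np
      from scipy.stats import kstest

      def run_battery(seed, tests_per_run=15):
          rng = np.random.default_rng(seed)
          return rng.uniform(size=tests_per_run)   # stand-in for real p-values

      all_p = np.concatenate([run_battery(seed) for seed in range(1000)])

      # With N uniform p-values, the expected count below eps (or above 1 - eps)
      # is about N * eps, so even a single hit past 1e-08 across hundreds of
      # thousands of results is worth repeating with different seeds.
      print(all_p.min(), 1.0 - all_p.max())

      # Meta-test: the pooled p-values should themselves look uniform.
      print(kstest(all_p, "uniform"))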

     
  • Wayne Harris

    Wayne Harris - 2020-08-11

    But a p-value close to 1 is not statistically unlikely. Quite the contrary: it is highly statistically likely. It is telling us that the sample extracted (or one more atypical than it) has almost 100% probability of occurring. In other words, there is nothing unusual about that at all. (Or am I reading this whole thing incorrectly?)

     
    • Cerian Knight

      Cerian Knight - 2020-08-11

      P-values in TestU01 (and PractRand) are non-standard: under the null hypothesis they are uniform on (0, 1), so 0.5 is the unremarkable middle-of-the-road result and both tails are treated as suspicious. Values very close to 1 are 'too uniform' (i.e. a potential indication of a 'low-discrepancy sequence' and/or a PRNG run near its full period).

      From the TestU01 documentation:

      "Classical statistical textbooks usually say that when applying a test of hypothesis,
      one must select beforehand a rejection area R whose probability under
      H0 equals the target test level (e.g., 0.05 or 0.01), and reject H0 if and only if
      Y ∈ R. This procedure might be appropriate when we have a fixed (often small)
      sample size, but we think it is not the best approach in the context of RNG testing.
      Indeed, when testing RNGs, the sample sizes are huge and can usually be
      increased at will. So instead of selecting a test level and a rejection area, we
      simply compute and report the p-value of the test, defined as
      p = P[Y ≥ y | H0]
      where y is the value taken by the test statistic Y . If Y has a continuous distribution,
      then p is a U(0, 1) random variable under H0. For certain tests, this p
      can be viewed as a measure of uniformity, in the sense that it will be close to 1
      if the generator produces its values with excessive uniformity, and close to 0 in
      the opposite situation..."
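
      One way to convince yourself of that last part (my own sketch, not from the documentation): simulate a continuous test statistic under H0, set p = P[Y >= y], and check that p really is uniform on (0, 1). A p-value squeezed up against 1 then means the statistic came out improbably small, i.e. the sample was 'too uniform':

        # Sketch checking the quoted claim: if Y is continuous under H0, then
        # p = P[Y >= y] is itself U(0, 1). Y here is an arbitrary continuous
        # statistic (chi-squared, 255 degrees of freedom) simulated under H0.
        import numpy as np
        from scipy.stats import chi2, kstest

        rng = np.random.default_rng(0)
        y = chi2.rvs(df=255, size=100_000, random_state=rng)   # Y drawn under H0
        p = chi2.sf(y, df=255)                                  # p = P[Y >= y]

        print(kstest(p, "uniform"))   # no evidence against uniformity of p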

       

      Last edit: Cerian Knight 2020-08-11
  • Wayne Harris

    Wayne Harris - 2020-08-12

    Thank you for the info. Which documentation is that? I've searched through the User's guide (detailed version), the User's guide (compact version), MyLib-C, and ProbDist, but I can't find this quote in any of them.

     
    • Cerian Knight

      Cerian Knight - 2020-08-12
       
      • Wayne Harris

        Wayne Harris - 2020-08-17

        Thanks very much! I was able to locate it.

         
