Menu

#17 Different results when 1 byte is skipped

v1.0_(example)
open
nobody
None
5
2023-06-09
2023-06-02
No

I have a 1GB test file that passes v0.95 with plain 'stdin' option. When I remove the first byte of the file, and keep everything else the same, it starts failing at 32 MB.

files:
https://mega.nz/file/6ExT3RTK#yqXZlVltWSUSdoNNZF4TNWTJ0ECIcmdSziDufruNwes
https://mega.nz/file/eMZk0TYZ#97232MaGg3m1No8Ktcxbchc09VsMlyK1zstHrzBB76o

Discussion

  • Arlet Ottens

    Arlet Ottens - 2023-06-02

    Both files fail SmallCrush:
    ~~~
    ========= Summary results of SmallCrush =========

    Version: TestU01 1.2.3
    Generator: stdin
    Number of statistics: 15
    Total CPU time: 00:00:06.00
    The following tests gave p-values outside [0.001, 0.9990]:
    (eps means a value < 1.0e-300):
    (eps1 means a value < 1.0e-15):

       Test                          p-value
    

    1 BirthdaySpacings eps


    All other tests were passed

     
  • - 2023-06-02

    Quirky, but it's looking like correct behavior to me so far. I think the third nibble out of each 64 bits has some kind of detectable pattern, and removing the first byte phase shifts that to the first nibble out of each 64 bits, which is much easier for PractRand to notice.

     
  • Arlet Ottens

    Arlet Ottens - 2023-06-04

    Possibly related: I noticed this issue while working on a project to create a good PRNG for 8 bit targets, using a genetic/hill climbing algorithm that uses PractRand to test candidates. I've seen it happen quite regularly that PractRand gives a pass for 128GB or even 1TB, but that the same PRNG output fails in SmallCrush/Crush, usually on the BirthdaySpacings test. The same thing happened with the output files above.

    The contrast between PractRand running for an hour or more and seeing no problems and SmallCrush failing with "eps" in less than 1 second seems a bit harsh to me. Maybe related to the 8 bit nature of my PRNG, where possibly weak bits are in the middle of a 64 bit word? Maybe something else?

    As an example I have attached currently running test. PractRand has just accepted 256GB without anomalies, and is still going. SmallCrush fails with p-value of 5.0e-27

     
  • - 2023-06-04

    It is rather embarrassing that SmallCrush is detecting flaws that PractRand needs hours or more on. Perhaps I should have a small variant of the birthday spacings test incorporated in to the base test set... I presume it's all birthday spacings tests it's failing?

    Let's see... for this variation at least, the PRNG state transition function looks like an 48 bit LCG with additive constant 69, multiplicative constant 0x010101010101, and modulus 0x1000000000000, plus an 8-bit accumulator used in a rather odd way. Then the output function is a complex hash down to 32 bits of output via a bunch of add-with-carry operations and at least one barrel shift (arguably 3 or even more) in a non-standard structure that is difficult to summarize or analyze.

    Overall it ends up behaving in testing a bit like a combination of a bit-oriented PRNG (which many of my tests do poorly on) and a PRNG that obfuscates the low bit of integer math (which my folding approach performs poorly on).

    In the short term you could try enabling the extended test set (-te 1) or modifying the normal test set for your purpose... in test_batteries.cpp, the second function should be named get_core_tests(), about the sixth line of which should be a constructor "new Tests::FPF(4, 14, 6)" - changing the first parameter to 2 (or maybe 3?) should slow the normal test set down slightly, but make it perform better on PRNGs based around sub-byte operations. I think that will help the cases you are looking at a bit. Might still underperform anything that throws actual birthday spacings tests at it though.

     
    • Arlet Ottens

      Arlet Ottens - 2023-06-05

      I've seen some failures reported on other tests (I remember MaxOft) as well, but with p-values around 1e-4, so not really definitive. The birthday spacings is the one that fails hard. I'll keep an eye on other results.

      In my code, the 48 bit LCG portion is fixed, and the 32 bit hash sequence is generated by an algorithm that tries out random sequences of candidate instructions . Each generator is then tested by calling PractRand, which I have modified to return a simple quality score. The set of candidate instructions and overall search space is guided by hand based on what works best for the specific target architecture (right now the Arduino microcontroller). Of course, because this only uses PractRand for screening, the risk is that it finds specific weaknesses in the test coverage.

      By the way, I've build RNG_test according to your build instructions on Linux, using gcc 11.3.0,

       
      • - 2023-06-09

        I tried some tests on the code you posted.

        TestU01: 1/1/1
        That is, it fails 1 subtest (out of 15) of SmallCrush, 1 (out of 144) of Crush, and 1 (out of 160) of BigCrush. So about 10-15 seconds for first failure. In every case, it was a birthday spacings test that found bias. Also, some people like to run TestU01 on bit-backwards versions of PRNGs (because TestU01 is less stringent on the higher bits, completely skipping the highest IIRC) I also tried both bit-backwards and byte-backwards ; bit-backwards passed everything, while byte-backwards faired slightly worse than the original at 1/2/2 - though it was still only ever birthday spacings tests that detected bias.

        PractRand 0.95 default parameters: 512 GB (mod3n)
        While first failure is in mod3n at 512 GB, at 2 TB also fails a low-bit variant of TMFn, and at 4 TB it also fails both BCFN and DC6. So, two hours of testing for the first failure, at least when running single-threaded.

        gjrand: 0/0/0/~1/1/?
        It passes gjrand at tiny, small, and standard, gave very suspicious results on one subtest at big (what it called a grade 6 failure... PractRand would call that "very suspicious" without capitalization), fully fails that subtest again at huge and starts looking suspicious on a second, and I don't have results yet at tera (the highest setting). gjrand has 13 subtests regardless of size/setting. IIRC that should be about 5 minutes for first failure if you count the borderline one, 45 minutes if you don't... under normal circumstances, but I only have this running in linux and my linux VM is running slow these days so it was about twice that for me.

        So, overall it looks like SmallCrush is very lucky to spot that. Still, it does spot it, and fast... I'll add that and maybe some variations to the set of PRNGs I test against.

         
  • - 2023-06-04

    No, wait, I'm confused. I hadn't touched PractRand in a while and was accidentally testing with version 0.92. Testing with version 0.95, it does adequately on those files at default settings.

    rng=RNG_file(d:\inc\out), seed=0
    length= 1 gigabyte (2^30 bytes), time= 24.0 seconds
    Test Name Raw Processed Evaluation
    mod3n(5):(0,9-0) R= +74.0 p = 1.9e-40 FAIL !!!
    ...and 222 test result(s) without anomalies

    rng=RNG_file(d:\inc\out1), seed=0
    length= 32 megabytes (2^25 bytes), time= 4.6 seconds
    Test Name Raw Processed Evaluation
    TMFn(2+0):wl R= +69.0 p~= 1e-34 FAIL !!!
    ...and 154 test result(s) without anomalies

    ...and the algorithm you posted code for fails so fast that I must have screwed something up when pasting it in to PractRand, I'll take another stab at that one later.

    edit: Note that mod3n and TMFn are weak tests arranged to use trivial amounts of resources to check for weird cases that manage slip between the more powerful main tests. Same for the BRank test, but that one isn't relevant since it couldn't find any bias here. That is, the main tests (FPF, BCFN, Gap16, and DistC6) use a lot of resources to look for broad classes of bias, while the minor tests (BRank, mod3n, and TMFn) use tiny amounts of resources to check for very narrowly defined types of bias known to slip by the main tests.

     

    Last edit: 2023-06-04

Log in to post a comment.