
Japanese paper on xorshift128+

  • Lorenzo

    Lorenzo - 2019-10-03

    I thought people on the forum might be interested in knowing that recently the Japanese authors of the Mersenne Twister have found that a simple ‘zooming in’ on the xorshift128+ RNG reveals non-random distributions, and have passed a damning judgement on this generator:
    Hiroshi Haramoto, Makoto Matsumoto, Again, random numbers fall mainly in the planes: xorshift128+ generators
    https://arxiv.org/abs/1908.10020

    Hiroshi Haramoto, Makoto Matsumoto, Mutsuo Saito, Pseudo random number generators: attention for a newly proposed generator
    https://arxiv.org/abs/1907.03251

    BTW, the xorshift128+ generator is employed as the standard RNG in several JavaScript engines, including (since about 2016) the V8 engine used in Google Chrome. Also, according to the generator's author (S. Vigna), the xorshift family of generators has been superseded by the 'xoshiro' and 'xoroshiro' families.
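    For reference, here is a minimal C sketch of the xorshift128+ step under discussion, with the (23, 17, 26) shift triple from Vigna's paper (the same triple V8 uses); seeding is omitted:

        #include <stdint.h>

        /* xorshift128+ core: two 64-bit words of state, an xorshift-style
           update, and a single addition as the output scrambler.  The
           state must be seeded to something other than all zeros. */
        static uint64_t s[2];

        uint64_t xorshift128plus(void) {
            uint64_t s1 = s[0];
            const uint64_t s0 = s[1];
            const uint64_t result = s0 + s1; /* the '+' is the only nonlinear step */
            s[0] = s0;
            s1 ^= s1 << 23;
            s[1] = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
            return result;
        }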

    More relevantly to PractRand, the Japanese authors stress that one should not rely on testing suites such as TestU01 to establish the quality of new generators, because the suites are primarily made to reveal faults in already existing PRNGs. It is too easy to take a mediocre generator which fails TestU01, tweak it a bit and make it pass: in doing so the deviations from randomness are probably just moved somewhere the test suite is not looking, not really eliminated. This is, I think, what they are saying. What do you think?

    Personally I see little reason not to use a strong, cryptographically secure RNG, such as AES in counter mode (with hardware support), ChaCha8, ISAAC, SPECK, etc. These are quite fast and parallelizable, don't use much memory, and have been scrutinised much, much more thoroughly than any ‘ordinary’ RNG, whose randomness is only skin-deep.
    If in my applications the speed of random number generation is not the bottleneck, why not use a really good one? And if my application uses so many random numbers that the generator is the bottleneck, why risk spoiling the results by using a generator of dubious quality? What do you think?
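    For instance, AES in counter mode is easy to reach through a standard crypto library. A minimal sketch using OpenSSL's EVP interface (the key and IV here are placeholders and error handling is omitted): encrypting zero bytes under AES-128-CTR returns the raw keystream, i.e. AES applied to successive counter values.

        #include <stdint.h>
        #include <string.h>
        #include <openssl/evp.h>

        /* Fill buf with n bytes of AES-128-CTR keystream.  Encrypting
           zeros exposes the keystream itself: AES applied to the counter
           0, 1, 2, ...  Placeholder key/IV; no error handling. */
        int aes_ctr_fill(uint8_t *buf, size_t n) {
            static const uint8_t key[16] = { 0x01 }; /* placeholder seed */
            static const uint8_t iv[16]  = { 0 };    /* initial counter */
            EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
            int outl = 0;
            EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
            memset(buf, 0, n);
            EVP_EncryptUpdate(ctx, buf, &outl, buf, (int)n);
            EVP_CIPHER_CTX_free(ctx);
            return outl == (int)n;
        }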

     
  • - 2019-10-06

    I thought people on the forum might be interested in knowing that recently the Japanese authors of the Mersenne Twister have found that a simple ‘zooming in’ on the xorshift128+ RNG reveals non-random distributions, and have passed a damning judgement on this generator:
    Hiroshi Haramoto, Makoto Matsumoto, Again, random numbers fall mainly in the planes: xorshift128+ generators

    He zoomed in... 22 bits? On a PRNG with 128 bits of state? Yeah, that degree of pattern does seem a bit excessive, I guess. Still, that seems like about what you'd expect from an algorithm like xorshift128+ that's little more than an xorshift. Not that I feel like the mt19937 authors have a lot of room to talk on the subject, throwing so much memory at the problem while keeping everything perfectly linear.

    Hiroshi Haramoto, Makoto Matsumoto, Mutsuo Saito, Pseudo random number generators: attention for a newly proposed generator

    I'm a bit confused by this. It seems to be a recent... paper(?) calling attention to their other paper disparaging xorshift128+ and to an old PRNG of theirs named tinyMT. But it doesn't properly define tinyMT (there's some sort of logic diagram for it, but several things in that look ambiguous to me). There's a link to a webpage for it, which claims the latest version is 1.1.1 and recommends getting it from github, which in turn seems to attach a version number of 1.0.2 to the actual tinyMT code. Well, whatever, I'm probably misunderstanding something.

    The actual algorithm... that's... an xorshift with effectively a tiny bit of indirection mixed in? Which sounds good, but in actuality they've probably done it up in such a way as to keep everything perfectly linear, like they did with mt19937, otherwise they wouldn't claim it's single-cycle. Which will probably undo all the good such could otherwise do.

    Implementing... wait, where do you get the parameters from, unless all values are valid? No, most parameter values are considered bad. Found a list of good parameter values online; at least there are lots of them.

    Testing... yep, it's all perfectly linear, fails BRank rapidly. Still, it doesn't fail anything else. It wouldn't surprise me if this actually produced better output than mt19937. Testing with bad values... yeah, quality varies significantly with parameter values. Low-bit BRank fails for all parameter values at 2^24 bytes (except the all-zero set of values, which fails earlier), but other tests don't start failing in reasonable amounts of time until the parameter values are quite low in hamming weight, I think.
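    For concreteness, here is the tinyMT32 state transition as I read it from the reference code (my paraphrase of the published tinymt32 sources; the shift and mask constants and the roles of the per-stream parameters mat1/mat2 should be checked against the original):

        #include <stdint.h>

        /* tinyMT32 state transition: 127 bits of state in four 32-bit
           words, an xorshift-style update, plus two parameter words
           (mat1, mat2) conditionally XORed in.  Conditioning on a state
           bit and XORing constants keeps the whole map linear over
           GF(2), hence the rapid BRank failures noted above. */
        typedef struct {
            uint32_t status[4];
            uint32_t mat1, mat2, tmat; /* per-stream parameters */
        } tinymt32_t;

        static void tinymt32_next_state(tinymt32_t *r) {
            uint32_t x = (r->status[0] & 0x7fffffffu) /* top bit unused */
                       ^ r->status[1] ^ r->status[2];
            uint32_t y = r->status[3];
            x ^= x << 1;
            y ^= (y >> 1) ^ x;
            r->status[0] = r->status[1];
            r->status[1] = r->status[2];
            r->status[2] = x ^ (y << 10);
            r->status[3] = y;
            if (y & 1) {
                r->status[1] ^= r->mat1;
                r->status[2] ^= r->mat2;
            }
        }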

    More relevantly to PractRand, the Japanese authors stress that one should not rely on testing suites such as TestU01 to establish the quality of new generators, because the suites are primarily made to reveal faults in already existing PRNGs. It is too easy to take a mediocre generator which fails TestU01, tweak it a bit and make it pass: in doing so the deviations from randomness are probably just moved somewhere the test suite is not looking, not really eliminated. This is, I think, what they are saying. What do you think?

    Heh. Amusing that he classifies Diehard as a stringent statistical test even in current writings. To some extent I agree with the general contention: statistical tests on output may establish a bar that PRNGs must exceed, but with reasonable amounts of cycles it is not practical to raise that bar very high compared to the quality of PRNGs that can be and have been designed. PRNGs with quality well beyond what output tests can quantify are made with a combination of output testing, state testing, good practices, and actual comprehension and analysis. For instance, I like to do output testing on reduced-quality versions of PRNGs, though analysis is required to relate their observed quality levels to that of the original PRNG. Unfortunately, most of those methods are too messy for common use; only statistical testing of output is straightforward enough to be feasible for typical users.

    On the other hand, his proposed alternatives seem pretty whack to me. Spectral tests for LCGs and k-distribution tests for GFSRs may be mathematically interesting, and do some things a tiny bit better than more casual methods, but notice that he had to qualify each of them with the particular narrow family of PRNGs it applies to, and in each case a particularly bad family of PRNGs. The only reason such tests can exist is that the PRNGs they study have awful structures.

    Personally I see little reason not to use a strong, cryptographically secure RNG, such as AES in counter mode (with hardware support), ChaCha8, ISAAC, SPECK, etc. These are quite fast and parallelizable, don't use much memory, and have been scrutinised much, much more thoroughly than any ‘ordinary’ RNG, whose randomness is only skin-deep.
    If in my applications the speed of random number generation is not the bottleneck, why not use a really good one? And if my application uses so many random numbers that the generator is the bottleneck, why risk spoiling the results by using a generator of dubious quality? What do you think?

    If you expect your application to be extremely sensitive to PRNG quality, there's usually nothing wrong with grabbing something of exceptionally high quality, and modern CSPRNGs certainly qualify. However, there is a broad range of quality below that, in which plenty of PRNGs exist in whose output no one has ever come close to finding bias, and many of those PRNGs are more appropriate for various cases for various reasons (cache footprint, memory footprint, operations used, seeding speed, etc.) while being equally good for normal applications.

     
  • - 2019-10-06

    Testing... yep, it's all perfectly linear, fails BRank rapidly. Still, it doesn't fail anything else. It wouldn't surprise me if this actually produced better output than mt19937. Testing with bad values... yeah, quality varies significantly with parameter values. Low-bit BRank fails for all parameter values at 2^24 bytes (except the all-zero set of values, which fails earlier), but other tests don't start failing in reasonable amounts of time until the parameter values are quite low in hamming weight, I think.

    Cancel that. I should have kept the test running. Here's how it goes:

    tinyMT w/ supposedly good constants (0x8f7011ee, 0xfc78ff1f, 0x3793fdff; the first set from the first big list of constants at the place linked on the tinyMT page) fails PractRand's standard battery at 2^24 bytes, on a low-bit variant of the binary rank test using a 256x256 matrix. But what if you refuse to count that test, and all other binary rank test variants? Then it fails at 2^39 bytes, on a low-bit variant of the DC6 test.

    tinyMT w/ bad constants (0x00000001, 0x00000001, 0x00000001) produces nearly identical results: the results on BRank (and its low-bit variants) were exactly identical, and the results on DC6 variants were slightly better than with the good constants, but still basically the same (1e-11 instead of 7e-16 at 2^39 bytes).

    I was expecting better on non-BRank tests, from looking at the code. I guess I probably shouldn't have been that surprised; mt19937 is pretty bad for its size, and this is fairly similar but much smaller and somewhat neater.

     
  • Lorenzo

    Lorenzo - 2019-10-08

    Thanks a lot for your comments! So it seems that their paper is a mix of 'pot calling the kettle black' comments ('tinyMT is better than xorshift128+', 'for many parameter choices xorshift128+ is bad'), outdated claims ('Diehard is a stringent test for randomness') and some reasonable statements ('we can't blindly rely on randomness suites to assess quality').

     
  • Lorenzo

    Lorenzo - 2020-02-11

    BTW, just a few days after the discussion on this forum had started, Vigna posted the following article:
    It is high time we let go of the Mersenne Twister
    https://arxiv.org/abs/1910.06437

    in which he criticises the Mersenne Twister, including the similarly-named tinyMT, while at the same time advertising his own xoroshiro128++ (etc.) generators. The article briefly discusses the thin line between testing and 'attacking' a PRNG. He says:
    QUOTE
    At this point, a good question is: How come that we are using in so many applications a generator that fails statistical tests? There are two main reasons.

    The first reason is that these tests were conceived specifically to find bias in linear generators, using knowledge of their internal structure. In this sense they are different from classical tests, such as the gap test [11], which look for generic undesired statistical regularities (or irregularities). There is a thin line after which a statistical test becomes an attack, in the sense that it uses too much knowledge about the generator. A good extreme example is that of a PRNG based on AES in counter mode [33]: we fix a secret, and AES gives us (say) a bijection on 128-bit blocks that is difficult to reverse. We then apply the bijection to 128-bit integers k, k + 1, k + 2, . . . , and so on, obtaining a strong random generator. Finding bias in such a generator would mean, essentially, finding a weakness in the AES encryption scheme.

    However, we can easily design a statistical test that this generator will fail: we simply apply to 128-bit blocks of output the AES decoding function, and then apply a randomness test to the resulting sequence. Since for our PRNG the resulting sequence is k, k + 1, k + 2, . . . for some k, it will fail every statistical test.

    Does this mean that AES in counter mode fails statistical tests? Really, no, because we used an enormous amount of knowledge about the generator to design the test: basically, we are cheating — nobody can invert AES that easily. One might argue that the same is true of the binary-rank or of the linear-complexity tests, but, once again, we are walking a thin line.
    UNQUOTE

    which sounds sensible to me.
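    To see the shape of that construction in miniature, consider the well-known splitmix64: a non-cryptographic generator of exactly the same counter-plus-bijection form, with an invertible 64-bit mixing function standing in for AES in the quoted construction.

        #include <stdint.h>

        /* splitmix64 (Vigna): the state is a plain counter advanced by a
           fixed odd constant; the output is an invertible bit-mixing
           function of that counter.  Same shape as AES in counter mode,
           with a statistical rather than cryptographic bijection. */
        static uint64_t x; /* the counter; any seed is fine */

        uint64_t splitmix64_next(void) {
            uint64_t z = (x += 0x9e3779b97f4a7c15ULL); /* step the counter */
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL; /* each step invertible */
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            return z ^ (z >> 31);
        }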

     
    • Cerian Knight

      Cerian Knight - 2020-02-12

      Very sensible. However, the best practice is to take as many potential criticisms as possible off the table as part of the design goal, which is what I have spent several years working on, scrapping one short-sighted idea after another.

      Ultimately, I have found that the xoroshiro engine is indeed the most promising... no need to re-invent the wheel, in my opinion. Even though some of the criticism of the existing scramblers is either unfounded or exaggerated, there is still some room for improvement.

      Finally, I believe I have a set of scramblers for it that are better than the current set in terms of speed, code size, statistical randomness, equidistribution, invertibility, sparse state recovery, floating point conversion synergy, and hardware implementation footprint.

      Speed is an interesting thing (that unfortunately is often considered more important than quality): I have found every benchmark (even Vigna's, which is one of the best) flawed in the sense that it most often does not measure performance in real-world applications. I am using 5 tests: scalar assignment, scalar sum, array assignment, array sum, and random array element increment. The composite result is a proxy for anticipating the improvements in real-world performance I see, for example, when running Vigna's Hamming-weight dependency test.
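      A rough sketch of how two of those loop shapes might look for a generic generator next() (hypothetical code, not the actual harness; timing and warm-up are omitted):

          #include <stdint.h>
          #include <stddef.h>

          extern uint64_t next(void); /* generator under test (assumed name) */

          /* "scalar sum": throughput-bound; outputs only feed an accumulator */
          uint64_t scalar_sum(size_t n) {
              uint64_t sum = 0;
              for (size_t i = 0; i < n; i++)
                  sum += next();
              return sum;
          }

          /* "random array element increment": each output feeds an address
             calculation, so generator latency, not just throughput, shows
             up in the measurement */
          void random_increment(uint64_t *a, size_t len, size_t n) {
              for (size_t i = 0; i < n; i++)
                  a[next() % len] += 1;
          }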

       
      • Cerian Knight

        Cerian Knight - 2020-07-26

        I am nearing the final milestone for release of an advanced scrambler for the xoroshiro engine:
        1. Complete 1D equidistribution (i.e. no missing zero from full period output).
        2. Faster in most use cases.
        3. Good statistical randomness (e.g. with 31 bits of state, it fails PractRand -tf 2 -te 1 at 128 MB and passes SmallCrush, both with forward and reversed bits).

         
