Activity for Practically Random

  • seahen seahen created ticket #22

    Feature request: "-tlshow-every" for progress reports on a linear schedule

  • Tommi H. Tommi H. posted a comment on discussion Open Discussion

    Compiled 0.96 on Windows 10 with MinGW-w64 13.0.0 and RNG_test.exe seemed to randomly fail with message "error reading from file". It turns out that because MinGW does not define WIN32, it uses _WIN32, RNG_test.cpp will not set stdin to binary mode. So, if -DWIN32 is passed to g++ it will work.

  • Sam Tansy Sam Tansy posted a comment on discussion Open Discussion

    I uploaded tarball for *nix, with GNUMakefile in `tools/'. Also GCC reports few errors, namely overflows, shift-count-overflows and shift-count-negatives. They deserve to be fixed. May it should use stdint.h, even if it required to put some simplified version of it in `include/' on windoze.

  • Sam Tansy Sam Tansy posted a comment on discussion Open Discussion

    I made GNU Makefile. It will help compile in *nix. It doesn't build RNG_to_TestU01, I didn't have inspiration but what is ready is already good.

  • orz orz posted a comment on discussion Open Discussion

    No, I haven't made a git repository or anything for it. An automated merger could be difficult, as there were a number of cases where files were split in two, renamed, or otherwise had pieces moved around.

  • Steve Steve posted a comment on discussion Open Discussion

    Is the source code version-controlled? A couple years ago I forked version pre0.95 and made many QoL changes, such as fixing warnings found from linting, and modernizing C++ code. I'd like to merge the 0.96 changes, and it would be convenient if git could use the source code history to help with the merge.

  • orz orz modified a comment on discussion Open Discussion

    Your code was very helpful, some adapted versions of it are present in xbg_helpers.cpp at the moment. It's working, or was a few weeks ago, I might have broken something while removing my LFSRs from the recommended PRNG set. I just ended up reviewing my LFSR-based algorithms at the end of everything and deciding I couldn't rate even the largest of them higher than 1 star on quality even with output hashing, largely due to excessive interstate correlations. I don't understand why things like xoroshiro...

  • orz orz posted a comment on discussion Open Discussion

    Your code was very helpful, some adapted versions of it are present in xbg_helpers.cpp at the moment. It's working, or was a few weeks ago, I might have broken something while removing my LFSRs from the recommended PRNG set. I just ended up reviewing my LFSR-based algorithms at the end of everything and deciding I couldn't rate even the largest of them higher than 1 star on quality even with output hashing, largely due to excessive interstate correlations. I don't understand why things like xoroshiro...

  • orz orz modified a comment on discussion Open Discussion

    Hopefully I didn't screw anything up, it's been a while since I've done a release. There are a couple new PRNGs there - very small, very fast, very light-weight ones, but requiring fast multiplication. mrsf64 & mrsf32 are so small you can keep their state in just 2 registers. mrc64 & mrc32 & mrc16 are a little bigger and a little slower, but slightly higher quality. Notably I did not include new random access PRNGs like I had intended to. I did incorporate code for random access to LFSRs / GFSRs...

  • G. Jones G. Jones posted a comment on discussion Open Discussion

    I've written about characteristic polynomials in the attachment.

  • Luke Miller Luke Miller modified a comment on discussion Open Discussion

    Hello PractRand team, I’m submitting a new stream generator for external evaluation. The generator is an ARX based design intended for high throughput keystream generation. I’ve run extensive local testing across PractRand, Gjrand, Dieharder, NIST STS, and TestU01, including repeated independent runs and long stream tests up to the terabyte scale, with no observed anomalies outside expected statistical behavior. A summary of results and reproduction commands are included. This work is part of a broader...

  • Luke Miller Luke Miller modified a comment on discussion Open Discussion

    Hello PractRand team, I’m submitting a new stream generator for external evaluation. The generator is an ARX based design intended for high throughput keystream generation. I’ve run extensive local testing across PractRand, Gjrand, Dieharder, NIST STS, and TestU01, including repeated independent runs and long stream tests up to the terabyte scale, with no observed anomalies outside expected statistical behavior. A summary of results and reproduction commands are included. This work is part of a broader...

  • Luke Miller Luke Miller posted a comment on discussion Open Discussion

    Hello PractRand team, I’m submitting a new stream generator for external evaluation. The generator is an ARX based design intended for high throughput keystream generation. I’ve run extensive local testing across PractRand, Gjrand, Dieharder, NIST STS, and TestU01, including repeated independent runs and long stream tests up to the terabyte scale, with no observed anomalies outside expected statistical behavior. A summary of results and reproduction commands are included. This work is part of a broader...

  • orz orz modified a comment on discussion Open Discussion

    Hopefully I didn't screw anything up, it's been a while since I've done a release. There are a couple new PRNGs there - very small, very fast, very light-weight ones, but requiring fast multiplication. mrsf64 & mrsf32 are so small you can keep their state in just 2 registers. mrc64 & mrc32 & mrc16 are a little bigger and a little slower, but slightly higher quality. Notably I did not include new random access PRNGs like I had intended to. I did incorporate code for random access to LFSRs / GFSRs...

  • orz orz posted a comment on discussion Open Discussion

    Hopefully I didn't screw anything up, it's been a while since I've done a release. There are a couple new PRNGs there - very small, very fast, very light-weight ones, but requiring fast multiplication. mrsf64 & mrsf32 are so small you can keep their state in just 2 registers. mrc64 & mrc32 & mrc16 are a little bigger and a little slower, but slightly higher quality. Notably I did not include new random access PRNGs like I had intended to. I did incorporate code for random access to LFSRs / GFSRs...

  • orz orz created a blog post

    v0.96 is up

  • Practically Random Practically Random released /PractRand_0.96.zip

  • orz orz modified ticket #20

    rarns64::raw64() bug

  • orz orz modified ticket #17

    Different results when 1 byte is skipped

  • orz orz modified ticket #16

    Big bug in Practrand 0.94 and 0.95 ?

  • orz orz posted a comment on discussion Open Discussion

    On further thought, I have that naming scheme I may rename them to randi32, randi32_fast, randi64, and, if I add it, randi64_fast. I think I avoided those kinds of names to maximize the difference relative to raw8/raw16/raw32/raw64, but at this point that doesn't matter anymore. To be clear, randli_fast / randi64_fast could be added, I'd just have to #ifdef different codepaths for different compilers, and in rare cases (mostly 32 bit compilers) randli_fast aka randi64_fast would end up super slow...

  • orz orz posted a comment on discussion Open Discussion

    I think I heard that most pc processors always calculate the upper mult, even when not asked... not sure for what % that is true, but it does seem to be quite fast on most CPUs. Sounds unlikely to me. All of this is limited by understanding, but this is how I understand the general picture. Integer wide multiplication almost always involves a different opcode than regular low multiplication. That makes it easy to save power by not calculating the upper half for low integer multiplication, which is...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I think I heard that most pc processors always calculate the upper mult, even when not asked... not sure for what % that is true, but it does seem to be quite fast on most CPUs. https://github.com/wangyi-fudan/wyhash The mix is prime number where each byte has 4 zeros and ones. As I mentioned, I tried to tinker with it mulitple times but I just end up destroying it's speed. https://tom-kaitchuck.medium.com/designing-a-new-prng-1c4ffd27124d And this was my intro to MWC, but the math is a bit above...

  • orz orz posted a comment on discussion Open Discussion

    Well, "slow" relative to the fastest decent simple PRNGs around. On my CPU. In the loops I measure performance in, which are different from those others use in benchmarks. In practice, anything over about 2.5 to 3.0 GB/s is difficult for me and I'm not quite sure why, past about twice that is absurdly difficult, and past 13 GB/s seems impossible - and multwistfast64 occasionally manages 10 GB/s. Though I am a bit surprised that wide multiplication is considered an appropriate operation for high speed...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I think you got how they work. They are still available on the link I posted. Wrote C long ago, here is LLM translation, it assumes __uint128_t #define WYR64_MIX 0x8bb84b93962eacc9ULL #define WYR64_INC 0x2d358dccaa6c78a5ULL #define ODD_PHI_64 0x9e3779b97f4a7c15ULL typedef uint64_t WYR64; WYR64 WYR64_init(uint64_t seed) { return (seed + WYR64_MIX) * WYR64_INC * ODD_PHI_64; } uint64_t WYR64_u64(WYR64 *r) { *r += WYR64_INC; __uint128_t mix = (__uint128_t)(*r) * (__uint128_t)(*r ^ WYR64_MIX); return...

  • orz orz posted a comment on discussion Open Discussion

    First, updates on my last post: The first two multiplication-based PRNGs I listed, I'm not thrilled with. They have incredibly bad avalanche properties. The output tests may be decent, but with avalanche that bad I just don't trust the output test results. I tried some variations and found one that was both faster and had better avalanche behavior... except then when I went back and benchmarked it again later it was way slower. I don't know if I somehow screwed up the earlier benchmarks, or if my...

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    Sorry for the late reply! Interesting ones, hope you had fun. This is how fast they run on my M2 Pro (ns/out): Mul1: 0.7171630859375 Mul2: 0.8544921875 Mul3: 1.129150390625 Mul4: 1.129150390625 WYRand: 0.3509521484375 MWCP: 0.5645751953125 SFC: 0.885009765625 WYRand: r^ += WYR64_INC mix := u128(r^) * u128(r^ ~ WYR64_MIX) return u64(mix >> 64) ~ u64(mix) MWCP: mix := u128(r[1]) * u128(0xfeb344657c0af413) + u128(r[0]) r^ = {u64(mix >> 64), r[2], r[3], u64(mix)} return u64(mix >> 64) + u64(mix) Odin...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    Sorry for the late reply! Interesting ones, hope you had fun. This is how fast they run on my M2 Pro (ns/out): ` Mul1: 0.7171630859375 Mul2: 0.8544921875 Mul3: 1.129150390625 Mul4: 1.129150390625 WYRand: 0.3509521484375 MWCP: 0.5645751953125 SFC: 0.885009765625 WYRand: r^ += WYR64_INC mix := u128(r^) * u128(r^ ~ WYR64_MIX) return u64(mix >> 64) ~ u64(mix) MWCP: mix := u128(r[1]) * u128(0xfeb344657c0af413) + u128(r[0]) r^ = {u64(mix >> 64), r[2], r[3], u64(mix)} return u64(mix >> 64) + u64(mix) `...

  • orz orz modified a comment on discussion Open Discussion

    I've been looking at multiplication-based PRNGs again lately. Some I've found interesting: Uint64 old = a + b; a = b * 0x9e3779b97f4a7c15ull; b = rotate64(old, 23); return a; That one is the fastest decent-ish PRNG I have in testing at the moment. Quality isn't particularly good - the 16 bit variant fails at a measly 8 megabytes, and the 32 bit variant fails at 64 GB. Ultimately, only the full 64 bit one seems like a quality I could possibly recommend, and even then I'd worry that I missed something...

  • orz orz posted a comment on discussion Open Discussion

    I've been looking at multiplication-based PRNGs again lately. Some I've found interesting: Uint64 old = a + b; a = b * 0x9e3779b97f4a7c15ull; b = rotate64(old, 23); return a; That one is the fastest decent-ish PRNG I have in testing at the moment. Quality isn't particularly good - the 16 bit variant fails at a measly 8 megabytes, and the 32 bit variant fails at 64 GB. Ultimately, only the full 64 bit one seems like a quality I could possibly recommend, and even then I'd worry that I missed something...

  • orz orz modified a comment on ticket #21

    Indeed seeding in 0.95 does use some uninitialized variables, though I would say the problem lies elsewhere, starting with line 252 (mask should have been initialized to 0, not 1), and continuing inside the loop where the code isn't fully adapted - it was supposed to start with an effective ITERATION_SIZE and INDIRECTION_SIZE of 1, use that to double the size of the initialized section of each array, then double the effective array sizes and repeat until both arrays are entirely populated, but I...

  • orz orz modified ticket #21

    efiix8x48::seed initialisation problem

  • orz orz posted a comment on ticket #21

    That does indeed seeding in 0.95 does use some uninitialized variables, though I would say the problem lies elsewhere, starting with line 252 (mask should have been initialized to 0, not 1), and continuing inside the loop where the code isn't fully adapted - it was supposed to start with an effective ITERATION_SIZE and INDIRECTION_SIZE of 1, use that to double the size of the initialized section of each array, then double the effective array sizes and repeat until both arrays are entirely populated,...

  • Lucas Rey Lucas Rey created ticket #21

    efiix8x48::seed initialisation problem

  • Lucas Rey Lucas Rey posted a comment on ticket #20

    found another potential issue, I'll just write it here since there is a re-write: in void PractRand::RNGs::Raw::efiix8x48::seed(Uint64 s1, Uint64 s2, Uint64 s3, Uint64 s4) I would suggest replacing: iteration_table[0] = 0; indirection_table[0] = 0; in lines 243 and 244 with: for (unsigned long y = 0; y < ITERATION_SIZE; y++) { iteration_table[y] = 0; } for (unsigned long y = 0; y < INDIRECTION_SIZE; y++) { indirection_table[y] = 0; } this is because in the algo afterwards all the values are NOT initialised,...

  • orz orz posted a comment on ticket #20

    New version of rarns algorithm looks like: enum { S1 = 3, S2 = 8, S3 = 31, OUTROT = 32 }; Uint64 rv = rotate64(xs1 + xs3, OUTROT); xs1 ^= xs2; xs2 ^= xs3; xs3 ^= xs1; xs2 = rotate64(xs2, S2); xs3 = rotate64(xs3, S3); xs1 ^= xs1 >> S1; return rv + xs3; Though the seeding functions and fast-forwarding/rewinding are still being worked on. Shifts are 2/11/26/16 at 32 bit, and 1/3/10/8 at 16 bit. Those are all maximum period xor-based-generators with outputs hashed well enough to pass tests.

  • orz orz posted a comment on ticket #20

    It is a bug. I'd say it will be fixed in the next release, but it's more accurate to say that all rarns code has been rewritten for the next release.

  • Lucas Rey Lucas Rey posted a comment on ticket #20

    *rans64::raw64

  • Lucas Rey Lucas Rey created ticket #20

    rarns64::raw64() bug

  • orz orz modified ticket #14

    Possibly confusing 'rng_test -help' tlmax typo

  • orz orz modified ticket #19

    isaac64x256::_advance_state bug

  • orz orz posted a comment on ticket #19

    You are correct, thank you. The issue will be addressed for version 0.96. In Bob's original code the xor happens in the macro for 32 bit ISAAC, and in the macro call for 64 bit ISAAC, and apparently I somehow ended up doing both in my implementation of the 64 bit variant.

  • Lucas Rey Lucas Rey created ticket #19

    isaac64x256::_advance_state bug

  • orz orz posted a comment on discussion Open Discussion

    I've had a little time and energy in the last few months to work on PractRand. One of the results: In trying to have a really broad range of PRNGs tested with PractRand and other test suites for comparison, I noticed that PR0.95 was badly underperforming on non-power-of-2 combined LCGs - they were failing TestU01 SmallCrush in a few seconds while passing multiple multiple days of PractRand testing with clean bills of health. This was due to the heavy use of birthday tests in TestU01. So I've gone...

  • orz orz posted a comment on discussion Open Discussion

    PractRand's RNG_test does broadly the same sort of stuff as TestU01's SmallCrush/Crush/BigCrush. The default parameters are generally about right for most uses, except that you usually want to limit the amount of testing done. The command line parameter "-tlmax 40" will tell it to stop after 1 terabyte, which ends up using about the same amount of CPU time as BigCrush, assuming the RNG is fast.

  • Mark Taylor Mark Taylor posted a comment on discussion Open Discussion

    Hi, I've tried and failed to get the TestU01 Bigcrush test to work on my RNG. Does PractRand have the equivalent or similar test? Thanks.

  • Lorenzo Lorenzo posted a comment on discussion Open Discussion

    Thank you very much for this detailed commentary, it is very much appreciated!

  • orz orz modified a comment on discussion Open Discussion

    I'm glancing through https://vigna.di.unimi.it/ftp/papers/LXM.pdf right now. I expect overall output quality to be decent. While the component generators are weak, the mixing function is probably capable of thoroughly hiding their weakness. The large statespace and effective seedspace should be sufficient for splitting. Because each sub-PRNG supports random access, the overall PRNG can be random-access as well. I'd expect speed to suffer from having to run both sub-PRNGs and the mixing function,...

  • orz orz modified a comment on discussion Open Discussion

    I'm glancing through https://vigna.di.unimi.it/ftp/papers/LXM.pdf right now. I expect overall output quality to be decent. While the component generators are weak, the mixing function is probably capable of thoroughly hiding their weakness. The large statespace and effective seedspace should be sufficient for splitting. Because each sub-PRNG supports random access, the overall PRNG can be random-access as well. I'd expect speed to suffer from having to run both sub-PRNGs and the mixing function,...

  • orz orz posted a comment on discussion Open Discussion

    I'm glancing through https://vigna.di.unimi.it/ftp/papers/LXM.pdf right now. I expect overall output quality to be decent. While the component generators are weak, the mixing function is probably capable of thoroughly hiding their weakness. The large statespace and effective seedspace should be sufficient for splitting. Because each sub-PRNG supports random access, the overall PRNG can be random-access as well. I'd expect speed to suffer from having to run both sub-PRNGs and the mixing function,...

  • Lorenzo Lorenzo posted a comment on discussion Open Discussion

    I don't know if this is of interest to anyone, but the Java programming language includes since version 17 (which came out in 2021) RNGs form the LXM family by Vigna. See https://dl.acm.org/doi/10.1145/3485525 and https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html Anybody has any opinion of these generators? Are they good? Here are some benchmarks (for generating 64-bit floating point numbers) which include all RNGs included by standard in Java: Intel...

  • orz orz posted a comment on discussion Open Discussion

    Mine is a left rotate too. The implementation code is listed (though the compiler hopefully turns that in to a single opcode). My thinking was that if you skipped seeding, zeroed state, and then looked at it through the first few calls, it should be apparent exactly which part of our algorithms differ. And yeah, the problem with using LEA like that is that it performs poorly on anything other than x86. I'm currently considering this variation: Uint64 tmp = a + b; a = rotate(a, 25); a ^= old; a +=...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I got it... I said first 5 outputs after the init, while you are showing the state after init... this is good then: d8 1c 84 05 51 As for why we are getting different tests results is eather that your new version for mod3n is different or that we feed the test differently.... So, when is the new version coming out? Haha

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    Ah, you are using left rotate... then I am really confused, I will re-check. Sorry for taking you time like this haha.

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    Funny we posted at the same time... Maybe the rotates are different, I am using left rotate...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    const Uint64 K = 0x92ec64765925a395ull; a ^= b; a = rotate(a, 13); b = b + (b << 3) + Uint32(K); a *= K; return a; This is more than twice slower than MWC on my Arm Mac. But it is an interesting approach, thanks for sharing! I see you are also not a fan of LFSRs haha. Uint16 state[5]; static Uint16 _bswap16(Uint16 v) { return (Uint16((v >> 0) & 15) << 12) | (Uint16((v >> 4) & 15) << 8) | (Uint16((v >> 8) & 15) << 4) | (Uint16((v >> 12) & 15) << 0); } Uint16 raw16() { Uint16 tmp = state[0] - state[1];...

  • orz orz posted a comment on discussion Open Discussion

    I wonder if your implementation is identical... Here are hex values for first 5 outputs: Is that for seed zero? That's not what I'm getting. Here's the 8-bit version's code I'm using: Uint8 tmp = state[0] - state[1]; state[0] = state[1] + state[4]; state[1] = _bswap8(state[2]); state[2] = state[3] ^ tmp; state[3] = rotate8(tmp, 5); state[4] += (0x9e3779b97f4a7c15ull >> (64 - 8 * sizeof(tmp))) | 1; return state[0]; and the functions used: static Uint8 rotate8(Uint8 v, int bits) { return (v << bits)...

  • orz orz modified a comment on discussion Open Discussion

    32 bit version passed 32 TB, I stopped the test there. I haven't tried comparing our output to figure out why I'm getting mod3n failures with -++ while you're not. My implementation looks roughly like this: Uint16 state[5]; static Uint16 _bswap16(Uint16 v) { return (Uint16((v >> 0) & 15) << 12) | (Uint16((v >> 4) & 15) << 8) | (Uint16((v >> 8) & 15) << 4) | (Uint16((v >> 12) & 15) << 0); } Uint16 raw16() { Uint16 tmp = state[0] - state[1]; state[0] = state[1] + state[4]; state[1] = _bswap16(state[2]);...

  • orz orz posted a comment on discussion Open Discussion

    32 bit version passed 32 TB, I stopped the test there. I haven't tried comparing our output to figure out why I'm getting mod3n failures with -++ while you're not. My implementation looks roughly like this: Uint16 state[5]; static Uint16 _bswap16(Uint16 v) { return (Uint16((v >> 0) & 15) << 12) | (Uint16((v >> 4) & 15) << 8) | (Uint16((v >> 8) & 15) << 4) | (Uint16((v >> 12) & 15) << 0); } Uint16 raw16() { Uint16 tmp = state[0] - state[1]; state[0] = state[1] + state[4]; state[1] = _bswap16(state[2]);...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I forgot to emphasise to use the init from https://github.com/IgneousRed/PrngBoard/blob/main/FSR/FSR.odin FSR64_init :: proc(seed: u64) -> FSR64 { rng := FSR64{seed, seed, seed, seed, seed} for _ in 0 ..< 15 do FSR64_u64(&rng) return rng }

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    4-part byteswap on 16 bit variant, huh? Okay, I'm switching to that, named it nibbleswap. Still fails mod3n at 64MB if I don't put in an xor though, and the 16 bit version still fails around the 512 GB to 1 TB range. For u8 and u16 I used a custom instruction for swap, as u8 swap does nothing and for u16 it is same as rot8. I wonder if your implementation is identical... Here are hex values for first 5 outputs: u8 3a 97 c8 0a d4 u16 4ddf d1f6 4b87 9ebc d254 u32 f6595109 0b321081 a9cce385 6cb7aa32...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    4-part byteswap on 16 bit variant, huh? Okay, I'm switching to that, named it nibbleswap. Still fails mod3n at 64MB if I don't put in an xor though, and the 16 bit version still fails around the 512 GB to 1 TB range. For u8 and u16 I used a custom instruction for swap, as u8 swap does nothing and for 16 it is same as rot4. I wonder if your implementation is identical... Here are hex values for first 5 outputs: u8 3a 97 c8 0a d4 u16 4ddf d1f6 4b87 9ebc d254 u32 f6595109 0b321081 a9cce385 6cb7aa32...

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    I made a chaotic counting PRNG with 5 states, 6 instructions and uses byteswap instruction. It may be just me... But it is soo beautiful :D FSR64_u64 :: proc(r: ^FSR64) -> u64 { s := r[0] - r[1] r[0] = r[1] + r[4] r[1] = bits.byte_swap(r[2]) r[2] = r[3] + s r[3] = bits.rotate_left64(s, 15) r[4] += lib.ODD_PHI_64 return r[0] } Tests and more information: https://github.com/IgneousRed/PrngBoard/tree/main/FSR This is not the final version. Not sure what rotate would be best, and not sure about add/sub/xor...

  • orz orz posted a comment on discussion Open Discussion

    16 bit variants with xor fail at either 512 GB or 1 TB. 32 bit variants have passed 8 TB so far. I target mid-high end CPUs and byteswap exists on both x86 and Arm. It does indeed. But just because the instruction exists in the ISA doesn't mean that the engineers prioritized its performance, that varies from one design to the next. I generally think of it as safer to stick to the most popular opcodes, since I've seen a number of cases of less popular opcodes performing poorly on specific implementations,...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    Also for 8bit and 16bit I used a byteswap emulation with 4 parts (as if those words are 32bit)

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I target mid-high end CPUs and byteswap exists on both x86 and Arm. Having 2 different bit moves that preserve all bits is very handy. I was reluctant to use xor as it makes DC7-9 indep worse, tho I am open to the idea. There are test results in the folder I linked. I did mention that +++ fails mod3n, are you using a different increment? ODD_PHI_64 :: 0x9e3779b97f4a7c15 ODD_PHI_32 :: 0x9e3779b9 ODD_PHI_16 :: 0x9e37 ODD_PHI_8 :: 0x9f You did not answer my question about going from 4 to 5 states. It...

  • orz orz modified a comment on discussion Open Discussion

    Byte order reversal is indeed very nice for such purposes, probably slightly better than bit rotation. There are reasons why bit rotation is preferred anyway though. Bit rotation is much better supported by high-level languages. Partially because of that, CPU designers are more inclined to include optimized hardware implementations. On ISAs that don't support bit-rotation, it's easy to emulate efficiently (just an immediate left-shift, an immediate right-shift, and a bitwise or are sufficient to...

  • orz orz modified a comment on discussion Open Discussion

    Byte order reversal is indeed very nice for such purposes, probably slightly better than bit rotation. There are reasons why bit rotation is preferred anyway though. Bit rotation is much better supported by high-level languages. Partially because of that, CPU designers are more inclined to include optimized hardware implementations. On ISAs that don't support bit-rotation, it easy to emulate (just an immediate left-shift, an immediate right-shift, and a bitwise or are sufficient to replace an immediate...

  • orz orz modified a comment on discussion Open Discussion

    Byte order reversal is indeed very nice for such purposes, probably slightly better than bit rotation. There are reasons why bit rotation is preferred anyway though. Bit rotation is much better supported by high-level languages. Because of that, CPU designers are more inclined to include optimized hardware implementations. On ISAs that don't support bit-rotation, it easy to emulate (just an immediate left-shift, an immediate right-shift, and a bitwise or are sufficient to replace an immediate rotation),...

  • orz orz posted a comment on discussion Open Discussion

    Byte order reversal is indeed very nice for such purposes, probably slightly better than bit rotation. There are reasons why bit rotation is preferred anyway though. Bit rotation is much better supported by high-level languages. Because of that, CPU designers are more inclined to include optimized hardware implementations. On ISAs that don't support bit-rotation, it easy to emulate (just an immediate left-shift, an immediate right-shift, and a bitwise or are sufficient), while byte order reversal...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    I made a chaotic counting PRNG with 5 states, 6 instructions and uses byteswap instruction. It may be just me... But it is soo beautiful :D FSR64_u64 :: proc(r: ^FSR64) -> u64 { s := r[0] - r[1] r[0] = r[1] + r[4] r[1] = bits.byte_swap(r[2]) r[2] = r[3] + s r[3] = bits.rotate_left64(s, 15) r[4] += lib.ODD_PHI_64 return r[0] } Tests and more information: https://github.com/IgneousRed/PrngBoard/tree/main/FSR This is not the final version. Not sure what rotate would be best, and not sure about +/-/~...

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    If a 4 word PRNG shows the (Extended+MaxFold) failures as: 8bit word => 2^30 16bit word => 2^36 32bit word => 2^42 64bit word => 2^45+ Is what I wrote. Is this ModernSoftware™ moment? :D Maybe I am just dumb

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    If a 4 word PRNG shows the (Extended+MaxFold) failures as: 8bit word => 2^30 16bit word => 2^36 32bit word => 2^42 64bit word => 2^45+ Is what I wrote. Is this ModernSoftware™ moment? :D

  • orz orz posted a comment on discussion Open Discussion

    Multiplication is... decent? I mean, it's fast, on a lot of CPUs, and tends to mix things better than add/xor. The class of CPUs it's fast on smaller than the class of CPUs that the basics (add, xor, shift) are fast on, so a PRNG that uses multiplication is inherently targeting a smaller niche. Not much smaller these days, since fast multiplication is pretty common. And low multiplication alone is never enough, you inherently need shifts, and probably add or xor too, or maybe just decimated output...

  • orz orz posted a comment on discussion Open Discussion

    What? If a 4 word PRNG shows the (Extended+MaxFold) failures as: 16bit word => 2^36 16 bits is 2 bytes. 4 times 2 bytes is 8 bytes of state. Is that not saying that failures occur after 2^36 bytes with a state size of 8 bytes?

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    From what I understand, your stance is that multiplication should be avoided since it is too slow? I would totally agree with that stance 10years ago. But for last several years every CPU can do full multiply with decent speed. Given that it does alot more mixing than an add, I think it more than worth it's weight. When choosing a PRNG for low-end CPUs, it should be avoided. But why not list such RNGs for people to have a choice, and read it's Pros and Cons? MWC is a classic at this point, but adding...

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    Just now caught an error... And the fact that it fails at 2^36 bytes for an 8 byte state and 2^42 bytes for a 16 byte state would worry me. That is not what I wrote, also 8bit version has 32bit state, and failure after 2^30 is about as much as would be expected. The problem is that it scales bad.

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    Just now caught an error... And the fact that it fails at 2^36 bytes for an 8 byte state and 2^42 bytes for a 16 byte state would worry me. That is not what I wrote, also 8bit version has 32bit state, and failure after 2^30 is about as much as would be expected. The problem is that it scales bad.

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    The RNG I am currently making is indeed a CountingChaotic type and is almost twice as fast on AArch64 than SFC, without using multiplication and should be easy to make it SIMD . I will create another topic when my long tests confirm some things. Also I want to say thank you for creating this amazing software! Cant wait for the next version (been quite a while), hopefully the 0 generator won't make a fatal error, and for folded FPF test not failing randomly with data below 2^20 :D

  • IgneousRed IgneousRed modified a comment on discussion Open Discussion

    The RNG I am currently making is indeed a CountingChaotic type and is almost twice as fast on AArch64 than SFC, without using multiplication and should be easyto make it SIMD . I will create another topic when my long tests confirm some things. Also I want to say thank you for creating this amazing software! Cant wait for the next version (been quite a while), hopefully the 0 generator won't make a fatal error, and for folded FPF test not failing randomly with data below 2^20 :D

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    The RNG I am currently making is indeed a CountingChaotic type and is almost twice as fast on AArch64 than SFC, without using multiplication and should be easily to make it SIMD . I will create another topic when my long tests confirm some things. Also I want to say thank you for creating this amazing software! Cant wait for the next version (been quite a while), hopefully the 0 generator won't make a fatal error, and for folded FPF test not failing randomly with data below 2^20 :D

  • orz orz posted a comment on discussion Open Discussion

    Yeah, performance of high-end CPUs on short loops has been weird ever since... the early to mid 1990s I think? PPros. And these days it's not just high-end CPUs anymore. sfc is sort of optimized for x86. Not much really, but the value of the left-shift was chosen, at least in part, to work well with the LEA instruction on x86.

  • orz orz posted a comment on discussion Open Discussion

    Probably not. I mean, if there are other PRNGs that seem comparable in most ways but are better quality, I'd tend to recommend the better quality ones for whatever niche they were competing in. And the fact that it fails at 2^36 bytes for an 8 byte state and 2^42 bytes for a 16 byte state would worry me. That implies that even if the 64-bit version passed very large tests, it's might be because the tests couldn't keep track of correlations at the right distances or whatever - and there would be a...

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    On my Mac M2 Pro incrementing by random odd number (instead of 1) speeds up SFC by almost 10%. Found this to be true also on my PRNGs.

  • IgneousRed IgneousRed posted a comment on discussion Open Discussion

    If a 4 word PRNG shows the (Extended+MaxFold) failures as: 8bit word => 2^30 16bit word => 2^36 32bit word => 2^42 64bit word => 2^45+ It is not state efficient, but is that fine if it is fast? You surely won't name it "future-proof", but would you recommend 64bit version? And why/not? Also from https://pracrand.sourceforge.net/ I get the sense that you hold 8bit PRNGs to the same standard of randomness as 64bit ones, why is that? From my perspective If one uses 8bit on a micro-controller they probably...

  • orz orz posted a comment on discussion Open Discussion

    PractRand targets normal integer sizes - 8, 16, 32, 64, and unknown. For a 24 bit source I would recommend either 8 (stdin8 / As8) or unknown (stdin / AsUnknown). Or better running tests twice, once on the lowest 16 bits and then again the highest 16 bits. Eventually the command line will get powerful enough to allow you to specify that, but it's not there yet, so currently you'll have use your own code to drop 8 bits out of every 24.

  • palxex palxex posted a comment on discussion Open Discussion

    Or could you please add stdin24?

  • Michael Niedermayer Michael Niedermayer posted a comment on discussion Open Discussion

    Well, i dont "have to" do it, iam just playing with this for fun. Brute force is a bit boring. So heres my implementation of finding the seeds from 10+ output values and probably too terse description of how that works. But bascially it iterates over the 17-19 MSB of a compute the corresponding MSB of b and c from that (c requires us to explore multiple cases) and then move around so we split the now "known" part of c so the lower part becomes the MSB of c and then check that. That allows us to discard...

  • orz orz posted a comment on discussion Open Discussion

    Going from the output to the internal state looks quite feasible. Running the algorithm backwards should be even easier. The seed is simply the internal state 12 steps before output starts. There's probably some intelligent way to compute the state algebraically from the output, but if I had to do it, I'd just brute force it - guess a random value of 'c', use the output to calculate later values of 'c' from the guessed initial value, use the calculated values of c to calculate what the 3rd & 4th...

  • Michael Niedermayer Michael Niedermayer posted a comment on discussion Open Discussion

    Hi all Wasnt sure i should start a new thread but i guess as its about sfc64 this would reach people interrested. I was playing around first wondering how hard it would be to find the seeds from the output and then instead looking for patterns in the output based on my plan on how to find the seeds from the output. The shortest code i found that shows statistical defects in sfc64 i got after maybe an hour today of looking into it is this: uint64_t v1 = 0, v0 = 0, c = 0, a = 0; for (int i = 0; i<128;...

  • IgneousRed IgneousRed created ticket #18

    Expanded test error "gap product is zero"

  • orz orz posted a comment on ticket #17

    I tried some tests on the code you posted. TestU01: 1/1/1 That is, it fails 1 subtest (out of 15) of SmallCrush, 1 (out of 144) of Crush, and 1 (out of 160) of BigCrush. So about 10-15 seconds for first failure. In every case, it was a birthday spacings test that found bias. Also, some people like to run TestU01 on bit-backwards versions of PRNGs (because TestU01 is less stringent on the higher bits, completely skipping the highest IIRC) I also tried both bit-backwards and byte-backwards ; bit-backwards...

  • Arlet Ottens Arlet Ottens posted a comment on ticket #17

    I've seen some failures reported on other tests (I remember MaxOft) as well, but with p-values around 1e-4, so not really definitive. The birthday spacings is the one that fails hard. I'll keep an eye on other results. In my code, the 48 bit LCG portion is fixed, and the 32 bit hash sequence is generated by an algorithm that tries out random sequences of candidate instructions . Each generator is then tested by calling PractRand, which I have modified to return a simple quality score. The set of...

  • orz orz modified a comment on ticket #17

    No, wait, I'm confused. I hadn't touched PractRand in a while and was accidentally testing with version 0.92. Testing with version 0.95, it does adequately on those files at default settings. rng=RNG_file(d:\inc\out), seed=0 length= 1 gigabyte (2^30 bytes), time= 24.0 seconds Test Name Raw Processed Evaluation mod3n(5):(0,9-0) R= +74.0 p = 1.9e-40 FAIL !!! ...and 222 test result(s) without anomalies rng=RNG_file(d:\inc\out1), seed=0 length= 32 megabytes (2^25 bytes), time= 4.6 seconds Test Name Raw...

  • orz orz modified a comment on ticket #17

    No, wait, I'm confused. I hadn't touched PractRand in a while and was accidentally testing with version 0.92. Testing with version 0.95, it does adequately on those files at default settings. rng=RNG_file(d:\inc\out), seed=0 length= 1 gigabyte (2^30 bytes), time= 24.0 seconds Test Name Raw Processed Evaluation mod3n(5):(0,9-0) R= +74.0 p = 1.9e-40 FAIL !!! ...and 222 test result(s) without anomalies rng=RNG_file(d:\inc\out1), seed=0 length= 32 megabytes (2^25 bytes), time= 4.6 seconds Test Name Raw...

  • orz orz posted a comment on ticket #17

    No, wait, I'm confused. I hadn't touched PractRand in a while and was accidentally testing with version 0.92. Testing with version 0.95, it does adequately on those files at default settings. rng=RNG_file(d:\inc\out), seed=0 length= 1 gigabyte (2^30 bytes), time= 24.0 seconds Test Name Raw Processed Evaluation mod3n(5):(0,9-0) R= +74.0 p = 1.9e-40 FAIL !!! ...and 222 test result(s) without anomalies rng=RNG_file(d:\inc\out1), seed=0 length= 32 megabytes (2^25 bytes), time= 4.6 seconds Test Name Raw...

  • orz orz posted a comment on ticket #17

    It is rather embarrassing that SmallCrush is detecting flaws that PractRand needs hours or more on. Perhaps I should have a small variant of the birthday spacings test incorporated in to the base test set... I presume it's all birthday spacings tests it's failing? Let's see... for this variation at least, the PRNG state transition function looks like an 48 bit LCG with additive constant 69, multiplicative constant 0x010101010101, and modulus 0x1000000000000, plus an 8-bit accumulator used in a rather...

  • Arlet Ottens Arlet Ottens posted a comment on ticket #17

    Possibly related: I noticed this issue while working on a project to create a good PRNG for 8 bit targets, using a genetic/hill climbing algorithm that uses PractRand to test candidates. I've seen it happen quite regularly that PractRand gives a pass for 128GB or even 1TB, but that the same PRNG output fails in SmallCrush/Crush, usually on the BirthdaySpacings test. The same thing happened with the output files above. The contrast between PractRand running for an hour or more and seeing no problems...

  • orz orz posted a comment on ticket #17

    Quirky, but it's looking like correct behavior to me so far. I think the third nibble out of each 64 bits has some kind of detectable pattern, and removing the first byte phase shifts that to the first nibble out of each 64 bits, which is much easier for PractRand to notice.

1 >
MongoDB Logo MongoDB