Practically Random / Discussion / Open Discussion: sfc64

Warren D. Smith - 2018-08-20

1.Hi, there is an obvious-seeming criticism of (& response improving) your sfc64 routine for creating a 64-bit psu-random.

enum {BARREL_SHIFT = 24, RSHIFT = 11, LSHIFT = 3}; Uint64 tmp = a + b + counter++; a = b ^ (b >> RSHIFT); b = c + (c << LSHIFT); c = ((c << BARREL_SHIFT) | (c >> (64-BARREL_SHIFT))) + tmp; return tmp;

The "counter++" could be replaced by
"(counter+=UINT64_C(7640891575674082921))"
which presumably is the same speed but more randomness. Any odd increment could be used. You chose 1. The reason for delta=7640891575674082921 is that I once did a day-long search for good Weyl-generator increments. The continued fraction of delta/2^64 has unusually small sum of partial quotients 107 and unusually small maximum partial quotient 5. (Conceivably this would alter the best choice of BARREL, RSHIFT, LSHIFT. You didn't say how you chose them.)

I suspect that sfc64 is actually good enough for cryptography with security level about 2^128, PROVIDED user agrees to use only every Kth output word, for some large-enough K; probably K=1000 is safe.

--Warren D. Smith
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Warren D. Smith - 2018-08-20

Also, in

Uint64 tmp = a + b + counter++;
a = b ^ (b >> RSHIFT);
b = c + (c << LSHIFT);

it seems plausible the "a + b + counter"
would improve if replaced by "(a + counter) ^ a".
Again, same speed, but more presumed randomness because if usage of
+ and ^ are systematically alternated, then you get rid of associativity,
get more nonlinearity, etc.

My suggestions are not huge improvements, but seems obvious (to me anyhow) they are
improvements; and if you want to be empirical about it, it might be detectable to your tests if you made an artificially-shrunk mini-state space version of the routine.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Warren D. Smith - 2018-08-20

Agggh!

it seems plausible the "a + b + counter"
would improve if replaced by "(a + counter) ^ a".

I meant "(a + counter) ^ b".

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Warren D. Smith - 2018-08-20

hmm, actually if we alter the internal logic of sfc64 further to
maximally-use the "alternate + and xor" principle, we get (what probably is
the best yet; the state is {a,b,c,w}):

uint64 WDSrand64(){
uint64 tmp = (b + w) ^ a;
a = (b << LSHIFT) - b;
b = c ^ (c >> RSHIFT);
w += INCREM;
c = ((c << BARREL) | (c >> (64-BARREL))) + tmp;
return( tmp );
}

where INCREM = UINT64_C(7640891575674082921)
is suggested, and I have no idea what the best values for
LSHIFT, RSHIFT, and BARREL now should be.
INCREM must be odd. The others must be in {1,2,...,63}.
If you want I can investigate and come up
with some explicit numerical suggestions. Email me at warren.wds at gmail.com.
Another possibility is w could be a linear congruential rather
than Weyl generator, still with period 2^64. That would have more
feed-in randomness, but would be a bit slower.

The period necessarily is a multiple of 2^64, and probably
usually over 2^250 for random initial state. If use only every Kth iterate then probably
crypto-secure if K large enough (1000?).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

- 2018-08-20

counter++ could be any odd value at the same speed

More or less. On x86-64 the ++ can be done with INC, ADD(immediate), or LEA, whichever the compiler feels like (under some circumstances the LEA will be better). A 64 bit odd constant would require ADD(IP-relative) if I recall correctly. Which... wouldn't make much difference probably. On FPGAs or other exotic hardware I'm less confident on what the difference would be.

I suspect that sfc64 is actually good enough for cryptography with security level about 2^128, PROVIDED user agrees to use only every Kth output word, for some large-enough K; probably K=1000 is safe.

I'd expect K=100 to be adequate, but trying to prove that would be a nightmare and it would be pretty pointless anyway since skipping that many outputs makes it slower than other options.

lots of variations are better quality

...are they? Version 4 of SFC has been tested - hundreds if not thousands of CPU-hours on the algorithm as written, plus that much again on numerous variations of it (mostly reduced quality variations, since even sfc16 frequires 512 terabytes of output testing to show bias), plus hundreds of hours of human time on analysis. Any major changes that aren't self-evidently superior will need substantial testing.

Scaling the counter (making it a Weyl generator) would likely produce a tiny improvement in quality, which might or might not justify the extra complexity. Quick preliminary testing says: negligible difference in quality.

Changing "a + b + counter" to "(a + counter) ^ b"... I know I tested a few variations like that, can't remember about that specific one. Quick preliminary testing says: significant improvement? Huh. I'll try benchmarking it, and a few other tests, but that does seem worth thinking about.

Alternating addition and xor: On x86, the old left-shift was arranged that way to let it be done efficiently as an LEA, that construction doesn't support that. Also, I did test swapping the order of those two constructions like that, though not in conjunction that variation on the value of tmp. Quick preliminary testing says: small reduction in quality.

Last edit: 2018-08-22

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Warren D. Smith - 2018-08-22

Oh, the left shift (x<<L)+x, or (x<<L)-x, I do not care which you use; I thought
perhaps the minus verson might be a teeny bit more random, but it did not
occur to me the plus version has the advantage it can be compiled with an "LEA" instruction.
That seems a good enough reason to stick with plus.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

FWIW, regarding sfc64, with 0.93, when I ran an extended test, I got:

linux% ./sfc64 | ./RNG_test stdin64 -te 1 -tlmin 20 -tlmax 50 -
tlmaxonly -multithreaded                                                      
sfc64(0xdf7e5fd0802274af) initialized.
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xc53d1ca0
test set = expanded, folding = standard (64 bit)

rng=RNG_stdin64, seed=0xc53d1ca0
length= 1 megabyte (2^20 bytes), time= 0.3 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN_FF(2+1,13-7,T)               R= +12.5  p =  2.3e-4   unusual          
  ...and 217 test result(s) without anomalies

rng=RNG_stdin64, seed=0xc53d1ca0
length= 2 megabytes (2^21 bytes), time= 3.3 seconds
  no anomalies in 237 test result(s)

  ...

rng=RNG_stdin64, seed=0xc53d1ca0
length= 32 terabytes (2^45 bytes), time= 542262 seconds
  no anomalies in 763 test result(s)

rng=RNG_stdin64, seed=0xc53d1ca0
length= 64 terabytes (2^46 bytes), time= 1085079 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/64]Pat5(*,r)                R= +23.1  p~=   2e-7    very suspicious  
  [Low4/64]Pat5(*,g)                R= +4526  p~=  0          FAIL !!!!!!!!  
  [Low4/64]Pat5(0,r)                R=-105.7  p~= 1-1e-56     FAIL !!!!      
  [Low4/64]Pat5(0,g)                R=+784.0  p~=   6e-465    FAIL !!!!!!!   
  [Low4/64]Pat5(1,r)                R=-106.5  p~= 1-5e-57     FAIL !!!!      
  [Low4/64]Pat5(1,g)                R=+795.9  p~=   4e-472    FAIL !!!!!!!   
  [Low4/64]Pat5(2,r)                R=-103.0  p~= 1-6e-55     FAIL !!!!      
  [Low4/64]Pat5(2,g)                R=+781.0  p~=   3e-463    FAIL !!!!!!!   
  [Low4/64]Pat5(4,r)                R= +56.7  p~=   5e-27     FAIL !!        
  [Low4/64]Pat5(4,g)                R=+259.2  p~=   6e-149    FAIL !!!!!     
  [Low4/64]Pat5(5,r)                R= +57.6  p~=   1e-27     FAIL !!        
  [Low4/64]Pat5(5,g)                R=+256.9  p~=   1e-147    FAIL !!!!!     
  [Low4/64]Pat5(6,r)                R= +58.8  p~=   2e-28     FAIL !!        
  [Low4/64]Pat5(6,g)                R=+272.6  p~=   5e-157    FAIL !!!!!     
  [Low4/64]Pat5(7,r)                R= +56.0  p~=   1e-26     FAIL !!        
  [Low4/64]Pat5(7,g)                R=+253.2  p~=   2e-145    FAIL !!!!!     
  ...and 763 test result(s) without anomalies

rng=RNG_stdin64, seed=0xc53d1ca0
length= 128 terabytes (2^47 bytes), time= 2161013 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/64]Pat5(*,g)                R= +1316  p~=   6e-785    FAIL !!!!!!!   
  [Low4/64]Pat5(0,r)                R= -48.7  p~= 1-3e-22     FAIL !         
  [Low4/64]Pat5(0,g)                R=+147.4  p~=   1e-81     FAIL !!!!      
  [Low4/64]Pat5(1,r)                R= -47.6  p~= 1-1e-21     FAIL !         
  [Low4/64]Pat5(1,g)                R=+143.4  p~=   2e-79     FAIL !!!!      
  [Low4/64]Pat5(2,r)                R= -44.8  p~= 1-7e-20     FAIL !         
  [Low4/64]Pat5(2,g)                R=+134.6  p~=   6e-74     FAIL !!!!      
  [Low4/64]Pat5(4,r)                R= +38.9  p~=   2e-16     FAIL           
  [Low4/64]Pat5(4,g)                R=+100.2  p~=   3e-53     FAIL !!!!      
  [Low4/64]Pat5(5,r)                R= +38.8  p~=   2e-16     FAIL           
  [Low4/64]Pat5(5,g)                R= +96.5  p~=   5e-51     FAIL !!!       
  [Low4/64]Pat5(6,r)                R= +41.5  p~=   7e-18     FAIL           
  [Low4/64]Pat5(6,g)                R=+110.5  p~=   2e-59     FAIL !!!!      
  [Low4/64]Pat5(7,r)                R= +38.0  p~=   8e-16     FAIL           
  [Low4/64]Pat5(7,g)                R= +93.9  p~=   1e-49     FAIL !!!       
  ...and 779 test result(s) without anomalies

Note that although I used my own implementation of sfc64, it is byte-for-byte identical in output to the built-in sfc64 for the same seed, a randomly chosen 0xdf7e5fd0802274af in this case.

Prior to this point, I'd never seen anything fail the Pat5 tests and was mostly unaware of them. Interestingly sfc32 does not show this problem. It is currently fine out to 128 TB.

- 2018-08-23

Interesting. Thanks for reporting it. I'd practically forgotten about that test.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Gregory Petrosyan - 2022-07-20

FWIW, I've ran Pat5 test using PractRand pre0.95 on sfc64 10 times up to 128 TB, and did not observe any failures. I've also used the 0xdf7e5fd0802274af seed from your run, and it also passed up to 128 TB.

Test output (and the patch I've used to only run the Pat5 test) is here: https://gist.github.com/flyingmutant/32a30cb09d130e1cd754ca3577f1d8b2

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

- 2018-09-07

Changing "a + b + counter" to "(a + counter) ^ b" produced better avalanche scores in my testing, but worse output behavior on PractRand standard (from 512 TB down to 64 TB needed to show bias). Not going to make that change for the moment, though it may be worth investigating more later.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Michael Niedermayer - 2024-01-09

Hi all
Wasnt sure i should start a new thread but i guess as its about sfc64 this would reach people interrested. I was playing around first wondering how hard it would be to find the seeds from the output and then instead looking for patterns in the output based on my plan on how to find the seeds from the output.
The shortest code i found that shows statistical defects in sfc64 i got after maybe an hour today of looking into it is this:

uint64_t v1 = 0, v0 = 0, c = 0, a = 0; for (int i = 0; i<128; i++) { v0 = v1; v1 = sfc64_get(&sfc64); a = v1 - a; printf("%c%3"PRIX64, i%8 ? ' ' : '\n', a - 9*c >> 52); c = (c << 24 | c >> 40) + v0; }

Output of this is:

43F BE8 4E9 36A F81 2F4 275 8EA E1D C74 4EE 2D7 F84 2F2 279 978 EAE D02 45F 2D5 F86 2F0 279 8E5 EAF C74 3CD 2D7 F84 263 1E6 8EB EAC C75 3CC 2D5 F84 1D1 159 976 E21 C70 3D0 242 EF7 1CF 15B 8E5 D90 C6F 462 1B0 EF9 25B 15E 8E4 D92 C6F 462 241 EFA 25D 15C 974 D90 D00 3D0 241 EFA 25D CC A04 D92 CFE 3D1 241 EF7 25F CC 977 CFF D91 460 243 EF6 2F0 CC 975 D8F E1F 3D3 1B0 EFA 25D 15D 974 D8F E20 3CF 1B3 EF7 25E 15C 974 E21 D8E 3D3 1B1 EF9 2EE 15E 972 E23 D8D 3D3 240 EFA 2EB 15E 8E3 D94 D8B 3D6 23E EFC 37A 160 8E2

One can see the numbers are quite correlated within each column. I dont really think thats a problem as this test is really designed for sfc64 and not a general test for finding a statistical issue. And every (non secure and fast) PRNG likely has similar issues when one designs a test specifically for it. But it may be simply interresting in case this is not known already
As a check to ensure the code doesnt always do that. With another random source we get:

43F FD1 5EA 645 CDD B3 22C ACE EB5 8A9 667 685 C5E B33 4A8 3E2 8B3 95D 635 DB8 1B4 FEB FC2 F78 45B EFB 33 B6D B17 FFB 4F5 2A5 598 CFD F58 33E 7CF DFF 173 EEE 941 D38 B08 74 95A 9F1 D7C AE3 3F2 1F E1D 8AD 2CD 29E C1D CC5 E26 34B 96B 57A EA2 6FF 21C CCB 91D 78 1DD CD4 3CE ECC 62F 238 35C 3E1 A2C 440 746 360 8CC 6EF 802 37E BF3 956 8DB ED8 13C 528 651 215 F53 C13 981 A28 96B 669 767 B5D 807 B25 5A7 6B1 4BD BF 4BC F98 37D 772 182 5A3 51C F58 AF0 69F 321 F4F 79E AC0 18F 908 55E 92E 18D FAC 3E7 FFE BDA 668
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- - 2024-01-10
  
  Going from the output to the internal state looks quite feasible. Running the algorithm backwards should be even easier. The seed is simply the internal state 12 steps before output starts. There's probably some intelligent way to compute the state algebraically from the output, but if I had to do it, I'd just brute force it - guess a random value of 'c', use the output to calculate later values of 'c' from the guessed initial value, use the calculated values of c to calculate what the 3rd & 4th & 5th outputs should be, use the hamming distance between the predicted and observed outputs as a score, flip a random bit or two to my guessed initial value of 'c' and check if the score improved, revert if the score got worse, repeat until the score reaches zero. That should get you the full values of 'c' and 'counter' within a few billion cycles, at which point solving for the other two unknowns gets much much simpler.
  
  Your code looks like a distinguishing attack against sfc64. But since sfc64 isn't intended for cryptographic use, distinguishing attacks doesn't really mean much. If you could do that against efiix or salsa or somesuch that would be something, but sfc, xsm, jsf, even arbee should all be highly vulnerable to that kind of thing.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Michael Niedermayer - 2024-01-12
    
    Well, i dont "have to" do it, iam just playing with this for fun. Brute force is a bit boring. So heres my implementation of finding the seeds from 10+ output values and probably too terse description of how that works. But bascially it iterates over the 17-19 MSB of a compute the corresponding MSB of b and c from that (c requires us to explore multiple cases) and then move around so we split the now "known" part of c so the lower part becomes the MSB of c and then check that. That allows us to discard mismatching cases and ATM requires a few million steps to find the solution which is a fraction of a second for my testcase.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

IgneousRed - 4 days ago

On my Mac M2 Pro incrementing by random odd number (instead of 1) speeds up SFC by almost 10%.
Found this to be true also on my PRNGs.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- - 4 days ago
  
  Yeah, performance of high-end CPUs on short loops has been weird ever since... the early to mid 1990s I think? PPros. And these days it's not just high-end CPUs anymore.
  
  sfc is sort of optimized for x86. Not much really, but the value of the left-shift was chosen, at least in part, to work well with the LEA instruction on x86.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

sfc64

statistical tests & psuedo- random number generators (RNGs, PRNGs)

Forums

Help

sfc64

sfc64

statistical tests & psuedo- random number generators (RNGs, PRNGs)

Forums

Help

sfc64 document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

sfc64