Menu

sfc64

2018-08-20
4 days ago
  • Warren D. Smith

    Warren D. Smith - 2018-08-20

    1.Hi, there is an obvious-seeming criticism of (& response improving) your sfc64 routine for creating a 64-bit psu-random.

    enum {BARREL_SHIFT = 24, RSHIFT = 11, LSHIFT = 3};
    Uint64 tmp = a + b + counter++;
    a = b ^ (b >> RSHIFT);
    b = c + (c << LSHIFT);
    c = ((c << BARREL_SHIFT) | (c >> (64-BARREL_SHIFT))) + tmp;
    return tmp;
    

    The "counter++" could be replaced by
    "(counter+=UINT64_C(7640891575674082921))"
    which presumably is the same speed but more randomness. Any odd increment could be used. You chose 1. The reason for delta=7640891575674082921 is that I once did a day-long search for good Weyl-generator increments. The continued fraction of delta/2^64 has unusually small sum of partial quotients 107 and unusually small maximum partial quotient 5. (Conceivably this would alter the best choice of BARREL, RSHIFT, LSHIFT. You didn't say how you chose them.)

    1. I suspect that sfc64 is actually good enough for cryptography with security level about 2^128, PROVIDED user agrees to use only every Kth output word, for some large-enough K; probably K=1000 is safe.

    --Warren D. Smith

     
  • Warren D. Smith

    Warren D. Smith - 2018-08-20

    Also, in

    Uint64 tmp = a + b + counter++;
    a = b ^ (b >> RSHIFT);
    b = c + (c << LSHIFT);

    it seems plausible the "a + b + counter"
    would improve if replaced by "(a + counter) ^ a".
    Again, same speed, but more presumed randomness because if usage of
    + and ^ are systematically alternated, then you get rid of associativity,
    get more nonlinearity, etc.

    My suggestions are not huge improvements, but seems obvious (to me anyhow) they are
    improvements; and if you want to be empirical about it, it might be detectable to your tests if you made an artificially-shrunk mini-state space version of the routine.

     
  • Warren D. Smith

    Warren D. Smith - 2018-08-20

    Agggh!

    it seems plausible the "a + b + counter"
    would improve if replaced by "(a + counter) ^ a".

    I meant "(a + counter) ^ b".

     
  • Warren D. Smith

    Warren D. Smith - 2018-08-20

    hmm, actually if we alter the internal logic of sfc64 further to
    maximally-use the "alternate + and xor" principle, we get (what probably is
    the best yet; the state is {a,b,c,w}):

    uint64 WDSrand64(){
    uint64 tmp = (b + w) ^ a;
    a = (b << LSHIFT) - b;
    b = c ^ (c >> RSHIFT);
    w += INCREM;
    c = ((c << BARREL) | (c >> (64-BARREL))) + tmp;
    return( tmp );
    }

    where INCREM = UINT64_C(7640891575674082921)
    is suggested, and I have no idea what the best values for
    LSHIFT, RSHIFT, and BARREL now should be.
    INCREM must be odd. The others must be in {1,2,...,63}.
    If you want I can investigate and come up
    with some explicit numerical suggestions. Email me at warren.wds at gmail.com.
    Another possibility is w could be a linear congruential rather
    than Weyl generator, still with period 2^64. That would have more
    feed-in randomness, but would be a bit slower.

    The period necessarily is a multiple of 2^64, and probably
    usually over 2^250 for random initial state. If use only every Kth iterate then probably
    crypto-secure if K large enough (1000?).

     
  • - 2018-08-20

    counter++ could be any odd value at the same speed

    More or less. On x86-64 the ++ can be done with INC, ADD(immediate), or LEA, whichever the compiler feels like (under some circumstances the LEA will be better). A 64 bit odd constant would require ADD(IP-relative) if I recall correctly. Which... wouldn't make much difference probably. On FPGAs or other exotic hardware I'm less confident on what the difference would be.

    I suspect that sfc64 is actually good enough for cryptography with security level about 2^128, PROVIDED user agrees to use only every Kth output word, for some large-enough K; probably K=1000 is safe.

    I'd expect K=100 to be adequate, but trying to prove that would be a nightmare and it would be pretty pointless anyway since skipping that many outputs makes it slower than other options.

    lots of variations are better quality

    ...are they? Version 4 of SFC has been tested - hundreds if not thousands of CPU-hours on the algorithm as written, plus that much again on numerous variations of it (mostly reduced quality variations, since even sfc16 frequires 512 terabytes of output testing to show bias), plus hundreds of hours of human time on analysis. Any major changes that aren't self-evidently superior will need substantial testing.

    Scaling the counter (making it a Weyl generator) would likely produce a tiny improvement in quality, which might or might not justify the extra complexity. Quick preliminary testing says: negligible difference in quality.

    Changing "a + b + counter" to "(a + counter) ^ b"... I know I tested a few variations like that, can't remember about that specific one. Quick preliminary testing says: significant improvement? Huh. I'll try benchmarking it, and a few other tests, but that does seem worth thinking about.

    Alternating addition and xor: On x86, the old left-shift was arranged that way to let it be done efficiently as an LEA, that construction doesn't support that. Also, I did test swapping the order of those two constructions like that, though not in conjunction that variation on the value of tmp. Quick preliminary testing says: small reduction in quality.

     

    Last edit: 2018-08-22
  • Warren D. Smith

    Warren D. Smith - 2018-08-22

    Oh, the left shift (x<<L)+x, or (x<<L)-x, I do not care which you use; I thought
    perhaps the minus verson might be a teeny bit more random, but it did not
    occur to me the plus version has the advantage it can be compiled with an "LEA" instruction.
    That seems a good enough reason to stick with plus.

     
  • imneme

    imneme - 2018-08-23

    FWIW, regarding sfc64, with 0.93, when I ran an extended test, I got:

    linux% ./sfc64 | ./RNG_test stdin64 -te 1 -tlmin 20 -tlmax 50 -
    tlmaxonly -multithreaded                                                      
    sfc64(0xdf7e5fd0802274af) initialized.
    RNG_test using PractRand version 0.93
    RNG = RNG_stdin64, seed = 0xc53d1ca0
    test set = expanded, folding = standard (64 bit)
    
    rng=RNG_stdin64, seed=0xc53d1ca0
    length= 1 megabyte (2^20 bytes), time= 0.3 seconds
      Test Name                         Raw       Processed     Evaluation
      BCFN_FF(2+1,13-7,T)               R= +12.5  p =  2.3e-4   unusual          
      ...and 217 test result(s) without anomalies
    
    rng=RNG_stdin64, seed=0xc53d1ca0
    length= 2 megabytes (2^21 bytes), time= 3.3 seconds
      no anomalies in 237 test result(s)
    
      ...
    
    rng=RNG_stdin64, seed=0xc53d1ca0
    length= 32 terabytes (2^45 bytes), time= 542262 seconds
      no anomalies in 763 test result(s)
    
    rng=RNG_stdin64, seed=0xc53d1ca0
    length= 64 terabytes (2^46 bytes), time= 1085079 seconds
      Test Name                         Raw       Processed     Evaluation
      [Low4/64]Pat5(*,r)                R= +23.1  p~=   2e-7    very suspicious  
      [Low4/64]Pat5(*,g)                R= +4526  p~=  0          FAIL !!!!!!!!  
      [Low4/64]Pat5(0,r)                R=-105.7  p~= 1-1e-56     FAIL !!!!      
      [Low4/64]Pat5(0,g)                R=+784.0  p~=   6e-465    FAIL !!!!!!!   
      [Low4/64]Pat5(1,r)                R=-106.5  p~= 1-5e-57     FAIL !!!!      
      [Low4/64]Pat5(1,g)                R=+795.9  p~=   4e-472    FAIL !!!!!!!   
      [Low4/64]Pat5(2,r)                R=-103.0  p~= 1-6e-55     FAIL !!!!      
      [Low4/64]Pat5(2,g)                R=+781.0  p~=   3e-463    FAIL !!!!!!!   
      [Low4/64]Pat5(4,r)                R= +56.7  p~=   5e-27     FAIL !!        
      [Low4/64]Pat5(4,g)                R=+259.2  p~=   6e-149    FAIL !!!!!     
      [Low4/64]Pat5(5,r)                R= +57.6  p~=   1e-27     FAIL !!        
      [Low4/64]Pat5(5,g)                R=+256.9  p~=   1e-147    FAIL !!!!!     
      [Low4/64]Pat5(6,r)                R= +58.8  p~=   2e-28     FAIL !!        
      [Low4/64]Pat5(6,g)                R=+272.6  p~=   5e-157    FAIL !!!!!     
      [Low4/64]Pat5(7,r)                R= +56.0  p~=   1e-26     FAIL !!        
      [Low4/64]Pat5(7,g)                R=+253.2  p~=   2e-145    FAIL !!!!!     
      ...and 763 test result(s) without anomalies
    
    rng=RNG_stdin64, seed=0xc53d1ca0
    length= 128 terabytes (2^47 bytes), time= 2161013 seconds
      Test Name                         Raw       Processed     Evaluation
      [Low4/64]Pat5(*,g)                R= +1316  p~=   6e-785    FAIL !!!!!!!   
      [Low4/64]Pat5(0,r)                R= -48.7  p~= 1-3e-22     FAIL !         
      [Low4/64]Pat5(0,g)                R=+147.4  p~=   1e-81     FAIL !!!!      
      [Low4/64]Pat5(1,r)                R= -47.6  p~= 1-1e-21     FAIL !         
      [Low4/64]Pat5(1,g)                R=+143.4  p~=   2e-79     FAIL !!!!      
      [Low4/64]Pat5(2,r)                R= -44.8  p~= 1-7e-20     FAIL !         
      [Low4/64]Pat5(2,g)                R=+134.6  p~=   6e-74     FAIL !!!!      
      [Low4/64]Pat5(4,r)                R= +38.9  p~=   2e-16     FAIL           
      [Low4/64]Pat5(4,g)                R=+100.2  p~=   3e-53     FAIL !!!!      
      [Low4/64]Pat5(5,r)                R= +38.8  p~=   2e-16     FAIL           
      [Low4/64]Pat5(5,g)                R= +96.5  p~=   5e-51     FAIL !!!       
      [Low4/64]Pat5(6,r)                R= +41.5  p~=   7e-18     FAIL           
      [Low4/64]Pat5(6,g)                R=+110.5  p~=   2e-59     FAIL !!!!      
      [Low4/64]Pat5(7,r)                R= +38.0  p~=   8e-16     FAIL           
      [Low4/64]Pat5(7,g)                R= +93.9  p~=   1e-49     FAIL !!!       
      ...and 779 test result(s) without anomalies
    

    Note that although I used my own implementation of sfc64, it is byte-for-byte identical in output to the built-in sfc64 for the same seed, a randomly chosen 0xdf7e5fd0802274af in this case.

    Prior to this point, I'd never seen anything fail the Pat5 tests and was mostly unaware of them. Interestingly sfc32 does not show this problem. It is currently fine out to 128 TB.

     
    • - 2018-08-23

      Interesting. Thanks for reporting it. I'd practically forgotten about that test.

       
    • Gregory Petrosyan

      FWIW, I've ran Pat5 test using PractRand pre0.95 on sfc64 10 times up to 128 TB, and did not observe any failures. I've also used the 0xdf7e5fd0802274af seed from your run, and it also passed up to 128 TB.

      Test output (and the patch I've used to only run the Pat5 test) is here: https://gist.github.com/flyingmutant/32a30cb09d130e1cd754ca3577f1d8b2

       
  • - 2018-09-07

    Changing "a + b + counter" to "(a + counter) ^ b" produced better avalanche scores in my testing, but worse output behavior on PractRand standard (from 512 TB down to 64 TB needed to show bias). Not going to make that change for the moment, though it may be worth investigating more later.

     
  • Michael Niedermayer

    Hi all
    Wasnt sure i should start a new thread but i guess as its about sfc64 this would reach people interrested. I was playing around first wondering how hard it would be to find the seeds from the output and then instead looking for patterns in the output based on my plan on how to find the seeds from the output.
    The shortest code i found that shows statistical defects in sfc64 i got after maybe an hour today of looking into it is this:

            uint64_t v1 = 0, v0 = 0, c = 0, a = 0;
            for (int i = 0; i<128; i++) {
                v0 = v1; v1 = sfc64_get(&sfc64);
                a = v1 - a;
                printf("%c%3"PRIX64, i%8 ? ' ' : '\n', a - 9*c >> 52);
                c = (c << 24 | c >> 40) + v0;
            }
    

    Output of this is:

    43F BE8 4E9 36A F81 2F4 275 8EA
    E1D C74 4EE 2D7 F84 2F2 279 978
    EAE D02 45F 2D5 F86 2F0 279 8E5
    EAF C74 3CD 2D7 F84 263 1E6 8EB
    EAC C75 3CC 2D5 F84 1D1 159 976
    E21 C70 3D0 242 EF7 1CF 15B 8E5
    D90 C6F 462 1B0 EF9 25B 15E 8E4
    D92 C6F 462 241 EFA 25D 15C 974
    D90 D00 3D0 241 EFA 25D  CC A04
    D92 CFE 3D1 241 EF7 25F  CC 977
    CFF D91 460 243 EF6 2F0  CC 975
    D8F E1F 3D3 1B0 EFA 25D 15D 974
    D8F E20 3CF 1B3 EF7 25E 15C 974
    E21 D8E 3D3 1B1 EF9 2EE 15E 972
    E23 D8D 3D3 240 EFA 2EB 15E 8E3
    D94 D8B 3D6 23E EFC 37A 160 8E2
    

    One can see the numbers are quite correlated within each column. I dont really think thats a problem as this test is really designed for sfc64 and not a general test for finding a statistical issue. And every (non secure and fast) PRNG likely has similar issues when one designs a test specifically for it. But it may be simply interresting in case this is not known already
    As a check to ensure the code doesnt always do that. With another random source we get:

    43F FD1 5EA 645 CDD  B3 22C ACE
    EB5 8A9 667 685 C5E B33 4A8 3E2
    8B3 95D 635 DB8 1B4 FEB FC2 F78
    45B EFB  33 B6D B17 FFB 4F5 2A5
    598 CFD F58 33E 7CF DFF 173 EEE
    941 D38 B08  74 95A 9F1 D7C AE3
    3F2  1F E1D 8AD 2CD 29E C1D CC5
    E26 34B 96B 57A EA2 6FF 21C CCB
    91D  78 1DD CD4 3CE ECC 62F 238
    35C 3E1 A2C 440 746 360 8CC 6EF
    802 37E BF3 956 8DB ED8 13C 528
    651 215 F53 C13 981 A28 96B 669
    767 B5D 807 B25 5A7 6B1 4BD  BF
    4BC F98 37D 772 182 5A3 51C F58
    AF0 69F 321 F4F 79E AC0 18F 908
    55E 92E 18D FAC 3E7 FFE BDA 668
    
     
    • - 2024-01-10

      Going from the output to the internal state looks quite feasible. Running the algorithm backwards should be even easier. The seed is simply the internal state 12 steps before output starts. There's probably some intelligent way to compute the state algebraically from the output, but if I had to do it, I'd just brute force it - guess a random value of 'c', use the output to calculate later values of 'c' from the guessed initial value, use the calculated values of c to calculate what the 3rd & 4th & 5th outputs should be, use the hamming distance between the predicted and observed outputs as a score, flip a random bit or two to my guessed initial value of 'c' and check if the score improved, revert if the score got worse, repeat until the score reaches zero. That should get you the full values of 'c' and 'counter' within a few billion cycles, at which point solving for the other two unknowns gets much much simpler.

      Your code looks like a distinguishing attack against sfc64. But since sfc64 isn't intended for cryptographic use, distinguishing attacks doesn't really mean much. If you could do that against efiix or salsa or somesuch that would be something, but sfc, xsm, jsf, even arbee should all be highly vulnerable to that kind of thing.

       
      • Michael Niedermayer

        Well, i dont "have to" do it, iam just playing with this for fun. Brute force is a bit boring. So heres my implementation of finding the seeds from 10+ output values and probably too terse description of how that works. But bascially it iterates over the 17-19 MSB of a compute the corresponding MSB of b and c from that (c requires us to explore multiple cases) and then move around so we split the now "known" part of c so the lower part becomes the MSB of c and then check that. That allows us to discard mismatching cases and ATM requires a few million steps to find the solution which is a fraction of a second for my testcase.

         
  • IgneousRed

    IgneousRed - 4 days ago

    On my Mac M2 Pro incrementing by random odd number (instead of 1) speeds up SFC by almost 10%.
    Found this to be true also on my PRNGs.

     
    • - 4 days ago

      Yeah, performance of high-end CPUs on short loops has been weird ever since... the early to mid 1990s I think? PPros. And these days it's not just high-end CPUs anymore.

      sfc is sort of optimized for x86. Not much really, but the value of the left-shift was chosen, at least in part, to work well with the LEA instruction on x86.

       

Log in to post a comment.