
piping to PractRand - matching data rates

rossd
2017-09-13
2022-04-01
  • rossd

    rossd - 2017-09-13

    I want to use PractRand to analyse some of my simulation data, but I belong to a silent group of potential users who don't use C/C++ or do any system programming, so it's a bit hard to get started. I managed to build PractRand by blindly following MartyMacGyver's build lines, but I'd like to see the basic build info for C++ novices put up on this site please. And where the F is the detailed user manual for gcc?
    Never mind, what I meant to ask about was piping data to PractRand in a DOS box using type <filename> | RNG_test stdin. How are the data rates matched up, i.e. the rate at which the TYPE command pushes data into stdin versus the rate at which RNG_test can pick it up and analyse it? Is there a rate control based on "send" requests from the destination process? That didn't sound right, because I had a vague memory from DOS days that if there is no destination it gets sent anyway at a hardware-dependent rate, and if that's so you can't have 1TB of data just piling up.
    And if I use a program like Matlab to run a loop pushing data onto stdout with fwrite, can that be picked up by RNG_test? I can't make any of it work so far.

    Thanks

     
  • - 2017-09-13

    http://pracrand.sourceforge.net/installation.txt is supposed to give some degree of guidance to novice users trying to build PractRand. I used to include windows binaries in the download, but the download size got a bit big so I trimmed out things like that.

    RNG_test, in single-threaded mode, can read in about 4 gigabytes per minute at normal settings, assuming nothing else is hogging all the CPU cores. When the pipe runs out of buffered data, PractRand will block (go to sleep) until more data is added to the pipe. If PractRand can't keep up then normally the source on the other end of the pipe will block (go to sleep) until PractRand catches up. A quick googling says that the buffer size for pipes on modern Linux is 64 kilobytes. I'd guess other OSes would be comparable, certainly at least 4 KB (one page, on x86/x64), probably 16 KB or more.
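
    The blocking behaviour described above can be observed directly. The following is a minimal sketch, assuming a POSIX system (os.set_blocking on a pipe is not available on older Windows Pythons); the function name pipe_capacity is made up for illustration. It fills a pipe that nobody reads and counts how many bytes are accepted before a write would have to sleep.

```python
import os

def pipe_capacity(chunk_size=4096):
    """Fill an unread pipe until a write would block; return bytes accepted."""
    read_fd, write_fd = os.pipe()
    # Assumption: POSIX. A full pipe now raises BlockingIOError instead of sleeping.
    os.set_blocking(write_fd, False)
    chunk = b"\0" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # buffer is full; an ordinary blocking writer would go to sleep here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total

if __name__ == "__main__":
    print(pipe_capacity(), "bytes fit in the pipe before the writer would block")
```

    The number printed depends on the OS and its settings. The point is only that it is finite, so a fast producer cannot pile up data without limit in front of a slower reader.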

    Note, however, that if you're trying to measure the information content of a highly non-random stream, you might be better off using a file compression program instead of PractRand.

    edit:

    And if I use a program like Matlab to run a loop pushing data onto stdout with fwrite, can that be picked up by RNG_test? I can't make any of it work so far.

    Yes, that's exactly what the "stdin" PRNG was intended for. Note however that PractRand won't test small amounts of data, 1 KB is the smallest it's capable of. You might try the -tlmin command line option to make sure that PractRand gives output as soon as possible in case it's running out of data before presenting results.
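
    As an illustration of a producer that writes raw binary to stdout, here is a minimal sketch in Python rather than Matlab. The names uint32_chunk and stream, the seed and the script name gen.py are all hypothetical. Python's random module is itself an MT19937 implementation, and the little-endian packing is an assumption that matches x86/x64.

```python
import random
import struct
import sys

def uint32_chunk(rng, n_words):
    """Pack n_words uniformly random 32-bit values as little-endian raw bytes."""
    words = [rng.getrandbits(32) for _ in range(n_words)]
    return struct.pack("<%dI" % n_words, *words)

def stream(out, rng, total_words, words_per_chunk=16384):
    """Write total_words raw uint32 values to a binary stream in chunks."""
    remaining = total_words
    while remaining > 0:
        n = min(words_per_chunk, remaining)
        out.write(uint32_chunk(rng, n))
        remaining -= n
    out.flush()

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python gen.py 268435456 | RNG_test stdin32
    try:
        # sys.stdout.buffer is the binary layer, so no newline translation occurs.
        stream(sys.stdout.buffer, random.Random(12345), int(sys.argv[1]))
    except BrokenPipeError:
        pass  # the reader exited first
```

    The key detail is writing bytes to the binary stream rather than printing numbers as text: each value occupies exactly 4 bytes.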

     

    Last edit: 2017-09-13
    • rossd

      rossd - 2017-09-13

      http://pracrand.sourceforge.net/installation.txt is supposed to give some degree of guidance to novice users trying to build PractRand.

      Um, yes, I read the file and it said:
      On linux I build the PractRand library using these command lines:
      ..........
      And then I build the PractRand command line tools like this:
      ................

      I was waiting for the following para which would read, "On Windows you would do this instead .... ", but it never came. So I thought, ah, here's yet another 'nix person who delights in giving all the 'nix and Linux details, then says you can also do it on Windows if you really must, but don't expect any help. On re-reading, I can see you didn't mean it that way.

      I used to include windows binaries in the download, but the download size got a bit big so I trimmed out things like that.

      And you've only done it recently it seems, because I see there were .exe files with v0.92, and the DL was 22 MB. Is that an upload cost for you? It's not much of a download size these days - that's only 1/10 of a YouTube video, and the kids will download 15 before breakfast.

      So I have it working now, except that I am getting some funny results. If I use Matlab's mt19937ar implementation and do a binary write of the MT output as uint32 values, it fails PractRand very early - about 4MB. Even sooner if the output class is smaller. The MT algorithm is meant to do better than that. You would think that Matlab/The Mathworks would get the implementation right, it is a big product and a well funded company, so I don't know what to think.

      Is there a list of command line options somewhere? I saw -tlmin in the test overview, and several others scattered through the versions file: but maybe you list them as a group somewhere? I hope I haven't missed a file.

      thanks

       

      Last edit: rossd 2017-09-13
      • - 2017-09-13

        If you're on Windows, you're probably using either MSVC or some form of gnu-on-windows.

        If you're using MSVC, you can open the PractRand MSVC project/solution with MSVC and tell it to build, and it should just build. Hopefully. That was supposed to be conveyed by the line that read "It comes with an MSVC project file".
        If you're using some form of gnu-on-windows then the build process should be roughly the same as on linux. Probably.

        The file size wasn't a big issue, but... I forget why the change happened. I remember that 90% of the download was just one of MSVCs temp files, something to do with the combination of link-time-code-generation optimizations combined with STL strings. And someone complained.

        Anyway...

        So I have it working now, except that I am getting some funny results. If I use Matlab's mt19937ar implementation and do a binary write of the MT output as uint32 values, it fails PractRand very early - about 4MB. Even sooner if the output class is smaller. The MT algorithm is meant to do better than that. You would think that Matlab/The Mathworks would get the implementation right, it is a big product and a well funded company, so I don't know what to think.

        That does sound pretty wrong. My first thought is CR/LF conversion issues, but you did say binary write so that probably shouldn't happen, and even if it did it shouldn't depend upon the output class. Hm... googling matlab documentation... hm... you're requesting mt19937ar output via randi, right? Any chance you're getting the bounds of the requested numbers wrong? In particular, it looks like it prefers to default to 1..N where N is the parameter passed to it, while the uint32 conversion demands that the value be in the range 0..4294967295 (that's 2 to the 32nd power minus 1). If you passed it a number in the range 1..4294967296 it would fail PractRand, and... no, that would take more than 4 MB to detect. I dunno. mt19937ar output should fail the PractRand standard test battery after 256 gigabytes of output, via a binary matrix rank test. There are a lot of ways to mangle the data en route to PractRand (CR/LF conversion, floating point loss of precision, binary/ASCII confusion, sign bit mishandling, etc), but most of the ones I can think of would either fail way faster than that or way slower than that.
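
        One of the mangling modes listed above, CR/LF conversion, leaves a recognisable fingerprint. The sketch below is only a heuristic with made-up function names: in clean random bytes, roughly 1 in 256 of the 0x0A bytes should happen to follow a 0x0D, whereas after a text-mode write every 0x0A follows a 0x0D.

```python
import random

def crlf_fraction(data):
    """Fraction of 0x0A bytes that are immediately preceded by 0x0D."""
    lf = 0
    crlf = 0
    for i, b in enumerate(data):
        if b == 0x0A:
            lf += 1
            if i > 0 and data[i - 1] == 0x0D:
                crlf += 1
    return crlf / lf if lf else 0.0

def text_mode_mangle(data):
    """Simulate a text-mode write that expands every LF into CR LF."""
    return data.replace(b"\n", b"\r\n")

if __name__ == "__main__":
    rng = random.Random(2017)
    clean = bytes(rng.getrandbits(8) for _ in range(1 << 18))
    mangled = text_mode_mangle(clean)
    print("clean:   %.4f" % crlf_fraction(clean))
    print("mangled: %.4f" % crlf_fraction(mangled))
```

        Running crlf_fraction over the first few megabytes of a dump file is a cheap way to rule this particular problem in or out before looking at anything more exotic.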

         
        • rossd

          rossd - 2017-09-13

          Thanks for your comments. Things still not right, but maybe a good sleep will fix it - often does!
          Currently I am finding that
          (a) using the internal generators works fine
          (b) saving a binary uint32 data file from Matlab's MT generator fails at 4MB, using either type {file} | RNG_test stdin32 -tlmin 10KB, or RNG_test stdin32 -tlmin 10KB <{file}
          (c) writing the data to stdout is not picked up by RNG_test at all.
          My test array was created by randi as you have guessed, and I have checked it has the correct data range [0, 2^32-1]. Tried a different seed in the MT generator, and using RAND instead of RANDI, but it still fails at 4MB each time.

          However I hope it will come good soon. It is good to have PractRand as it's a long time since there has been much activity in the area of PRNG testing. I gave up trying to get TestU01 to build on Windows, and the NIST suite was awful (still is: I don't think they've made any attempt to improve it for years - feels like at least 10-15 yrs?).

          thanks

           
  • - 2017-09-13

    Is there a list of command line options somewhere? I saw -tlmin in the test overview, and several others scattered through the versions file: but maybe you list them as a group somewhere? I hope I haven't missed a file.

    Try running RNG_test without parameters, or with the -help parameter.

     

    Last edit: 2017-09-14
  • - 2017-09-14

    One thing you could do to make sure input isn't getting mangled on PractRand's end is try:
    RNG_output mt19937 inf | RNG_test stdin

    I know it doesn't get mangled for me on windows or linux on any remotely recent version, but it's something easy to test anyway.

    What do the failures you see on mt19937ar look like?

     

    Last edit: 2017-09-14
  • rossd

    rossd - 2017-09-16

    Getting nowhere with this. I tried the mt19937 generator built in to PractRand, as you suggested above - that was working normally. However I am unable to test any of my own data via any of the 3 input methods: type [datafile] | RNG_test stdin32, RNG_test stdin32 <[datafile], or RNG_test stdin32 then start writing to stdout with my Matlab program. The effects of trying these are:
    (1) The first 2 cause an immediate Windows system popup saying that "RNG_test.exe has stopped working". If I add the -tlmin 10KB option then in both cases it gets to 4MB and stops with a message from RNG_test instead, due to failures simultaneously on multiple tests, and returns to the command prompt by itself without causing a Windows pop-up. The messages from the 2 methods are identical and always occur at 4MB. The failure panel looks like this:

    rng=RNG_stdin, seed=0x17d662b7
    length= 4 megabytes (2^22 bytes), time= 4.4 seconds
      Test Name                 Raw        Processed       Evaluation
      DC6-9x1Bytes-1            R= +15.1   p = 2.9e-8      VERY SUSPICIOUS
      Gap-16:A                  R= +10.3   p = 7.5e-8      very suspicious
      FPF-14+6/16:all           R= +10.2   p = 9.0e-9      VERY SUSPICIOUS
      FPF-14+6/16:all2          R= +17.0   p = 2.1e-7      very suspicious
      [Low1/8]DC6-9x1Bytes-1    R= +11.5   p = 2.0e-6      very suspicious
      [Low1/8]Gap-16:A          R=  +7.7   p = 4.3e-6      suspicious
      [Low1/8]Gap-16:B          R=  +7.6   p = 6.2e-6      suspicious
      [Low4/32]DC6-9x1Bytes-1   R=  +8.4   p = 1.3e-4      mildly suspicious
      [Low4/32]Gap-16:A         R=  +7.4   p = 7.5e-6      suspicious
      [Low4/32]Gap-16:B         R=  +5.5   p = 2.5e-4      unusual
      [Low1/32]Gap-16:A         R=+210.0   p = 8.0e-163    FAIL !!!!!
      [Low1/32]Gap-16:B         R=+220.7   p = 3.9e-169    FAIL !!!!!!
      ...and 87 test result(s) without anomalies

    (2) If I start RNG_test looking at stdin then start writing to stdout from Matlab, there is no data transfer and RNG_test just sits there quietly, apparently receiving nothing, and does not close of its own accord when the sending program is closed. I have shut down all other processes apart from Win Perf Monitor, which confirms CPU usage idling at only 2%, no disk i/o, no network i/o. Platform is Win 7/Intel i7 Ivybridge/ASUS. I rebuilt the tool exes, discovering in the process that GCC from some packages like TDM-GCC won't work to build the PractRand sources; I had to use the MinGW_x64_7-1-0 package and set X64/Posix/seh switches on installation.
    All very annoying - still unable to test any of my own data files or generators (which are not in C/C++, and I don't want to go there unless I really have to).

     

    Last edit: rossd 2017-09-16
    • G. Jones

      G. Jones - 2017-09-18

      re: the first 2 methods. Looks to me like it worked. The output ending "and 87 test result(s)" etc is what you get from the test program. RNG_test thinks your prng is complete garbage. There are two possible reasons for this. (a) It's complete garbage. (b) Your output is in the wrong format. Some sort of text format maybe, where RNG_test wants pure binary with 8 random bits per byte.
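
      Possibility (b) is easy to check before blaming the generator. The sketch below is a rough heuristic with made-up function names: raw random bytes should use all 256 byte values, and only about 38% of them (98 of 256) should be printable ASCII or common whitespace, whereas a text dump of numbers is close to 100% printable.

```python
import random
import sys

# Printable ASCII plus tab, LF and CR: 98 of the 256 possible byte values.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def printable_fraction(data):
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    return sum(1 for b in data if b in PRINTABLE) / len(data)

def distinct_byte_values(data):
    """Number of different byte values that occur at least once."""
    return len(set(data))

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_dump.py mydata.bin
    with open(sys.argv[1], "rb") as f:
        sample = f.read(1 << 20)  # the first megabyte is plenty
    print("printable fraction:", round(printable_fraction(sample), 3))
    print("distinct byte values:", distinct_byte_values(sample))
```

      A file that passes this check can still be a bad generator, but a file that fails it is certainly not in the format RNG_test expects.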

      re: the third method. Read your Matlab documentation carefully, and maybe search the net. You're looking for "pipe", or maybe "pipe output to another program". Just writing to stdout while RNG_test reads from stdin is not going to work unless the two are actually connected via a pipe. MS-Windows has had pipes for decades, the question is whether Matlab knows any way to use them.
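
      To show what "actually connected via a pipe" means, here is a minimal Python sketch in which the producing program starts the consumer itself and writes into the consumer's stdin. It says nothing about what Matlab offers. The helper name feed and the stand-in consumer are made up so the sketch runs without PractRand; for a real run the command would presumably be something like ["RNG_test", "stdin32"], as used elsewhere in this thread.

```python
import subprocess
import sys

# Stand-in consumer so the sketch runs anywhere: it counts the bytes it reads.
COUNTING_CONSUMER = [sys.executable, "-c",
                     "import sys; print(len(sys.stdin.buffer.read()))"]

def feed(command, chunks, stdout=None):
    """Start command with a pipe on its stdin, write chunks, return its stdout.

    With stdout=None the child prints straight to the console, which is the
    safer choice for a long-running tester that reports as it goes.
    """
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=stdout)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)  # blocks whenever the pipe buffer is full
        proc.stdin.close()           # signals end-of-data to the consumer
    except BrokenPipeError:
        pass  # the consumer exited before all data was written
    output = proc.stdout.read() if proc.stdout is not None else None
    proc.wait()
    return output

if __name__ == "__main__":
    data = (b"\x00" * 4096 for _ in range(64))
    print(feed(COUNTING_CONSUMER, data, stdout=subprocess.PIPE).decode().strip())
```

      Simply writing to stdout in one window while RNG_test waits on stdin in another creates no such connection, which is why nothing arrives.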

      Gnu Octave (which is partly compatible with matlab) has a feature called "popen" which probably does what you want. Not sure if this is in Matlab though.

       
    • - 2017-09-18

      Those failures do not look like a normal issue to me. Too many tests are failing at once, and some tests, such as [Low1/32]Gap-16:A and B, are failing amazingly badly for something that even barely passed 2 megabytes. The most common situation to produce extreme results like that is cycle exhaustion, where the PRNG suddenly starts repeating itself, but that seems unlikely to be an issue in your case. Is there anything going on in your output code that could suddenly radically change behavior after somewhere around 2 megabytes of output?
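
      Whether a saved data file repeats itself can be checked directly. The following is a minimal sketch, not a PractRand feature; the function name first_repeat and the 16-byte block size are arbitrary choices. It reports the first aligned block that has already occurred earlier in the data. In genuinely random data, an accidental repeat of a 16-byte block is astronomically unlikely.

```python
import random

def first_repeat(data, block_size=16):
    """Return (earlier_offset, later_offset) of the first aligned block
    that occurs twice, or None if no aligned block repeats."""
    seen = {}
    for offset in range(0, len(data) - block_size + 1, block_size):
        block = data[offset:offset + block_size]
        if block in seen:
            return seen[block], offset
        seen[block] = offset
    return None

if __name__ == "__main__":
    rng = random.Random(42)
    period = bytes(rng.getrandbits(8) for _ in range(4096))
    print(first_repeat(period))       # random data with no repetition
    print(first_repeat(period * 3))   # the same data played back in a loop
```

      If this reports a repeat in a dump file, the difference between the two offsets is a first guess at where the writing code starts recycling its buffer.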

       
  • Munther

    Munther - 2022-04-01

    Is there a ready-made PractRand executable for Windows available to download?

     
