Menu

#8 Issues with new buffered stdin

v1.0_(example)
closed-fixed
nobody
None
5
2014-11-26
2014-11-19
Jirka
No

Hi,

the buffered stdin needs some minor fixes. For example, when supplying 129 MiB of data, RNG_test is crashing:

RNG_output jsf64 135266304 | RNG_test stdin

RNG = RNG_stdin, PractRand version 0.92, seed = 0x18ab74f8
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0x18ab74f8
length= 128 megabytes (2^27 bytes), time= 2.3 seconds
Test Name Raw Processed Evaluation
[Low1/8]FPF-14+6/16:cross R= -2.1 p =1-1.1e-3 unusual
...and 150 test result(s) without anomalies

Segmentation fault (core dumped)

Looking into RNG_from_name.h I see several issues:

  1. "bool ended" is set but never used
  2. following code has several issues
Word read() { Word rv = *(pos++); if (pos == end) refill(); return rv; }
  • Issues
    • since fread can return any number of bytes we should check if we have enough data in buffer first. Example. end == 1 and we are trying to get 4 Bytes.
    • for the same reason, it can happen that pos > end. refill will be never called.
  • I also strongly advocate to improve end of stream handling (code after if (n < BUFF_SIZE) )
    • Check whether EOF has been reached or if some error has occured
    • In case of EOF, inform user, that EOF condition has been reached. Process the remaining data and finish application in some controlled manner.
    • In case that ferror has occured, print out the error message. Process the remaining data and finish application in some controlled manner.

Thanks!
Jirka

Discussion

  • - 2014-11-19

    It's an error that the ended flag wasn't checked properly, and likewise that it wasn't declaring an error when fread returns zero. I'll fix that for 0.93. For reference, the way it was intended to go (and thus the fix I'm using now that you've pointed it out that I didn't finish that code):

    std::size_t n = std::fread(&buffer[0], sizeof(Word), BUFF_SIZE, stdin);
    if (n < BUFF_SIZE) ended = true;

    gets changed to this:

    if (ended) read_failed();
    std::size_t n = std::fread(&buffer[0], sizeof(Word), BUFF_SIZE, stdin);
    if (!n) read_failed();
    if (n < BUFF_SIZE) ended = true;

    The other stuff...

    since fread can return any number of bytes we should check if we have enough data in buffer first. Example. end == 1 and we are trying to get 4 Bytes.

    This doesn't make sense. fread can't return any number of bytes - it can only return multiples of the unit specified in its parameter. If some uses RNG_stdin, and it buffers up a single byte, and they then call RNG_stdin::raw32(), it will call _stdin_reader<uint8>::read() 4 times, and the 2nd one should call read_failed() (though admittedly, it's not being called in 0.92...), aborting the program. That is the desired behavior. </uint8>

    for the same reason, it can happen that pos > end. refill will be never called.

    The only way that can happen is if fread returned no data... which should have (but didn't in .92) already called read_failed() at that point.

    Check whether EOF has been reached or if some error has occured

    In case of EOF, inform user, that EOF condition has been reached. Process the remaining data and finish application in some controlled manner.

    What error do you see occurring other than an EOF? I'm having a hard time picturing an error reading from stdin that I would care enough about to want more information.

    Process the remaining data? RNG_stdin* is already returning the remaining data until the last integer it can read from input. If you mean the tests... they can't conceive of data in units smaller than 1 kilobyte, and I see no reason to change that. If RNG_test were to pay more attention to what goes on inside of RNG_stdin it could generate an interim result after the last full kilobyte of input and display that... but I don't really see the point. If the user has a special file size they like to handle, they can specify that with a command line option and get an interim result at the end. I don't see any reason to make RNG_test care about file input to make it automatically generate interim results at the maximum possible point for limited data scenarios.

     
  • Jirka

    Jirka - 2014-11-19

    Hi,

    I agree with your fix. Some my points were silly, I should not post any messages so late in the night. Sorry about that.

    I have implemented your fix with two minor modifications. Please consider adding those to the 0.93 release.

    1. I keep track of total number of blocks read by fread
    2. read_failed function will inform user
      1. about reasons why stdin has failed - either EOF or ferror condition. (For possible error conditions on read see http://pubs.opengroup.org/onlinepubs/009695399/functions/fgetc.html And using ferror is recommended by the documentation, see http://en.cppreference.com/w/cpp/io/c/feof )
      2. Info about total bytes read in human readable format

    I really like when application informs me why it has ended. It especially holds for long running programs like RNG_test which can run for days and weeks. You really want these programs to be verbose about the reasons why they have ended.

    Fixed RNG_from_name.h is included.Please consider it for 0.93 release.

    Thanks
    Jirka

     
  • - 2014-11-20

    Thanks. I'll stick it in, with minor modifications:
    changed identitation to 1 tab per indentation level, to match PractRand code
    initialized blocks_read to zero on construction
    removed line feed from end of perror string
    changed "B" to "bytes" in short length message

    I'm assuming the only changes were to _stdin_reader.

     
  • Jirka

    Jirka - 2014-11-20

    Thanks including it!

    And yes, I have only modified _stdin_reader class.

     
  • - 2014-11-26
    • status: open --> closed-fixed
     

Log in to post a comment.

MongoDB Logo MongoDB