Menu

#130 Re-enable scanning interactively or from a network byte stream

feature request
open-fixed
None
5
2015-02-15
2014-08-05
ronald
No

Hi,

We have an application that receives commands over a TCP/IP socket. After each command a reply is sent. Normally the communication is interactive, which means a strict order of questions and answers. This works perfectly with jflex 1.4.x. (The grammar is easy from the jflex point of view. Every command is terminated by a semicolon, no read ahead necessary; this follows your FAQ answer on "I want my scanner to read from a network byte stream or from interactive stdin. Can I do this with JFlex?").

Now we have some problems with Jflex 1.6.0.

Jflex generates following code in zzRefill():

...
    int requested = zzBuffer.length - zzEndRead;
    int totalRead = 0;
    while (totalRead < requested) {
      int numRead = zzReader.read(zzBuffer, zzEndRead + totalRead, requested - totalRead);
      if (numRead == -1) {
        break;
      }
      totalRead += numRead;
    }
...

This code reads bytes from the input as long as the buffer isn't full or until EOF is reached. This is nice in case of reading a file, but leads to deadlocks if using an interactive scanner.

The easiest way to avoid the deadlock is to eliminate the while loop. If the assumption that a read() returns at least one character (or EOF) isn't valid, the while loop must be executed as long as nothing is read. (I tested this; it seems to work flawlessly).

Since I assume there was a good reason for this piece of code, I'd like to have a command line option to eliminate the while loop here.
I'd be delighted if someone would explain me why a repeated call of zzRefill() is worse than the while loop. It can't really be a performance issue. The costs of an I/O exceed by far the costs of a function call and a bit of basic arithmetic.

Regards,

Ronald

Discussion

  • Steve Rowe

    Steve Rowe - 2014-08-18

    I put that while loop in place when I was making changes to the skeleton to enable supplementary character handling (chars above the Unicode basic multilingual plane). I had forgotten about the interactive/network byte stream use cases (I've never used JFlex in those ways myself).

    I've removed the while loop, and put back in place the previous code to perform a single retry if zero characters are read.

    The changes are in r852 (for trunk; I'll merge the changes into the 1.6.X branch after I create it later today). This change will be part of the 1.6.1 JFlex release.

    Thanks Ronald!

     
  • Steve Rowe

    Steve Rowe - 2014-08-18
    • status: open --> open-fixed
    • assigned_to: Steve Rowe
     
  • Steve Rowe

    Steve Rowe - 2014-08-18
    • summary: configurable read strategy --> Re-enable scanning interactively or from a network byte stream
     
  • Dawid Weiss

    Dawid Weiss - 2014-09-18

    I think the solution in the code is wrong though -- there should be a while loop running until there's either an EOF or at least one byte received from the input. The single 'if' retry is not enough (if the stream can return 0 as the result, it can do it twice, but still have lingering input).

    Patch attached.

     
  • Steve Rowe

    Steve Rowe - 2014-09-18

    Ronald, will the changes in Dawid's patch work for your interactive scanner?

    Also, we don't have an interactive scanner test case under testsuite/testcases/src/test/cases/ - would you be willing to contribute one? It's hard for me to evaluate changes like this without a test.

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-18

    Come on, it's not that difficult :) It's actually a typical error in handling stream reads -- read(...) returning 0 is not an EOF and should simply retry.

    What my patch does will work for Ronald assuming his input reader is not doing any internal buffering (which it shouldn't if it's a simple socket-based read). The patch simply corrects the current behavior in which the stream returning 0 twice will cause the handler to return EOF (which isn't the case).

    I would add a test case... if they were in some regular format, but I'm kind of lost in the current testing harness :)

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-18

    I've put together a stand-alone test case, hope you can integrate it with jflex's architecture (I tried, but failed).

    to compile, run:

    java -jar jflex-1.7.0-SNAPSHOT.jar refill.flex && javac *.java && java -cp . Refill

    The current code prints (incorrectly):

    Reading "refill.flex"
    Constructing NFA : 6 states in NFA
    Converting NFA to DFA :
    ..
    4 states before minimization, 3 states in minimized DFA
    Old file "Echo.java" saved as "Echo.java~"
    Writing code to "Echo.java"
    <start>
    <end>

    while after patching it prints:

    Reading "refill.flex"
    Constructing NFA : 6 states in NFA
    Converting NFA to DFA :
    ..
    4 states before minimization, 3 states in minimized DFA
    Old file "Echo.java" saved as "Echo.java~"
    Writing code to "Echo.java"
    <start>
    Letter received: a
    Letter received: b
    Letter received: c
    <end>

    The reads are not buffered as you can clearly see because I added an explicit 1-second interval between each parsed letter.

     
  • ronald

    ronald - 2014-09-19

    Hi Steve, hi Dawid,

    Dawid's change seems absolutely correct to me. In fact it covers the case I already mentioned in my first post:

    If the assumption that a read() returns at least one character (or EOF) isn't valid, the while loop must be executed as long as nothing is read.

    And I guess that the general case is such that anything can happen, including a read() returning after having read nothing (though that will the exception rather than the rule). The code I tested was the original code having "requested" in the while condition replaced by the literal 1. (see above).
    Though now expressed slightly differently, the actual code is semantically the same.

    I regard Dawid's testcase absolutely convincing and wouldn't know how to improve the test.

    I'll test our software with the latest jflex and Dawid's patch this weekend.
    If I encounter any problems, I'll repair it and send a patch. If not, I'll report success.

     

    Last edit: ronald 2014-09-19
  • Dawid Weiss

    Dawid Weiss - 2014-09-19

    including a read() returning after having read nothing

    This has happened to me a few times in the past, actually. It's possible when you have, for example, filters over your actual I/O stream -- then the filtering stream can eat up some of the buffer and return zero bytes consecutively, without asking for more content.

     
  • ronald

    ronald - 2014-09-19

    In my opinion a blocking reader should always return at least 1 item or an EOF. A blocking reader returning 0 items looks like a design flaw to me. Nevertheless, jflex wants to handle abstract data sources and returning 0 items can be a property of such an abstract source as you pointed out.
    In fact, the current code (with the while() instead of the if()) is even capable of handling non-blocking readers, although it would be an expensive way of heating the room ;-)
    PS. please excuse me for initially misspelling your name

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-19

    Yeah, unfortunately this isn't an ideal world in which everything works as advertised... It's not even that there are many non-blocking (spin-looping) readers out there; it's just that it does happen with third party code that occasionally such situations do happen. Better safe than sorry.

     
  • ronald

    ronald - 2014-09-23

    The good news is that the interactive scanner seems to work (both with "if" and "while").
    The bad news is that it runs into some other flaw which doesn't occur using jflex 1.4.x. I'm still investigating the situation. To point at the direction, there's a difference between

    echo "list sessions;" | sdmsh

    and

    echo "list sessions;" > /tmp/x
    sdmsh < /tmp/x

    as the first commandline hangs (wrong), the second terminates (correct).
    There shouldn't be a difference in behaviour though.

    As soon as I've found the cause, I'll drop a note (my mistake) or report a bug (jflex mistake).

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-23

    Dump the stack trace from the hung process, Ronald. It's very odd as technically a pipe and a shell redirect are not possible to be distinguished from Java level.

    As for the 'if' vs. 'while' -- only two options are correct, really:
    - strict mode: throw an exception on the input stream returning zero (not conforming to contract),
    - lenient mode: on zero returned from read() retry until EOF or some bytes are returned from stream. you could yield() or wait, but I don't think it makes sense (typically there should be SOME I/O at the end of the readers chain anyway).

    The double 'if' idiom is incorrect (given 0-returning input), as my simple example showed. That it works for you is accidental (apparently your input doesn't return 0 twice in a sequence, but it just as well could).

     
  • ronald

    ronald - 2014-09-23

    Hi Dawid,

    I've already found the bug which is in skeleton.nested (see bug #132). Better said I found out how to repair it :-)

    I fully agree with you regarding if vs while. I tested both if and while, and both work because the base java classes (unfiltered, so to say) obviously conform to the contract.

    The lenient mode will be better. In the normal case there's no difference in executed code (the read will return at least one byte or an EOF). But it is the robust implementation.

    In the diff of the skeleton.nested in bug 132, you can see I replaced 'if' with 'while'.

    Regarding the stack traces. That's a bit hard to get. If I redirect the file, everything's over before I can even start looking for a pid. In the other case, zzrefill() is waiting in a read().

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-23

    I don't know anything about the .nested skeleton, it looks like an odd concept to me (although I can see some benefits for people who want to dynamically include stuff while preserving the parser state).

    Thanks for clarifying.

     
  • ronald

    ronald - 2014-09-23

    Well, you might or might not know mysql (as an example).
    if you use mysql interactively (that can be a script that is piped into the tool of course), you can

    source somefile.sql

    which then executes the statements contained in that file. (This is true for about every sql command line frontend I know, which are quite a few).
    The same is true for the command line interface to our scheduling system.

    In order to do so, you'd have to save the internal state of the scanner and save the buffer. You can't do that without help of jflex. Hence the "nested" skeleton.

     
  • Dawid Weiss

    Dawid Weiss - 2014-09-23

    I do know mysql :)

    What I meant is that typically when doing parser includes you'd restart nested new parsing context instead of reusing the same parser for the include... unless you really want the inclusion to support partial state evaluation, etc.

    I'm not arguing to remove it, of course -- different things for different folks.

     
  • Gerwin Klein

    Gerwin Klein - 2015-02-15

    This should be fixed in rev 852, issue now tracked on https://github.com/jflex-de/jflex/issues/131