JFlex / Bugs / #130 Re-enable scanning interactively or from a network byte stream

Steve Rowe - 2014-08-18

I put that while loop in place when I was making changes to the skeleton to enable supplementary character handling (chars above the Unicode basic multilingual plane). I had forgotten about the interactive/network byte stream use cases (I've never used JFlex in those ways myself).

I've removed the while loop, and put back in place the previous code to perform a single retry if zero characters are read.

The changes are in r852 (for trunk; I'll merge the changes into the 1.6.X branch after I create it later today). This change will be part of the 1.6.1 JFlex release.

Thanks Ronald!

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steve Rowe - 2014-08-18

status: open --> open-fixed

assigned_to: Steve Rowe
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steve Rowe - 2014-08-18

summary: configurable read strategy --> Re-enable scanning interactively or from a network byte stream
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-18

I think the solution in the code is wrong though -- there should be a while loop running until there's either an EOF or at least one byte received from the input. The single 'if' retry is not enough (if the stream can return 0 as the result, it can do it twice, but still have lingering input).

Patch attached.

0notateol.diff

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steve Rowe - 2014-09-18

Ronald, will the changes in Dawid's patch work for your interactive scanner?

Also, we don't have an interactive scanner test case under testsuite/testcases/src/test/cases/ - would you be willing to contribute one? It's hard for me to evaluate changes like this without a test.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-18

Come on, it's not that difficult :) It's actually a typical error in handling stream reads -- read(...) returning 0 is not an EOF and should simply retry.

What my patch does will work for Ronald assuming his input reader is not doing any internal buffering (which it shouldn't if it's a simple socket-based read). The patch simply corrects the current behavior in which the stream returning 0 twice will cause the handler to return EOF (which isn't the case).

I would add a test case... if they were in some regular format, but I'm kind of lost in the current testing harness :)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-18

I've put together a stand-alone test case, hope you can integrate it with jflex's architecture (I tried, but failed).

to compile, run:

java -jar jflex-1.7.0-SNAPSHOT.jar refill.flex && javac *.java && java -cp . Refill

The current code prints (incorrectly):

Reading "refill.flex"
Constructing NFA : 6 states in NFA
Converting NFA to DFA :
..
4 states before minimization, 3 states in minimized DFA
Old file "Echo.java" saved as "Echo.java~"
Writing code to "Echo.java"
<start>
<end>

while after patching it prints:

Reading "refill.flex"
Constructing NFA : 6 states in NFA
Converting NFA to DFA :
..
4 states before minimization, 3 states in minimized DFA
Old file "Echo.java" saved as "Echo.java~"
Writing code to "Echo.java"
<start>
Letter received: a
Letter received: b
Letter received: c
<end>

The reads are not buffered as you can clearly see because I added an explicit 1-second interval between each parsed letter.

refill-read-zero.zip

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ronald - 2014-09-19

Hi Steve, hi Dawid,

Dawid's change seems absolutely correct to me. In fact it covers the case I already mentioned in my first post:

If the assumption that a read() returns at least one character (or EOF) isn't valid, the while loop must be executed as long as nothing is read.

And I guess that the general case is such that anything can happen, including a read() returning after having read nothing (though that will the exception rather than the rule). The code I tested was the original code having "requested" in the while condition replaced by the literal 1. (see above).
Though now expressed slightly differently, the actual code is semantically the same.

I regard Dawid's testcase absolutely convincing and wouldn't know how to improve the test.

I'll test our software with the latest jflex and Dawid's patch this weekend.
If I encounter any problems, I'll repair it and send a patch. If not, I'll report success.

Last edit: ronald 2014-09-19

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-19

including a read() returning after having read nothing

This has happened to me a few times in the past, actually. It's possible when you have, for example, filters over your actual I/O stream -- then the filtering stream can eat up some of the buffer and return zero bytes consecutively, without asking for more content.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ronald - 2014-09-19

In my opinion a blocking reader should always return at least 1 item or an EOF. A blocking reader returning 0 items looks like a design flaw to me. Nevertheless, jflex wants to handle abstract data sources and returning 0 items can be a property of such an abstract source as you pointed out.
In fact, the current code (with the while() instead of the if()) is even capable of handling non-blocking readers, although it would be an expensive way of heating the room ;-)
PS. please excuse me for initially misspelling your name

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-19

Yeah, unfortunately this isn't an ideal world in which everything works as advertised... It's not even that there are many non-blocking (spin-looping) readers out there; it's just that it does happen with third party code that occasionally such situations do happen. Better safe than sorry.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ronald - 2014-09-23

The good news is that the interactive scanner seems to work (both with "if" and "while").
The bad news is that it runs into some other flaw which doesn't occur using jflex 1.4.x. I'm still investigating the situation. To point at the direction, there's a difference between

echo "list sessions;" | sdmsh

and

echo "list sessions;" > /tmp/x
sdmsh < /tmp/x

as the first commandline hangs (wrong), the second terminates (correct).
There shouldn't be a difference in behaviour though.

As soon as I've found the cause, I'll drop a note (my mistake) or report a bug (jflex mistake).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-23

Dump the stack trace from the hung process, Ronald. It's very odd as technically a pipe and a shell redirect are not possible to be distinguished from Java level.

As for the 'if' vs. 'while' -- only two options are correct, really:
- strict mode: throw an exception on the input stream returning zero (not conforming to contract),
- lenient mode: on zero returned from read() retry until EOF or some bytes are returned from stream. you could yield() or wait, but I don't think it makes sense (typically there should be SOME I/O at the end of the readers chain anyway).

The double 'if' idiom is incorrect (given 0-returning input), as my simple example showed. That it works for you is accidental (apparently your input doesn't return 0 twice in a sequence, but it just as well could).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ronald - 2014-09-23

Hi Dawid,

I've already found the bug which is in skeleton.nested (see bug #132). Better said I found out how to repair it :-)

I fully agree with you regarding if vs while. I tested both if and while, and both work because the base java classes (unfiltered, so to say) obviously conform to the contract.

The lenient mode will be better. In the normal case there's no difference in executed code (the read will return at least one byte or an EOF). But it is the robust implementation.

In the diff of the skeleton.nested in bug 132, you can see I replaced 'if' with 'while'.

Regarding the stack traces. That's a bit hard to get. If I redirect the file, everything's over before I can even start looking for a pid. In the other case, zzrefill() is waiting in a read().

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-23

I don't know anything about the .nested skeleton, it looks like an odd concept to me (although I can see some benefits for people who want to dynamically include stuff while preserving the parser state).

Thanks for clarifying.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

ronald - 2014-09-23

Well, you might or might not know mysql (as an example).
if you use mysql interactively (that can be a script that is piped into the tool of course), you can

source somefile.sql

which then executes the statements contained in that file. (This is true for about every sql command line frontend I know, which are quite a few).
The same is true for the command line interface to our scheduling system.

In order to do so, you'd have to save the internal state of the scanner and save the buffer. You can't do that without help of jflex. Hence the "nested" skeleton.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dawid Weiss - 2014-09-23

I do know mysql :)

What I meant is that typically when doing parser includes you'd restart nested new parsing context instead of reusing the same parser for the include... unless you really want the inclusion to support partial state evaluation, etc.

I'm not arguing to remove it, of course -- different things for different folks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Gerwin Klein - 2015-02-15

This should be fixed in rev 852, issue now tracked on https://github.com/jflex-de/jflex/issues/131

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Re-enable scanning interactively or from a network byte stream

The fast lexer generator for Java

Group

Searches

Help

#130 Re-enable scanning interactively or from a network byte stream

Discussion