Resolving Lexical Ambiguity for REXX Strings

2003-04-18
2003-04-19
  • As much as I love REXX, even I have to admit that, as a language, it has a couple of warts.  One of these derives from the juxtaposition operation (wherein two adjacent operands are implicitly concatenated).  For example, the instructions:

    f = 4; SAY '3031'f

    result in output of "30314", because the expression is treated as if it were '3031' || f.

    A problem arises when the character which follows the closing string quote is X or B, as these can indicate that the preceding string is hexadecimal or binary, respectively.

    However, when the string is followed by X or B and then other characters, it is unclear whether the X or B belongs to the string token or to whatever follows.  Take, for example, the following instructions, and the output Regina produces for them under Windows on a PC:

    SAY '3031'X  /* produces "01" */

    DROP XA; SAY '3031'XA /* produces "3031XA" */

    DROP X4; SAY '3031'X4 /* produces "014" */

    So, my first question is this:  why should Regina choose to treat '3031'XA as ( '3031' || XA ), but choose to treat '3031'X4 as ('3031'X || 4) ??

    I have been unable to find a definitive statement of how to handle this situation, although in "The REXX Language:  A Practical Approach to Programming", Mike Cowlishaw says, "The X may not be part of a longer token".  (BTW I can no longer find the ANSI draft REXX standard online, and would appreciate it if anyone could supply a URL).

    The interpretation that X4 above is a "longer token" as described by Cowlishaw is borne out by the IBM implementations of REXX, both for TSO/E and DOS7 (I haven't gotten around to checking any of the others).

    My second question is this:  Could it be that Regina's lexical routines should adopt the interpretation that, if the X or B is followed by any other characters valid in a symbol (A-Z, 0-9, _, etc), the X or B is *not* the radix of the string?
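
    To make the proposed rule concrete, here is a minimal sketch of it in Python.  It is not Regina's actual code: the function name radix_of and the SYMBOL_CHARS set are made up for illustration, and the set of symbol characters is an assumption (some interpreters accept more).  The function receives whatever text follows the closing quote and reports whether a leading X or B should be taken as a radix.

```python
import string

# Assumption: characters allowed in a REXX symbol. Implementations differ
# (some also accept '@', '#', '$'), so adjust this set as needed.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "._!?")


def radix_of(following):
    """Return 'X', 'B', or None for the text that follows a closing quote.

    Hypothetical helper: 'following' is the source text immediately after
    the quote that ends a string literal.
    """
    # No suffix at all, or the next character is not X/B: a plain string.
    if not following or following[0].upper() not in ("X", "B"):
        return None
    # Proposed rule: if the X or B is followed by another symbol character,
    # it is the start of a longer token, so it is not a radix.
    if len(following) > 1 and following[1] in SYMBOL_CHARS:
        return None
    return following[0].upper()
```

    Under this sketch, radix_of("X") gives 'X', while radix_of("XA") and radix_of("X4") both give None, so '3031'XA and '3031'X4 would each be read as a plain string abutted to a symbol.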

     
    • I neglected to show you what the IBM interpreters I mentioned produce for the examples given.  Here it is (these examples are for the DOS7 interpreter, but the MVS&TSO/E interpreter operates in the same fashion, just with EBCDIC instead of ASCII):

      SAY '3031'X /* produces "01" */

      DROP XA; SAY '3031'XA /* produces "3031XA" */

      DROP X4; SAY '3031'X4 /* produces "3031X4" */

      The first two results are the same (as expected), but the third one (with X4) is different.

       
    • I have found, and consulted, a copy of the ANSI Draft Standard for REXX, and concluded that the behaviour described above is a bug.  I have re-posted this matter in the Bugs area, and ask you to follow it there.