Re: [Nasm-devel] [nasm:master] BR30730640: Restore preprocessor token concatenation rules

>> > The former candidates for concatenation were (in terms of RE)
>> >
>> > expand_smacro
>> >    [(TOK_ID|TOK_PREPROC_ID)][(TOK_ID|TOK_PREPROC_ID|TOK_NUMBER)]
>> >
>> > expand_mmac_params
>> >    [(TOK_ID|TOK_NUMBER|TOK_FLOAT)][(TOK_ID|TOK_NUMBER|TOK_FLOAT|TOK_OTHER)]
>>
>> The above permit concatenations that result in invalid
>> tokens, and therefore need to be tightened. Try these
>> rules instead:
>>
>> - whitespace(s)/tab(s) with whitespace(s)/tab(s)
>> - identifiers with either identifiers or numbers or $/$$
>> - preprocessor identifiers or numbers with numbers
>> - $ with identifiers or numbers that don't begin with $
>>
>> I don't know enough about TOK_FLOAT to comment.
>
> OK, thanks for the comments, I'll take a look tomorrow. But basically I
> would like to hear exactly which invalid tokens you mean. Mind
> elaborating if you have some spare minutes? Also I fear this part of

In your first set of rules you cannot have TOK_PREPROC_ID on
the right side, since it may start with a character that cannot be
inside a TOK_ID or TOK_NUMBER.

In your second set of rules you cannot have TOK_NUMBER plus
TOK_ID, let alone anything plus TOK_OTHER, for the same basic
reason as above.
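
To make the objection concrete, here is a minimal C sketch that checks
whether pasting two token texts yields something that still reads as a
single identifier. It is not NASM code: paste_is_identifier() and the
helper names are hypothetical, and the character sets are an assumption
based on the identifier rules in the NASM manual (the real scanner may
accept slightly different characters).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed rule: an identifier starts with a letter, '_', '.' or '?'. */
static bool is_id_start(unsigned char c)
{
    return isalpha(c) || c == '_' || c == '.' || c == '?';
}

/* Assumed rule: later characters may also be digits, '$', '#', '@', '~'. */
static bool is_id_char(unsigned char c)
{
    return isalnum(c) || (c != '\0' && strchr("_$#@~.?", c) != NULL);
}

/* Hypothetical helper: does left+right read as one identifier? */
bool paste_is_identifier(const char *left, const char *right)
{
    char buf[128];
    int n = snprintf(buf, sizeof buf, "%s%s", left, right);

    if (n <= 0 || (size_t)n >= sizeof buf)
        return false;               /* empty or too long for this sketch */
    if (!is_id_start((unsigned char)buf[0]))
        return false;
    for (const char *p = buf + 1; *p; p++)
        if (!is_id_char((unsigned char)*p))
            return false;
    return true;
}

/* Small usage example: ID + NUMBER is fine, ID + preprocessor ID is not. */
void check_paste_examples(void)
{
    assert(paste_is_identifier("foo", "42"));
    assert(!paste_is_identifier("foo", "%bar"));
}
```

Under these assumptions "foo" + "%bar" fails because '%' cannot appear
inside an identifier, and "42" + "foo" fails because the result starts
with a digit.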

> code is pretty sensitive to "backward compatibility" so there is an
> easy way to break some old sources people still compile with nasm.

NASM 0.98 only supported implicit concatenation (mmac), with
three simple rules:

- TOK_WHITE + TOK_WHITE
- TOK_ID + {TOK_ID,TOK_NUMBER}
- TOK_NUMBER + TOK_NUMBER

It did not handle TOK_PREPROC_ID + TOK_NUMBER, simply
because it was written only with %n + TOK_NUMBER in mind,
i.e. the case where %n becomes TOK_NUMBER first and then
gets handled by the TOK_NUMBER + TOK_NUMBER rule. Nor
did it handle $ and $$. (These are the only two extensions that
I found necessary over the years.)
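
The three NASM 0.98 rules listed above can be written as a small
predicate. This is only a sketch: the enum is a simplified stand-in for
the token types named in this thread (the real enum in the preprocessor
has more members), and can_concat_098() is a hypothetical name, not a
function in the NASM sources. The $ and $$ extensions are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the preprocessor token types discussed here. */
enum tok_type {
    TOK_WHITE,
    TOK_ID,
    TOK_PREPROC_ID,
    TOK_NUMBER,
    TOK_FLOAT,
    TOK_OTHER
};

/* Hypothetical predicate: may two adjacent tokens be fused under the
 * three 0.98 rules? Everything not listed is rejected. */
bool can_concat_098(enum tok_type left, enum tok_type right)
{
    switch (left) {
    case TOK_WHITE:
        return right == TOK_WHITE;
    case TOK_ID:
        return right == TOK_ID || right == TOK_NUMBER;
    case TOK_NUMBER:
        return right == TOK_NUMBER;
    default:
        return false;
    }
}

/* Small usage example: ID + NUMBER fuses, NUMBER + ID does not. */
void check_098_rules(void)
{
    assert(can_concat_098(TOK_ID, TOK_NUMBER));
    assert(!can_concat_098(TOK_NUMBER, TOK_ID));
}
```

Note that TOK_PREPROC_ID never appears on either side here; %n only
takes part once it has already been expanded to a TOK_NUMBER.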

> Technically speaking I've restored the old behaviour, which was changed
> at the moment of code unification quite a long time ago. So if there are
> some cases broken now it would be great to see them. Thanks.

I think I now understand why TOK_FLOAT was introduced -- so
that TOKEN_FLOAT can be handed to evaluate(). (I went down
a different path for this functionality -- instead of eval ops I did it
as special-cased pp smacs -- so I didn't need TOK_FLOAT.)

Given this TOK_FLOAT complexity, hpa's "use tokenize()" suggestion
seems like your best bet. (And, as a side effect, you'd get the
desired symmetry between implicit (mmac) and explicit (%+)
concatenation.)