From: Simon J. <sje...@bl...> - 2004-06-18 20:37:25
Vladimir Senkov wrote:
> Hi All,
>
> Having some issues with the parser. Apparently if we tell it that
> something should be a string it still recognizes tokens if they appear
> in that string and produces a syntax error :)
> I'm no flex/bison guru, but i think i'll have to become one soon
> unless someone wants to point out to me what we are doing wrong there
> . . .
IANAG but:
The lex-generated tokeniser tokenises the input stream before the
bison-generated parser parses the tokenised result. It's too late to
tell bison that a string followed by a character is a string when the
tokeniser has already consumed any sub-strings that match protocol
keywords, without the parser ever seeing the characters they contained.
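To illustrate, here is a rough C++ sketch of what a keyword-greedy scanner does to the input. It is a simplification, not what lex actually generates, and the keyword list and the "KW:"/"CHAR:" token spellings are made up for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical keyword list; the real protocol has many more keywords.
static const std::vector<std::string> kKeywords = {"ADD", "CREATE", "SET"};

// At each position, emit a keyword token if a keyword starts there,
// otherwise emit a single-character CHAR token. The parser only ever
// sees the resulting tokens, never the raw characters.
std::vector<std::string> naive_tokenise(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const std::size_t start = pos;
        bool matched = false;
        for (const std::string& kw : kKeywords) {
            if (input.compare(pos, kw.size(), kw) == 0) {
                tokens.push_back("KW:" + kw);
                pos += kw.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            tokens.push_back(std::string("CHAR:") + input[pos]);
            ++pos;
        }
        assert(pos > start);  // the scanner must always make progress
    }
    return tokens;
}
```

Feeding it "ADDRESS" gives a keyword token for ADD followed by CHAR tokens for R, E, S, S, which is the kind of token stream that makes the parser report a syntax error in the middle of a string.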
You could (but almost certainly shouldn't :) ) try to solve it by
abandoning the tokeniser and doing all the keyword recognition in the
parser, eg
    add_keyword    : 'A' 'D' 'D' {};             // <--- don't actually try this
    create_keyword : 'C' 'R' 'E' 'A' 'T' 'E' {}; // <--- don't actually try this
etc etc.
I say you shouldn't do this because I don't think it will ever work
properly. The grammar is already ambiguous where strings are concerned
and this can only make things worse.
Another thing you could - but shouldn't - try would be to detokenise
keywords in the parser whenever they show up in the wrong place
(assuming the parser could ever work out where the wrong place was):
    detok_key : ADD    { $$="ADD"; }
              | CREATE { $$="CREATE"; }
              | etc etc
    string    : CHAR             { std::string s; s = $1; $$ = s; }
              | string CHAR      { $$ = $1 + $2; }
              | string detok_key { $$ = $1 + $2; }; // <------ LOL :)
IMHO strings should be recognised in the tokeniser not the parser. To do
this the protocol would need to be changed slightly so anything that is
an arbitrary string (eg filenames, the values of string parameters) is
/lexically/ identifiable as a string. So put them in quotes or
something, for example:
    SET AUDIO_OUTPUT_CHANNEL PARAMETER 0 0 NAME "monitor left"
rather than
    SET AUDIO_OUTPUT_CHANNEL PARAMETER 0 0 NAME monitor left
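A minimal hand-written sketch of that idea in C++, just to show the effect (in practice this would be a flex rule, not hand-rolled code). The Token type and tokenise() are invented for the example, and it assumes double quotes with no escape sequences:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token type: a bare word (keyword, number, key...) or a
// quoted string whose contents are never inspected for keywords.
struct Token {
    enum Kind { WORD, STRING } kind;
    std::string text;
};

// Splits a command line on whitespace, except that everything between
// two double quotes becomes one STRING token (quotes stripped).
std::vector<Token> tokenise(const std::string& line) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (std::isspace(static_cast<unsigned char>(line[i]))) {
            ++i;
            continue;
        }
        if (line[i] == '"') {
            const std::size_t close = line.find('"', i + 1);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated string");
            out.push_back({Token::STRING, line.substr(i + 1, close - i - 1)});
            i = close + 1;
        } else {
            std::size_t j = i;
            while (j < line.size() && line[j] != '"' &&
                   !std::isspace(static_cast<unsigned char>(line[j])))
                ++j;
            assert(j > i);  // a bare word has at least one character
            out.push_back({Token::WORD, line.substr(i, j - i)});
            i = j;
        }
    }
    return out;
}
```

With the quoted form of the command above, the last token comes out as a single STRING containing "monitor left", and a value such as "ADD CREATE" would pass through untouched because the scanner never looks for keywords inside the quotes.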
Now, according to the current grammar the keys in key/value pairs are
strings as well. (Ambiguous! Does the above mean set "NAME" to be
"monitor left" or does it mean set "NAME monitor" to be "left"? Yes, we
all know the answer... but it isn't contained in the grammar). This
could be solved by putting the keys in quotes as well but it might be
better to restrict allowable key names so they're not arbitrary strings
any more, eg
    // (lex:)
    name [A-Za-z_][A-Za-z0-9_]*
and to forbid any key to match any protocol keyword.
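As a rough C++ sketch of that rule (the reserved list is hypothetical, and the comparison is assumed to be exact and case-sensitive, which the protocol may or may not want):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical reserved words; the real list would be every protocol keyword.
static const std::set<std::string> kReserved = {"ADD", "CREATE", "SET"};

static bool is_name_start(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static bool is_name_char(char c) {
    return is_name_start(c) || (c >= '0' && c <= '9');
}

// True if s matches [A-Za-z_][A-Za-z0-9_]* and is not a reserved keyword.
bool is_valid_key(const std::string& s) {
    if (s.empty() || !is_name_start(s[0])) return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (!is_name_char(s[i])) return false;
    }
    return kReserved.count(s) == 0;
}

// A few checks showing which keys the rule accepts and rejects.
void self_check() {
    assert(is_valid_key("NAME"));
    assert(!is_valid_key("ADD"));           // clashes with a keyword
    assert(!is_valid_key("monitor left"));  // contains a space
}
```

With keys restricted like this, "NAME monitor" can never be a key, so the ambiguity goes away without having to quote the keys.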
Simon Jenkins
(Bristol, UK)