[brlcad-commits] SF.net SVN: brlcad:[47005] brlcad/trunk/doc
From: <n_...@us...> - 2011-09-30 23:16:19
Revision: 47005
          http://brlcad.svn.sourceforge.net/brlcad/?rev=47005&view=rev
Author:   n_reed
Date:     2011-09-30 23:16:12 +0000 (Fri, 30 Sep 2011)
Log Message:
-----------
added notes on converting bison/flex to re2c/lemon

Modified Paths:
--------------
    brlcad/trunk/doc/CMakeLists.txt
    brlcad/trunk/doc/Makefile.am

Added Paths:
-----------
    brlcad/trunk/doc/bison_to_lemon.txt
    brlcad/trunk/doc/flex_to_re2c.txt

Modified: brlcad/trunk/doc/CMakeLists.txt
===================================================================
--- brlcad/trunk/doc/CMakeLists.txt 2011-09-30 21:05:25 UTC (rev 47004)
+++ brlcad/trunk/doc/CMakeLists.txt 2011-09-30 23:16:12 UTC (rev 47005)
@@ -23,11 +23,13 @@
  anim.txt
  archer_ack.txt
  benchmark.tr
+ bison_to_lemon.txt
  brep.txt
  BRL-CAD.bib
  cvs.txt
  description.txt
  deprecation.txt
+ flex_to_re2c.txt
  ged.tr
  history.txt
  hypot.txt

Modified: brlcad/trunk/doc/Makefile.am
===================================================================
--- brlcad/trunk/doc/Makefile.am 2011-09-30 21:05:25 UTC (rev 47004)
+++ brlcad/trunk/doc/Makefile.am 2011-09-30 23:16:12 UTC (rev 47005)
@@ -32,11 +32,13 @@
  anim.txt \
  archer_ack.txt \
  benchmark.tr \
+ bison_to_lemon.txt \
  brep.txt \
  BRL-CAD.bib\
  cvs.txt \
  description.txt \
  deprecation.txt \
+ flex_to_re2c.txt \
  ged.tr \
  history.txt \
  hypot.txt \

Added: brlcad/trunk/doc/bison_to_lemon.txt
===================================================================
--- brlcad/trunk/doc/bison_to_lemon.txt (rev 0)
+++ brlcad/trunk/doc/bison_to_lemon.txt 2011-09-30 23:16:12 UTC (rev 47005)
@@ -0,0 +1,216 @@
+Notes on converting a Bison app to a Lemon app.
+
+=========
+ Running
+=========
+Bison is typically run like this:
+$ bison -d foo.y
+
+This outputs foo.tab.c and foo.tab.h
+
+Lemon is typically run like this:
+$ lemon -q foo.y
+
+This outputs foo.c and foo.h. The '-q' suppresses output of the report file.
+
+Lemon's foo.h is not equivalent to Bison's foo.tab.h; it only contains token definitions.
Instead of using foo.h, you'll probably use a wrapper header that includes foo.h, plus whatever else you need (such as the declarations you would find in foo.tab.h).
+
+===============
+ Trivial Stuff
+===============
+- Lemon doesn't have an equivalent to "%%" because it doesn't have sections.
+- Lemon requires 'assert' to be defined (e.g. by including assert.h).
+- %include { ... } instead of %{ ... %} for text to be pasted at the top of
+  the output.
+- Lemon calls the code in %syntax_error instead of yyerror() on a syntax error.
+- Lemon has different syntax for changing precedence:
+    /* bison */
+    expr : MINUS expr %prec NOT { ... };
+
+    /* lemon */
+    expr ::= MINUS expr. [NOT] { ... }
+
+- Lemon has different syntax for specifying the type of non-terminals:
+    /* bison */
+    %type <case_item> case_action case_otherwise
+
+    /* lemon */
+    %type case_action { case_item }
+    %type case_otherwise { case_item }
+
+ Lemon has Slightly Stricter Rules on Grammar Definition
+---------------------------------------------------------
+The most obvious restriction is that Lemon doesn't allow the start symbol to be recursive, so when converting from a bison grammar, you'll probably have to make the old start symbol the right-hand side of a new (non-recursive) start symbol.
+
+    /* bison */
+    statement_list
+        : '\n'
+        | statement_list statement '\n'
+        ;
+
+    /* lemon */
+    start_symbol ::= statement_list.
+
+    statement_list ::= EOL.
+    statement_list ::= statement_list statement EOL.
+
+===================
+ Non-Trivial Stuff
+===================
+
+ Lemon Handles Errors a Bit Differently
+----------------------------------------
+Lemon will not attempt error recovery unless you define a non-terminal symbol named "error" in your grammar.
+
+Assuming this symbol is defined, Lemon calls any code defined with %syntax_error whereas Bison would call yyerror().
+
+In Bison you can manually signal a context-sensitive syntax error with the YYERROR macro, and Bison will behave as if it had detected a syntax error!
+
+There is no way to make Lemon think it detected a syntax error without a
+significant hack. The only thing you can do is record the error and wait until
+Lemon finishes parsing or faults on an error that it /can/ detect.
+
+Lemon calls any code defined with %parse_failed after error recovery fails and the parser resets to the start state.
+
+When parsing fails, Bison's yyparse() returns a nonzero value (1 for invalid
+input, 2 for memory exhaustion). The Lemon Parse() function returns void, so if
+you want to tell the caller about an error, you have to store an error code
+somewhere in the optional fourth argument to Lemon's Parse() function.
+
+ Lemon and Bison Assign IDs to Tokens Differently
+--------------------------------------------------
+Bison allows you to use character-literals as tokens, like '\n':
+
+    statement_list
+        : '\n'
+        | statement_list statement '\n'
+        ;
+
+Bison enumerates non-character-literal tokens starting at 258 so that the ID of literal tokens like '\n' can simply be their ASCII value.
+
+Lemon enumerates tokens starting at 1, and doesn't allow the use of character literals as tokens. The lemon equivalent of the above would be:
+
+    statement_list ::= EOL.
+    statement_list ::= statement_list statement EOL.
+
+The lexer would return EOL to Lemon rather than '\n'.
+
+ The Lemon Parser Doesn't Reduce Immediately
+---------------------------------------------
+When the lemon parser receives a new token, it reduces the tokens currently on the stack as much as possible, and then pushes the new token. Bison pushes the new token, then reduces.
+
+"The Bison paradigm is to parse tokens first, then group them into larger syntactic units" (Bison manual).
+
+You may have to perform certain actions sooner (i.e. in a lower-level rule) in lemon than you would with Bison in order to get the correct behavior.
+
+The lemon ParseTrace function is useful if you want to examine lemon's behavior.
+
+ Lemon Uses Symbolic Names to Reference Grammar Symbols
+--------------------------------------------------------
+    /* bison */
+    expression : expression '+' expression { $$ = $1 + $3; }
+
+    /* lemon */
+    expression(A) ::= expression(B) PLUS expression(C). { A = B + C; }
+
+WARNING: Converting from bison to lemon syntax is tedious and error-prone.
+
+Remember that lemon doesn't provide a default action. Bison's default is
+{ $$ = $1; }.
+
+ Lemon Makes All Terminals the Same Type
+-----------------------------------------
+In Bison, you typically create a union that specifies the possible types of tokens, and then you declare the types for individual tokens:
+
+    /* Bison uses this to generate a C union called YYSTYPE */
+    %union {
+        float real;
+        int integer;
+        bool toggle;
+    }
+
+    %token <real> coord
+    ...
+
+    /* parser already knows types for $2, $3 and $4 because the type of all
+     * coord tokens was declared as <real> (float).
+     */
+    vertex : VERTEX coord coord coord
+    {
+        vert_t v = {$2, $3, $4};
+        ...
+    }
+    ...
+
+Lemon considers all tokens to be of the same type, which is typically a struct/union or a pointer to a struct/union. You specify the types of tokens explicitly within actions. WARNING: The default token type appears to be (void*), but the Lemon documentation claims that it's (int)!
+
+    /* included from custom header file - this is basically what Bison would
+     * generate from its %union directive
+     */
+    typedef union YYSTYPE {
+        float real;
+        int integer;
+        bool toggle;
+    } YYSTYPE;
+
+    #define COORD(c) ((c).real)
+
+    /* actual grammar file */
+    ...
+    %token_type {YYSTYPE}
+    ...
+
+    /* We have to explicitly identify the type of each token referenced in
+     * each action.
+     */
+    vertex ::= VERTEX coord(A) coord(B) coord(C).
+    {
+        vert_t v = {COORD(A), COORD(B), COORD(C)};
+        ...
+    }
+
+ Lemon Doesn't Support Mid-Rule Actions
+----------------------------------------
+Bison allows you to put actions anywhere in the rhs component list, where they
+become unnamed components. Lemon does not support this feature.
+
+    /* bison */
+    alias_statement
+        : TOK_ALIAS TOK_IDENTIFIER TOK_FOR general_ref SEMICOLON
+        {
+            /* this action is unnamed component $6 */
+            struct Scope_ *s = SCOPEcreate_tiny(OBJ_ALIAS);
+            PUSH_SCOPE(s,(Symbol*)0, OBJ_ALIAS);
+        }
+        statement_rep TOK_END_ALIAS SEMICOLON
+        {
+            Expression e = EXPcreate_from_symbol(Type_Attribute, $2);
+            Variable v = VARcreate(e, Type_Unknown);
+            v->initializer = $4;
+            DICTdefine(CURRENT_SCOPE->symbol_table, $2->name, (Generic)v, $2,
+                OBJ_VARIABLE);
+            $$ = ALIAScreate(CURRENT_SCOPE, v, $7);
+            POP_SCOPE();
+        }
+        ;
+
+    /* lemon */
+    alias_statement(A) ::= TOK_ALIAS TOK_IDENTIFIER(B) TOK_FOR general_ref(C)
+                           SEMICOLON alias_push_scope statement_rep(D)
+                           TOK_END_ALIAS SEMICOLON.
+    {
+        Expression e = EXPcreate_from_symbol(Type_Attribute, B);
+        Variable v = VARcreate(e, Type_Unknown);
+        v->initializer = C;
+        DICTdefine(CURRENT_SCOPE->symbol_table, B->name, (Generic)v, B,
+            OBJ_VARIABLE);
+        A = ALIAScreate(CURRENT_SCOPE, v, D);
+        POP_SCOPE();
+    }
+
+    alias_push_scope ::= /* subroutine */.
+    {
+        /* this was the unnamed mid-rule action ($6) in the bison version */
+        struct Scope_ *s = SCOPEcreate_tiny(OBJ_ALIAS);
+        PUSH_SCOPE(s,(Symbol*)0, OBJ_ALIAS);
+    }

Property changes on: brlcad/trunk/doc/bison_to_lemon.txt
___________________________________________________________________
Added: svn:mime-type
   + text/plain
Added: svn:eol-style
   + native

Added: brlcad/trunk/doc/flex_to_re2c.txt
===================================================================
--- brlcad/trunk/doc/flex_to_re2c.txt (rev 0)
+++ brlcad/trunk/doc/flex_to_re2c.txt 2011-09-30 23:16:12 UTC (rev 47005)
@@ -0,0 +1,160 @@
+Notes on converting a Flex app to an re2c app.
+
+=========
+ Running
+=========
+Flex is typically run like this:
+$ flex -o foo.c foo.l
+
+re2c is typically run like this:
+$ re2c -Fc -o foo.c foo.y
+
+The '-Fc' options add support for some flex features.
+
+===============
+ Trivial Stuff
+===============
+ regular expressions
+---------------------
+re2c regular expressions aren't quite as nice as in flex. You have to quote standalone literals ('\n' not \n, but [\n\t ] is okay), and there aren't any character classes like [[:digit:]] available.
+
+ re2c Doesn't Generate a Scanner Function
+------------------------------------------
+re2c doesn't produce an equivalent to yylex(). You have to write the scanner
+function yourself, and embed the re2c syntax inside it.
+
+===================
+ Non-Trivial Stuff
+===================
+
+ re2c Doesn't Have an Equivalent to 'yytext'
+---------------------------------------------
+One way to simulate yytext is to define it as a macro that calls a function which returns the token text:
+
+    #define yytext ((char*)get_yytext(...))
+
+ Handling Unrecognized Characters by Echoing or Ignoring
+---------------------------------------------------------
+Some apps may avoid backtracking by using an echo action (return yytext[0])
+for unrecognized characters, or by simply ignoring unrecognized characters.
+
+If you want to ignore a token, you have to explicitly restart scanning in the token's action block. If you leave the action block empty, execution continues to the next pattern's action, which isn't generally what you want.
+
+One thing you can do is put a label before the re2c stuff and goto it whenever a token is ignored:
+
+    #define IGNORE_TOKEN continue; /* implicit goto */
+
+
+    scan(char *cursor) {
+        ...
+
+        while (1) { /* implicit label */
+
+        /*!re2c
+        re2c:define:YYCURSOR = cursor;
+        ...
+
+        "//".* { IGNORE_TOKEN }
+
+        }
+    }
+You could also call the scanning function recursively, though there's no
+obvious reason to prefer this approach:
+
+    scan(char *cursor) {
+        ...
+        /*!re2c
+        re2c:define:YYCURSOR = cursor;
+        ...
+        "//".* { return scan(cursor); /* ignore this token, return next one */ }
+        ...
+    }
+
+ re2c Condition Support
+------------------------
+The '-c' flag to re2c gives some support for start conditions, but you have to do quite a bit more work than in flex.
+
+- Exclusive Versus Inclusive -
+In flex, %s is used to specify /inclusive/ start conditions. Inclusive start
+conditions implicitly include unmarked rules. %x is used to specify
+/exclusive/ start conditions, which include only marked rules, thus:
+
+    /* given these conditions... */
+    %s S1 S2
+    %x X1
+
+    /* ...this... */
+    <S1>/rule1/
+    <X1>/rule2/
+    /rule3/
+
+    /* ...is equivalent to this */
+
+    /* marked rules are as declared */
+    <S1>/rule1/
+    <X1>/rule2/
+
+    /* unmarked rules have initial AND inclusive states */
+    <INITIAL,S1,S2>/rule3/
+
+re2c does not allow unmarked rules if start conditions are used! Each rule has
+only the marked conditions and no others. Thus, if your flex app uses
+inclusive (%s) conditions, you will need to explicitly mark any unmarked rules
+with the inclusive conditions.
+
+Here's a flex example that uses start conditions:
+
+    /* declare COMMENT condition (INITIAL condition defined automatically) */
+    %x COMMENT
+
+    %%
+
+    "/*" { BEGIN(COMMENT); }
+
+    /* rules that apply in comment mode */
+    <COMMENT> {
+        .
+        \n
+        "*/" { BEGIN(INITIAL); }
+    }
+
+The equivalent in re2c:
+
+    /* Start conditions are declared together in enum.
+     *
+     * re2c prefixes these names with "yyc" by default; change this by setting
+     * re2c:condenumprefix = "";
+     */
+    enum YYCONDTYPE {
+        INITIAL,
+        COMMENT
+    };
+
+    /* re2c uses YYSETCONDITION instead of BEGIN; you can change this by setting
+     * re2c:define:YYSETCONDITION = BEGIN;
+     */
+
+    /* Define BEGIN(enum YYCONDTYPE) to set current state (which should be
+     * initialized to INITIAL).
+     * This might be a function that sets a global, or a macro that sets a
+     * local parameter, etc.
+     */
+
+    /* Define YYGETCONDITION() to get current state.
+     * This will be a function or macro that returns the state as set by
+     * BEGIN.
+     */
+
+    /* If '-c' is used, ALL rules must be preceded by a condition.
+     * Ordinary rules need to be prefixed with the INITIAL condition.
+     */
+    <INITIAL>"/*" { BEGIN(COMMENT); }
+
+    /* This also means that you can't create a condition block. Each individual
+     * rule that applies in a specific condition must be preceded by the
+     * condition.
+     */
+    <COMMENT>.
+    <COMMENT>\n
+    <COMMENT>"*/" { BEGIN(INITIAL); }

Property changes on: brlcad/trunk/doc/flex_to_re2c.txt
___________________________________________________________________
Added: svn:mime-type
   + text/plain
Added: svn:eol-style
   + native
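
Illustration (not part of the commit): the flex_to_re2c.txt notes say that you write the scanner function yourself, restart scanning explicitly for ignored tokens, simulate yytext, and return a named token such as EOL instead of '\n'. The following is a minimal sketch of that shape in plain C. Everything in it is an assumption made for illustration: the names scanner_t, token_text, TOK_EOL and TOK_WORD are hypothetical, and the hand-written character tests only stand in for the matcher that re2c would generate from a /*!re2c ... */ block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token IDs. With Lemon these would come from the generated
 * foo.h, which numbers tokens from 1 and treats 0 as end of input. */
enum { TOK_END = 0, TOK_EOL = 1, TOK_WORD = 2 };

/* Hypothetical scanner state, replacing flex's globals. */
typedef struct {
    const char *cursor; /* next unread character */
    const char *token;  /* start of the most recent token */
} scanner_t;

/* Stand-in for yytext: copy the current token text into buf,
 * truncating if needed, and return the number of bytes copied. */
static size_t token_text(const scanner_t *s, char *buf, size_t bufsize)
{
    size_t len = (size_t)(s->cursor - s->token);

    assert(bufsize > 0);
    if (len >= bufsize)
        len = bufsize - 1;
    memcpy(buf, s->token, len);
    buf[len] = '\0';
    return len;
}

/* Hand-written equivalent of yylex(). */
static int scan(scanner_t *s)
{
    while (1) { /* the loop top doubles as the "restart scanning" label */
        char c;

        s->token = s->cursor;
        c = *s->cursor;
        if (c == '\0')
            return TOK_END;
        s->cursor++;

        if (c == '\n')
            return TOK_EOL;               /* named token, not '\n' */
        if (c == ' ' || c == '\t')
            continue;                     /* ignore whitespace */
        if (c == '/' && *s->cursor == '/') {
            /* ignore a "//" comment up to, but not including, the newline */
            while (*s->cursor != '\0' && *s->cursor != '\n')
                s->cursor++;
            continue;
        }
        /* anything else: consume a run of non-blank characters */
        while (*s->cursor != '\0' && *s->cursor != '\n' &&
               *s->cursor != ' ' && *s->cursor != '\t')
            s->cursor++;
        return TOK_WORD;
    }
}
```

A caller would initialize both pointers to the start of a NUL-terminated buffer and call scan() until it returns TOK_END, passing each token ID on to the parser. Keeping the token start in the state struct is one way to get yytext-like behavior without globals; a real re2c scanner would likely also need a marker pointer for backtracking, which this sketch leaves out.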