From: <bl...@us...> - 2003-08-25 21:59:54
|
Update of /cvsroot/cpptool/rfta/src/pyrfta/test/rfta/parser In directory sc8-pr-cvs1:/tmp/cvs-serv12893/src/pyrfta/test/rfta/parser Modified Files: nodetesthelper.py parsertesthelper.py Added Files: .cvsignore readme.txt Log Message: * added parser overview documentation --- NEW FILE: .cvsignore --- *.pyc --- NEW FILE: readme.txt --- 1) Overview The parser implements the full C++ grammar, but does not support template yet. The grammar is based on the one found in 'The C++ Programming Language, 3rd Edition', by Bjarne Stroustrup, which is a condensed version of the grammar defined in the C++ 98 standard: 'ISO/IEC 14882, Programming Languages - C++'. The parser framework is what I believe is called a LL(n) parser (recursive descendent parser with infinite look ahead). The rule of the grammar have the same name where the grammar was left unmodified. This makes it easy to see what are the constraint surrounding a rule. Major to the grammar includes: - removal of infinite left recursion - adaptation to a 'I don't known the nature of this identifier' environment (this mainly affected declarator/declaration parsing) Here is a sketch about the whole parsing process: - First the Tokenizer transform the text to parser into a sequence of tokens. A token as a type (integer, string, symbol...), its position in the text, as well as the text it represents. - the token sequence is passed to a TokenProvider. The token provider also takes a StructuredTokenStream and an optional ParsingTracker at construction. The TokenProvider is responsible for providing the 'next token' to use for matching to the parsers. It also implements a transactional semantic: the position can be rollbacked to a previously memorized position (this is how alternative rule are implemented). The StructuredTokenStream stores the output of the parser (what will become an AST later). The ParsingTracker is used for debugging. It is called when each parser is entered and exited (with the matching result). - Finally, the top-level parser is called with the TokenProvider as a parameter. All parser derives from Parser, which basically as I main method: parse, which takes the TokenProvider as paramter, and returns a boolean indicating if the parser matched the tokens. Here is what the symbol parser (a symbol that match a keyword or an operator) do: - indicates to the TokenProvider that the parser is entered (which forward the information to the ParsingTracker) - ask the next token to the TokenProvider - check if the token is a symbol, and if the token text matched the expected symbol (passed at the construction of the parser) - if there is a match, the token is pushed to the StructuredTokenStream which is obtained from the TokenProvider - exit and signal the TokenProvider indicating if there was a match or not. All parsers follow a similar logic. There is a few key parser, such as the SequenceParser, which just call parse() on a sequence of parser that was passed at the construction, and AlternativeParser, which does a similar thing, but opens a 'transaction' on the TokenProvider before calling parse(), rollback if there was no match, and exit immediately with success if there was a match. There is also a special kind of parser, which always returns a match, and push a 'command' into the StructuredTokenStream. We'll see how this is used later. When the top-level parser exit, it indicates if it successfully matched the token sequence. The result of the parsing is stored in the StructuredTokenStream. - The StructuredTokenStream is processed and converted into an AST tree. The StructuredTokenStream is a sequence of matched tokens intermixed with command objects. The command objects indicates how the AST should be created. It is a simple postfix language. When processing the structured token stream after the parsing to create the AST, we starts with a node which is the 'current node'. If a Token is found in the sequence, then it is added to current node. If a command is found, then it is executed. A command can change the 'current node' and do just about anything. Some common commands are 'memorize currrent node', 'restore current node from memory', 'create new node', 'rename current node', 'remove previous node'. There is also some more complex commands like 'apply binary operator node transformation' which make some interesting transformation in the resulting AST... Its simple and allow for complex AST transformations. When writting with the grammar, the user is shielded from those low level commands by helper factory functions. 2) Playing with the parser Unit tests are the best way to understand things. The most interesting things to begin with are probably parsertest (parser framework UT)and grammartest (grammar UT). You can activate the 'debug' mode of the parser to see what are the matched rules and what is the produced AST. This change a little depending on the method used to test the parser. There is two generic cases: a) the checkParser() methods: self.checkParser( "int x;", grammar.translation_unit_pi ) If not defined, you need to specify the expect number of token not used by the parser (defaulted to 0) before the debug mode: self.checkParser( "int x;", grammar.translation_unit_pi, 0, RESULT_DUMP ) b) the checkProducedNodes/Tree() methods: self.checkProducedNodes( 'operator =', grammar.unqualified_id_pi, TestNode( 'operator_function_ref', 'operator', '=' ) ) You just need to specify the debug mode: self.checkProducedNodes( 'operator =', grammar.unqualified_id_pi, TestNode( 'operator_function_ref', 'operator', '=' ), MATCHED_TRACE ) At the current time, there is three levels of debug mode (defined in parsertesthelper.py): MATCHED_TRACE: dump the matched rules 'path' to the output, and the produced AST tree. TRACE: dump each rule execution to the output, and the produced AST tree. RESULT_DUMP: only dump the produced AST tree 3) Running the parser This parser was developped on a old Cyrix 200 (yes that's still exist) running Python 2.3. It as also been tested with Python 2.2. CD to the top-level package of the parser (src/pyrfta/test), above the rfta/ and mock/ packages. Run the alltests.py script ('python alltests.py'). Baptiste Lepilleur, 08/2003. Index: nodetesthelper.py =================================================================== RCS file: /cvsroot/cpptool/rfta/src/pyrfta/test/rfta/parser/nodetesthelper.py,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** nodetesthelper.py 24 Aug 2003 21:00:36 -0000 1.1 --- nodetesthelper.py 25 Aug 2003 21:48:05 -0000 1.2 *************** *** 36,39 **** --- 36,47 ---- class TestNode(node.Node): def __init__( self, name, *children ): + """'name' is the type of the expected node. + 'children' is a sequence formed from the remaining parameters passed at construction. + Each parameter represents a children. The parameter type is tested to allow compact + tree specification. + None: do nothing + 'text': (string) creates and add a TestTokenNode of type SYMBOL with the specified text. + (token_type,token_text): (tuple) creates a TestTokenNode + TestNode: composite child to add.""" actual_children = [] for child in children: *************** *** 106,109 **** --- 114,128 ---- def assertTreeEqual( self, left, right ): + """Checks that the 'left' and 'right' node are similar. The left node must only contains + instance of TestNode, JokerTestNode and TestTokenNode. + The comparison is done by traversing the left and right tree in parallel. + A TestNode checks that the left and right node have the same type, the same child count + and that their children are similar. + A JokerTestNode only checks that the left and right node have the same type. Children are + not tested. + A TestTokenNode checks that the left and right token node have the same type and text. + If a difference is found, the left and right tree are dumped, as well as a diagnostic + string pinpointing where the difference is. + """ self.assert_( left == right, '=> Expected tree:\n%s\n=> Actual tree:\n%s\nDifferences:\n%s\n' % Index: parsertesthelper.py =================================================================== RCS file: /cvsroot/cpptool/rfta/src/pyrfta/test/rfta/parser/parsertesthelper.py,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** parsertesthelper.py 24 Aug 2003 21:00:36 -0000 1.1 --- parsertesthelper.py 25 Aug 2003 21:48:05 -0000 1.2 *************** *** 31,34 **** --- 31,40 ---- def checkParser( self, text, tested_parser, expected_remaining_tokens = 0, debug_level = NO_TRACE ): + """Checks that 'tested_parser' matches the specified 'text'. + 'expected_remaining_tokens' indicates the number of tokens that are expected not to + have been consumed by the parser (useful to test that a parser stop at a given token. + If the parsing is succesful, the produced AST is explored to ensure that no node + containing 'error' in the type/name exist (used for parser error recovery). + 'debug_level' indicates the level of debug output.""" match, tracker, scanner = self.doParsing( text, tested_parser, debug_level ) self.assert_( match, "parser should have matched source" ) *************** *** 49,52 **** --- 55,60 ---- def checkParserFail( self, text, tested_parser, debug_level = NO_TRACE ): + """Checks that 'tested_parser' does not match the specified 'text'. + 'debug_level' indicates the level of debug output.""" match, tracker, scanner = self.doParsing( text, tested_parser, debug_level ) self.assert_( not match, "parse should not have matched source" ) *************** *** 55,62 **** --- 63,77 ---- def checkProducedNodes( self, text, tested_parser, expected_tree, debug_level = NO_TRACE, _allow_error_node = False ): + """Checks that the 'tested_parser' matches the specified 'text', and that the produced AST matches + the 'expected_tree'. See NodeTestHelperMixIn.assertTreeEqual() for detail about the comparison. + A TestNode is automatically added at the top of the 'expected_tree' before the comparison. + If the parsing is succesful, the produced AST is explored to ensure that no node + containing 'error' in the type/name exist (used for parser error recovery). + 'debug_level' indicates the level of debug output.""" expected_tree = TestNode( '', expected_tree ) self.checkProducedTree( text, tested_parser, expected_tree, debug_level, _allow_error_node ) def checkProducedTree( self, text, tested_parser, expected_tree, debug_level = NO_TRACE, _allow_error_node = False ): + """Same as checkProducedNodes(), but does not add a TestNode at the top of the 'expected_tree'.""" if type(expected_tree) == type( () ): expected_tree = TestNode( '', expected_tree ) *************** *** 74,80 **** --- 89,97 ---- def checkProducedErrorNodes( self, text, tested_parser, expected_tree, debug_level = NO_TRACE ): + """Same as checkProducedNodes(), but allow the presence of error node in the produced AST.""" self.checkProducedNodes( text, tested_parser, expected_tree, debug_level, True ) def checkProducedErrorTree( self, text, tested_parser, expected_tree, debug_level = NO_TRACE ): + """Same as checkProducedTree(), but allow the presence of error node in the produced AST.""" self.checkProducedTree( text, tested_parser, expected_tree, debug_level, True ) |