Home modified by Ognyan Tonchev

Ognyan Tonchev — Thu, 09 Jul 2015 14:34:15 -0000

--- v4
+++ v5
@@ -249,7 +249,53 @@
 CheckErrors
 ~~~~~~

-Parsers can be combined by suing the following operators: |, & and <<
+Parsers can be combined using the following operators: |, & and <<
+
+`p1 & p2` and `p1 << p2` mean almost the same thing, but there is a small difference in how the results are grouped in the AST. To illustrate it, let's take variable declaration parsing in C as an example:
+
+~~~~~~
+int a, b, c, d;
+~~~~~~
+
+The grammar may look like:
+
+~~~~~~
+additional_declarator_with_modifier = \
+            OperatorParser(COMMA) & declarator_with_modifier
+
+variable_declaration = \
+        (type_specifier & declarator_with_modifier << \
+            ZeroOrMore(additional_declarator_with_modifier) & \
+            OperatorParser(SEMICOLON))
+~~~~~~
+
+or:
+
+~~~~~~
+additional_declarator_with_modifier = \
+            OperatorParser(COMMA) & declarator_with_modifier
+
+variable_declaration = \
+        (type_specifier & declarator_with_modifier & \
+            ZeroOrMore(additional_declarator_with_modifier) & \
+            OperatorParser(SEMICOLON))
+~~~~~~
+
+And the resulting ASTs for the two cases, respectively:
+
+~~~~~~
+['int'], [['a'], ['b'], ['c'], ['d']]
+~~~~~~
+
+and
+
+~~~~~~
+['int'], [['a'], [['b'], ['c'], ['d']]]
+~~~~~~
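Part of the explanation lies in Python itself: `<<` has higher precedence than `&`, and `&` has higher precedence than `|`. In the first grammar, `declarator_with_modifier << ZeroOrMore(...)` is therefore combined first and only then joined to its neighbours with `&`. The toy `Node` class below is hypothetical (it is not part of pylangparser) and does no parsing; it only records how Python groups the overloaded operators.

```python
class Node:
    """Hypothetical stand-in for a parser that records operator grouping."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Node(f"({self.text} & {other.text})")

    def __lshift__(self, other):
        return Node(f"({self.text} << {other.text})")

    def __or__(self, other):
        return Node(f"({self.text} | {other.text})")

    def __repr__(self):
        return self.text


type_spec, decl, more, semi = (Node(n) for n in ("type", "decl", "more", "semi"))

# '<<' binds tighter than '&', so "decl << more" is grouped first
print(type_spec & decl << more & semi)   # ((type & (decl << more)) & semi)

# '&' binds tighter than '|'
print(type_spec | decl & more)           # (type | (decl & more))
```

When mixing these operators, explicit parentheses are the safest way to make the intended grouping obvious.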

 Iterating the AST:
 ====

Home modified by Ognyan Tonchev

Ognyan Tonchev — Thu, 09 Jul 2015 13:50:00 -0000

--- v3
+++ v4
@@ -127,7 +127,8 @@
       if (p == 5)
         p = 3 + 2;

-    
+    """
+
     # obtain list of tokens present in the source
     lexer = Lexer(TOKENS)
     tokens = lexer.parseTokens(source)

Home modified by Ognyan Tonchev

Ognyan Tonchev — Thu, 09 Jul 2015 13:49:02 -0000

--- v2
+++ v3
@@ -159,17 +159,11 @@

     if len(token) == 3:
        # p = 1
-       # ('p', '=', '1')
+       # ('p', '=', '1') or ('p', '=', ('3', '+', '2'))
        (lo, op, ro) = token
+       if not ro.is_basic_token():
+           ro = update_arthm_expression(ro)
        token = (op, lo, ro)
-    else:
-       # p = 1 + (3 + 2)
-       # ('p', '=', '1', '3', ('+', '2'))
-       # note the Optional parser below
-       (lo1, op1, ro1, sub_token) = token
-       (op2, ro2) = sub_token.get_token()
-       sub_token.set_token((op2, ro1, ro2))
-       token = (op1, ro1, sub_token)

     result.set_token(token)
     return result
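With this change the transformation becomes recursive. When the right operand is not a basic token, it is itself an arithmetic expression and gets rewritten the same way. The sketch below shows the idea on plain nested tuples rather than pylangparser's ParserResult objects, which lets it run without the library; strings stand in for basic tokens. Unlike the diff, it recurses into both operands.

```python
def to_prefix(expr):
    """Rewrite (left, op, right) triples into (op, left, right), recursively.

    Strings are treated as basic tokens; tuples as sub-expressions.
    """
    if isinstance(expr, str):  # basic token: nothing to reorder
        return expr
    left, op, right = expr
    return (op, to_prefix(left), to_prefix(right))


print(to_prefix(("p", "=", "1")))                # ('=', 'p', '1')
print(to_prefix(("p", "=", ("3", "+", "2"))))    # ('=', 'p', ('+', '3', '2'))
```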

Home modified by Ognyan Tonchev

Ognyan Tonchev — Sun, 21 Jun 2015 21:17:22 -0000

--- v1
+++ v2
@@ -1,8 +1,304 @@
-Welcome to your wiki!
-
-This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].
-
-The wiki uses [Markdown](/p/pylangparser/wiki/markdown_syntax/) syntax.
+Welcome to PyLangParser's wiki!
+====
+
+Parse C source code from Python <https://sourceforge.net/p/pylangparser/code/ci/master/tree/examples/>
+
+Parse SQL scripts from Python <https://sourceforge.net/p/pylangparser/code/ci/master/tree/examples/>
+
+Parse GTK-Doc style comments from Python <https://sourceforge.net/p/pylangparser/code/ci/master/tree/examples/>
+(parses function name, arguments, annotations and return value)
+
+Parse food recipes from Python <https://sourceforge.net/p/pylangparser/code/ci/master/tree/examples/>
+
+**pylangparser** - Simple language parsing from Python.
+The project provides classes for parsing formal languages in an easy way,
+without any external libraries: only the standard unittest, re and pprint modules are used.
+There is a Lexer class and a Parser class. The Lexer produces a list of tokens that the
+Parser then uses to build the AST. The Lexer can also be used as a standalone
+component. Customized ASTs are supported as well.
+Grammars are defined directly in Python code.
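The Lexer/Parser split described above can be illustrated with a tiny standalone tokenizer built on Python's re module. This is a sketch of the general technique and not pylangparser's Lexer: the token type names and the (type, text) tuple format are assumptions. As the comment in the calculator example below points out, the alternatives are tried in order and the first match wins.

```python
import re

# (token type, compiled pattern) pairs, tried in order: the first match wins,
# so keywords come before identifiers and '==' comes before '='.
TOKEN_SPEC = [
    ("IGNORE", re.compile(r"[ \t\v\f\n]+")),
    ("COMMENT", re.compile(r"#[^\n]*")),
    ("KEYWORD", re.compile(r"if\b")),
    ("OPERATOR", re.compile(r"==|\+|-|=|<|>|;|\(|\)")),
    ("CONSTANT", re.compile(r"[0-9]+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]

SKIPPED = {"IGNORE", "COMMENT"}  # matched but not emitted


def tokenize(source):
    """Split source into (type, text) tuples, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                if kind not in SKIPPED:
                    tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:  # no pattern matched at this position
            raise ValueError(f"Unknown symbol {source[pos]!r} at offset {pos}")
    return tokens


print(tokenize("if (p == 12) p = 3;"))
```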
+
+In the examples folder you will find both simple example scripts demonstrating
+basic usage of the parser and some more useful and complex ones. For example,
+there is a script for parsing C source code and building and iterating the AST.
+An SQL parser will be added soon too.
+
+Note: the documentation is not complete yet, and existing APIs may still change.
+
+Feel free to send suggestions, comments and patches.
+
+
+Example usage of the Parser:
+====
+
+_The test defines a simple calculator language, ABCMATH, and demonstrates how programs written
+in that language are parsed._
+
+    #!/usr/bin/python
+    from pylangparser import *
+    
+    # define all tokens in the language
+    IF = Keyword(r'if')
+    
+    KEYWORDS = IF
+    
+    PLUS = Operator(r'+')
+    MINUS = Operator(r'-')
+    ASSIGNMENT = Operator(r'=')
+    SEMICOLON = Operator(r';')
+    EQ = Operator(r'==')
+    LE = Operator(r'<')
+    GT = Operator(r'>')
+    LPAR = Operator(r'(')
+    RPAR = Operator(r')')
+    
+    # order is important, as the first operator that matches is used,
+    # so '==' must be listed before '='
+    OPERATORS = EQ & PLUS & MINUS & ASSIGNMENT & LE & GT & SEMICOLON & \
+        LPAR & RPAR
+    
+    IGNORE_CHARS = Ignore(r'[ \t\v\f\n]+')
+    
+    COMMENTS = Ignore(r'\#.*\n')
+    
+    IDENTIFIER = Symbols(r'[A-Za-z_]+[A-Za-z0-9_]*')
+    
+    CONSTANT = Symbols(r'[0-9]+')
+    
+    TOKENS = KEYWORDS & OPERATORS & CONSTANT & IDENTIFIER & \
+        COMMENTS & IGNORE_CHARS
+    
+    # we want that certain tokens are ignored in the AST
+    IgnoreTokensInAST(SEMICOLON & LPAR & RPAR)
+    
+    # define our grammar
+    
+    arthm_operator = \
+        OperatorParser(PLUS) | \
+        OperatorParser(MINUS)
+    
+    comp_operator = \
+        OperatorParser(LE) | \
+        OperatorParser(GT) | \
+        OperatorParser(EQ)
+    
+    operand = \
+        SymbolsParser(CONSTANT) | \
+        SymbolsParser(IDENTIFIER)
+    
+    arthm_expression = \
+        SymbolsParser(IDENTIFIER) & \
+        OperatorParser(ASSIGNMENT) & \
+        (operand << Optional(arthm_operator << operand)) & \
+        OperatorParser(SEMICOLON)
+    
+    condition = \
+        operand << \
+        comp_operator << \
+        operand
+    
+    # if_statement and statement have circular dependency, that is why
+    # we have to use RecursiveParser
+    statement = RecursiveParser()
+    
+    if_statement = \
+        KeywordParser(IF) & \
+        OperatorParser(LPAR) & \
+        condition & \
+        OperatorParser(RPAR) & \
+        statement
+    
+    # notice the usage of the '+=' operator below
+    statement += \
+        if_statement | arthm_expression
+    
+    # use AllTokensConsumed so that the parser parses the
+    # complete source
+    program = AllTokensConsumed(ZeroOrMore(statement))
+    
+    # our source code
+    source = """
+    
+    # example program written in ABCMATH
+    
+    p = 12;
+    
+    if (p == 12)
+      if (p == 5)
+        p = 3 + 2;
+    
+    
+    # obtain list of tokens present in the source
+    lexer = Lexer(TOKENS)
+    tokens = lexer.parseTokens(source)
+    print(tokens)
+    
+    # build AST
+    result = program(tokens, 0)
+    result.pretty_print()
+
+
+When the program is run, it will output the following tree:
+
+
+~~~~~~
+[[['p'], ['='], ['12']],
+ [['if'],
+  [['p'], ['=='], ['12']],
+  [['if'], [['p'], ['=='], ['5']], [['p'], ['='], [['3'], ['+'], ['2']]]]]]
+~~~~~~
+
+The tree can be reorganized so that it is easier to interpret.
+Let's modify our code a bit.
+
+First we modify the ***arthm_expression*** parser:
+
+~~~~~~
+def update_arthm_expression(result):
+    token = result.get_token()
+
+    if len(token) == 3:
+       # p = 1
+       # ('p', '=', '1')
+       (lo, op, ro) = token
+       token = (op, lo, ro)
+    else:
+       # p = 1 + (3 + 2)
+       # ('p', '=', '1', '3', ('+', '2'))
+       # note the Optional parser below
+       (lo1, op1, ro1, sub_token) = token
+       (op2, ro2) = sub_token.get_token()
+       sub_token.set_token((op2, ro1, ro2))
+       token = (op1, ro1, sub_token)
+
+    result.set_token(token)
+    return result
+
+arthm_expression = \
+    CustomizeResult (SymbolsParser(IDENTIFIER) & \
+    OperatorParser(ASSIGNMENT) & \
+    operand & \
+    Optional(arthm_operator & operand) & \
+    OperatorParser(SEMICOLON), update_arthm_expression)
+~~~~~~
+
+And then the ***if_statement*** parser:
+
+~~~~~~
+def update_condition(result):
+    # p == 1
+    # ('p', '==', '1')
+    token = result.get_token()
+    (lo, op, ro) = token
+    result.set_token((op, lo, ro))
+    return result
+
+if_statement = \
+    KeywordParser(IF) & \
+    OperatorParser(LPAR) & \
+    CustomizeResult (condition, update_condition) & \
+    OperatorParser(RPAR) & \
+    statement
+~~~~~~
+
+The result tree will look a bit different now:
+
+~~~~~~
+[[['='], ['p'], ['12']],
+ [['if'],
+  [['=='], ['p'], ['12']],
+  [['if'], [['=='], ['p'], ['5']], [['='], ['p'], [['+'], ['3'], ['2']]]]]]
+~~~~~~
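CustomizeResult, as used above, pairs a parser with a callback that rewrites its result. Below is a hypothetical, library-free analogue of that wrap-and-transform pattern applied to a condition. The names customize_result and condition, and the convention that a parser returns (value, next_index) or None, are assumptions and not pylangparser's API.

```python
def customize_result(parser, callback):
    """Hypothetical analogue of CustomizeResult: post-process a parser's value."""
    def wrapped(tokens, i):
        outcome = parser(tokens, i)
        if outcome is None:
            return None
        value, end = outcome
        return callback(value), end
    return wrapped


def condition(tokens, i):
    """Toy parser for 'operand comp_operator operand', returned as a 3-tuple."""
    if i + 3 <= len(tokens) and tokens[i + 1] in ("==", "<", ">"):
        return tuple(tokens[i:i + 3]), i + 3
    return None


def update_condition(token):
    # ('p', '==', '12') -> ('==', 'p', '12'), as in the example above
    lo, op, ro = token
    return (op, lo, ro)


prefix_condition = customize_result(condition, update_condition)
print(prefix_condition(["p", "==", "12"], 0))   # (('==', 'p', '12'), 3)
```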
+
+Always use CheckErrors or AllTokensConsumed as a top level parser in order
+to get relevant information about parse errors:
+
+~~~~~~
+Traceback (most recent call last):
+  File "simple_calc_language.py", line 103, in <module>
+    result = program(tokens, 0)
+  File "../pylangparser.py", line 915, in __call__
+    "Unknown symbol: %s" % tokens[i].get_token())
+pylangparser.ParseException: row: 7, column: 7,
+    message: Unknown symbol: (
+~~~~~~
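In combinator parsers in general, a rule such as ZeroOrMore can succeed after matching only a prefix of the input. A top-level check for leftover tokens is what turns "stopped early" into a reported error with a position. The sketch below assumes tokens are (text, row, column) tuples and that a parser returns (value, next_index) or None. It is a hypothetical illustration, not pylangparser's AllTokensConsumed.

```python
class ParseError(Exception):
    """Raised when the input cannot be parsed completely."""


def all_tokens_consumed(parser):
    """Wrap a parser so that leftover tokens become an error."""
    def wrapped(tokens, i):
        outcome = parser(tokens, i)
        end = i if outcome is None else outcome[1]
        if end < len(tokens):
            text, row, column = tokens[end]
            raise ParseError(
                f"row: {row}, column: {column}, message: Unknown symbol: {text}")
        if outcome is None:
            raise ParseError("unexpected end of input")
        return outcome[0]
    return wrapped


def identifiers(tokens, i):
    """Toy parser: accept zero or more identifier tokens (never fails)."""
    names = []
    while i < len(tokens) and tokens[i][0].isidentifier():
        names.append(tokens[i][0])
        i += 1
    return names, i


program = all_tokens_consumed(identifiers)
print(program([("a", 1, 1), ("b", 1, 3)], 0))   # ['a', 'b']
try:
    program([("a", 1, 1), ("(", 7, 7)], 0)
except ParseError as err:
    print(err)   # row: 7, column: 7, message: Unknown symbol: (
```

Without the wrapper, `identifiers` would happily return `(['a'], 1)` for the second input and the stray `(` would go unnoticed.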
+
+List of supported Tokens:
+
+~~~~~~
+Keyword
+Symbols
+Operator
+Ignore
+~~~~~~
+
+If case-insensitive matching is desired when parsing Tokens, set the ignorecase constructor argument when creating Token instances:
+
+~~~~~~
+IF = Keyword(r'if', ignorecase=True)
+~~~~~~
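The project states that it relies on Python's re module. Assuming (not confirmed here) that Token patterns are matched with re, `ignorecase=True` would correspond to compiling the pattern with `re.IGNORECASE`. The standalone illustration below also adds a `\b` word boundary, which is not in the original `Keyword(r'if')`, so that the keyword does not match the start of a longer identifier.

```python
import re

# Hypothetical equivalent of Keyword(r'if', ignorecase=True): the pattern
# compiled with re.IGNORECASE; \b (added here) rejects prefixes like "iffy".
IF_PATTERN = re.compile(r"if\b", re.IGNORECASE)

for word in ("if", "IF", "If", "iffy"):
    print(word, bool(IF_PATTERN.match(word)))
```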
+
+List of supported Parsers:
+
+~~~~~~
+KeywordParser
+OperatorParser
+SymbolsParser
+Optional
+ZeroOrMore
+Repeat
+AllTokensConsumed
+RecursiveParser
+IgnoreResult
+CustomizeResult
+CheckErrors
+~~~~~~
+
+Parsers can be combined by suing the following operators: |, & and <<
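To make the role of the operators concrete, here is a deliberately tiny parser-combinator sketch in plain Python. It is not pylangparser's implementation: the class `P`, the helpers `tok` and `zero_or_more`, and the list-of-strings token format are hypothetical. It shows `|` as ordered alternation, `&` as sequencing, and `+=` completing a forward-declared parser, which is the role RecursiveParser plays in the calculator example. `<<` is left out because its exact grouping behaviour is library-specific.

```python
class P:
    """Hypothetical minimal parser (not one of pylangparser's classes).

    p(tokens, i) returns (value, next_index) on success, or None on failure.
    """

    def __init__(self, fn=None):
        self.fn = fn  # None until filled in with '+=' (forward declaration)

    def __call__(self, tokens, i):
        return self.fn(tokens, i)

    def __or__(self, other):
        # ordered alternation: try self first, fall back to other
        return P(lambda tokens, i: self(tokens, i) or other(tokens, i))

    def __and__(self, other):
        # sequencing: run self, then other on the remaining tokens
        def seq(tokens, i):
            first = self(tokens, i)
            if first is None:
                return None
            second = other(tokens, first[1])
            if second is None:
                return None
            return [first[0], second[0]], second[1]
        return P(seq)

    def __iadd__(self, other):
        # complete a forward-declared parser, similar in spirit to
        # the RecursiveParser '+=' usage in the example above
        self.fn = other
        return self


def tok(text):
    """Match a single token equal to `text`."""
    def match(tokens, i):
        if i < len(tokens) and tokens[i] == text:
            return tokens[i], i + 1
        return None
    return P(match)


def zero_or_more(parser):
    """Apply `parser` repeatedly; always succeeds, possibly with []."""
    def many(tokens, i):
        values = []
        while True:
            outcome = parser(tokens, i)
            if outcome is None or outcome[1] == i:  # failure or no progress
                return values, i
            values.append(outcome[0])
            i = outcome[1]
    return P(many)


# statement := "if" statement | "x"   (recursive, like if_statement above)
statement = P()
statement += (tok("if") & statement) | tok("x")
program = zero_or_more(statement)

print(program(["if", "if", "x", "x"], 0))   # ([['if', ['if', 'x']], 'x'], 4)
```

In this toy, `a & b & c` nests as `[[a, b], c]`. That is just one possible choice; the grouping a real library produces may differ.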
+
+Iterating the AST:
+====
+
+The result of applying a parser combination to some input is a ParserResult.
+A ParserResult may contain a simple token, another ParserResult, or a tuple of ParserResults.
+A ParserResult can be traversed using the get_sub_group(index) function, indexing, or iteration. Indexes start at 1; index 0 refers to the whole tree.
+
+~~~~~~
+result = parser(tokens, 0)
+
+sub_group = result.get_sub_group(1)
+sub_group.pretty_print()
+
+# Or, using an index:
+
+sub_group = result[1]
+sub_group.pretty_print()
+
+# Or, by iterating over the sub-groups:
+
+for sub_group in result:
+    sub_group.pretty_print()
+~~~~~~
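The indexing convention can be modelled with a small hypothetical class: sub-groups are numbered from 1 and index 0 returns the whole tree. This sketch only mirrors the behaviour described above and is not pylangparser's ParserResult.

```python
class Group:
    """Hypothetical nested result: sub-groups start at 1, index 0 = whole tree."""

    def __init__(self, *children):
        self.children = list(children)

    def get_sub_group(self, index):
        if index < 0:
            raise IndexError("sub-group indexes start at 1 (0 = whole tree)")
        if index == 0:
            return self
        return self.children[index - 1]  # IndexError if out of range

    def __getitem__(self, index):
        return self.get_sub_group(index)

    def __iter__(self):
        # iterate over the direct sub-groups only, in order
        return iter(self.children)


# roughly the shape of ['int'], [['a'], ['b']] from the declaration example
tree = Group("int", Group("a", "b"))
print(tree[1], tree[2][2], [child for child in tree][0])   # int b int
```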
+
+To check whether a given group or sub-group is the result of applying a particular parser, use the check_parser(parser) and check_parser_instance(parser_class) functions:
+
+~~~~~~
+result = program(tokens, 0)
+sub_group = result.get_sub_group(1)
+if sub_group.check_parser(if_statement):
+    print("this is an if-statement")
+~~~~~~
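One plausible way such checks can work is for each result to keep a reference to the parser that produced it. The classes below are hypothetical, and whether pylangparser compares parsers by identity is an assumption. In this sketch, check_parser asks "was it this exact parser object?", while check_parser_instance asks "was it any parser of this class?".

```python
class Result:
    """Hypothetical result object that remembers which parser produced it."""

    def __init__(self, value, parser):
        self.value = value
        self.parser = parser

    def check_parser(self, parser):
        # identity check: produced by this exact parser object?
        return self.parser is parser

    def check_parser_instance(self, parser_class):
        # type check: produced by any parser of this class?
        return isinstance(self.parser, parser_class)


class KeywordRule:
    """Toy parser matching one keyword token."""

    def __init__(self, word):
        self.word = word

    def __call__(self, tokens, i):
        if i < len(tokens) and tokens[i] == self.word:
            return Result(tokens[i], self), i + 1
        return None


if_rule = KeywordRule("if")
else_rule = KeywordRule("else")
result, _ = if_rule(["if"], 0)
print(result.check_parser(if_rule), result.check_parser(else_rule),
      result.check_parser_instance(KeywordRule))   # True False True
```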
+
+For more detailed info check the ***source code*** and the ***c_parser.py*** example.
+
+Each group/sub-group can be pretty-printed with the pretty_print() function:
+
+~~~~~~
+result.pretty_print()
+sub_group.pretty_print()
+~~~~~~
+
+You can download and try the examples: <https://sourceforge.net/p/pylangparser/code/ci/master/tree/examples/>

 [[members limit=20]]
 [[download_button]]

Home modified by Ognyan Tonchev

Ognyan Tonchev — Sun, 21 Jun 2015 20:51:17 -0000

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.

Project Members:

Ognyan Tonchev (admin)
