Empty dicts?

2006-02-26
2013-05-14
  • Mike Stallard

    Mike Stallard - 2006-02-26

    Hi there,

    First off, I'd like to say thank you very much for writing such a fantastic tool, which has made my life a heck of a lot easier :)

    I've stumbled across something that seems like a bit of an oddity. The lists I generate from my code work just as I'd expect, except when I pprint them or compare them to themselves: they seemingly contain empty dicts, and I cannot for the life of me figure out where those come from.

    For example

    Text that appears to parse as:
    [['Sin'], ['Cos'], ['Tan'], ['Log']]

    pprints as:
    ([(['Sin'], {}), (['Cos'], {}), (['Tan'], {}), (['Log'], {})], {})

    And
    [['X']]
    pprints as:
    ([(['X'], {})], {})

    Normally I wouldn't be bothered by it, but I'm trying to implement some unit testing on the code, and it kinda falls over with this.

    I'll dump the code that I've written here...

    ==================

    from pyparsing import Combine,ParseException,Forward, Group, Word, Literal, alphas, OneOrMore, Optional, alphanums, empty
    import pprint

    pp = pprint.PrettyPrinter(indent=4)

    class grammar:
            termDict = dict()
            currentAssign = ""
            startSymbol = ""
            exprExpn = [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]

            def __handleProductionRuleLeft(self, s, loc, toks):
                    self.currentAssign = toks[0]

            def __handleStartSymbol(self, s, loc, toks):
                    self.startSymbol = toks[0]
                    self.currentAssign = "S"
            def __handleProd(self, s, loc, toks):
                    pass

            def __bnf_bnf(self):
                    # we include strange symbols in terminals for brevity
                    self.terminal = Word(alphas,alphanums+"-") ^ Literal("(") ^ Literal(")") ^ Literal("+") ^ Literal("-") ^ Literal("*") ^ Literal("/")
                    self.nonterminal = Combine(Literal("<") + self.terminal + Literal(">"))

                    # our BNF-like grammar sets up the start symbol with a rule as follows...
                    # S ::= <starting-non-terminal>
                    self.startSymbolRule = Literal("S").suppress() + Literal("::=").suppress() + self.nonterminal
                    self.startSymbolRule.setParseAction(self.__handleStartSymbol)

                    self.productionRuleLeft = self.nonterminal + Literal("::=").suppress()
                    self.productionRuleLeft.setParseAction(self.__handleProductionRuleLeft)

                    self.productionRule = self.productionRuleLeft.suppress() + (
                            OneOrMore(
                                    Group(OneOrMore(self.terminal ^ self.nonterminal)
                                            + Optional(Literal("|").suppress())
                                    )
                            )
                    ) + empty

                    self.productionRule.setParseAction(self.__handleProd)

                    self.grammarRule = self.productionRule ^ self.startSymbolRule

            def getExpansions(self,nonTerminal):
                    return self.termDict[nonTerminal]

            def __init__(self,gramFile):
                    self.__bnf_bnf()

                    self.gramFile = open(gramFile,"r")

                    for line in self.gramFile:
                            try:
                                    currentTerms = self.grammarRule.parseString(line)
                                    if self.currentAssign != "S":
                                            self.termDict[self.currentAssign] = currentTerms
                                            print(currentTerms)
                                            pp.pprint(currentTerms)

                            except ParseException as err:
                                    print(err.line)
                                    print(err)

    asdf = grammar("gramm.bnf")

    ============= EOF ===============

    And here is the file it is parsing:

    S ::= <expr>
    <expr> ::= <expr> <op> <expr> | ( <expr> <op> <expr> ) | <pre-op> ( <expr> ) | <var>
    <op> ::= + | - | / | *
    <pre-op> ::= Sin | Cos | Tan | Log
    <var> ::= X

    ============= EOF =============

    Could someone please explain to me what I'm doing wrong and how I could fix it (if possible!)

    Many thanks,

    Mike

     
    • Mike Stallard

      Mike Stallard - 2006-02-26

      Here's some extra output from doing a

      self.grammarRule.setDebug()

      ==============================

      Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
      Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> ['<expr>']
      Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
      Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]
      Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
      Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['+'], ['-'], ['/'], ['*']]
      Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
      Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['Sin'], ['Cos'], ['Tan'], ['Log']]
      Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
      Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['X']]

      ======================

      From looking further through this forum, I noticed that the problem I'm having looks kinda similar to the problems Jackey Sieka is/was having with asXML (http://sourceforge.net/forum/forum.php?thread_id=1427882&forum_id=337293) - could they be somehow related?

       
      • Mike Stallard

        Mike Stallard - 2006-02-26

        Argh, "Jacek" not "Jackey"...apologies, it's been a long day ;)

         
    • Paul McGuire

      Paul McGuire - 2006-02-27

      Mike -

      One basic concept is that pyparsing returns not lists or dictionaries, but ParseResults.  ParseResults are complex data structures that permit list-, dict-, and object-like access.  A ParseResults looks like a list because its __str__ method outputs a list-like view, but it is much more than a list.

      For pprint to display the results properly, use the ParseResults asList() method.  That is, instead of:

         results = bnf.parseString(data)
         pprint.pprint(results)

      use:

         results = bnf.parseString(data)
         pprint.pprint(results.asList())
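
      For the unit-testing case, a minimal sketch of the same idea follows. The mini-grammar and the names alternative, rule, and as_plain_list are made up for illustration and are not Mike's actual grammar. The {} entries in the pprint output above are most likely the (empty) named-results part of each ParseResults as shown by its repr; asList() drops that and returns plain nested lists that compare cleanly against expected values.

```python
import pprint

from pyparsing import Group, OneOrMore, Optional, Suppress, Word, alphas

# Hypothetical mini-grammar (not the one from the post): alternatives
# separated by "|", each alternative grouped into its own sub-list.
alternative = Group(OneOrMore(Word(alphas))) + Optional(Suppress("|"))
rule = OneOrMore(alternative)

results = rule.parseString("Sin | Cos | Tan | Log")

# parseString returns a ParseResults object, not a list.
print(type(results).__name__)

# asList() converts the ParseResults, including nested groups,
# into ordinary Python lists.
as_plain_list = results.asList()
pprint.pprint(as_plain_list)

# Plain lists can be compared directly, e.g. inside a unit test.
expected = [["Sin"], ["Cos"], ["Tan"], ["Log"]]
assert as_plain_list == expected
```

      In a unittest-style test, the same comparison could be written as self.assertEqual(results.asList(), expected).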

      Here are some other comments:
      1. You can shorten the definition of terminal to:
              self.terminal = Word(alphas,alphanums+"-") | oneOf("( ) + - * /")

      2. I like your use of instance methods for parse actions.  I believe this could actually be a thread-safe parser, *if* threads do not share grammar objects.
         
      3. In your method __bnf_bnf, it is probably unnecessary for every sub-expression (terminal, nonterminal, productionRule, etc.) to be kept as an instance variable of the grammar object.  Only the root level grammarRule is needed for later reference, for invoking parseString.
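
      Comments 1 and 3 could be combined roughly as in the sketch below. This is only one possible arrangement, under some assumptions: the function name build_grammar_rule and the local variable names are hypothetical, the parse actions from the original post are left out for brevity, and | (first match) is used in place of ^ (longest match), which should behave the same for this particular input.

```python
from pyparsing import (Combine, Group, OneOrMore, Optional, Suppress, Word,
                       alphanums, alphas, oneOf)


def build_grammar_rule():
    """Build the BNF-like grammar and return only the root expression."""
    # Comment 1: oneOf replaces the chain of single-character Literals.
    terminal = Word(alphas, alphanums + "-") | oneOf("( ) + - * /")
    nonterminal = Combine("<" + terminal + ">")
    assign = Suppress("::=")

    # S ::= <starting-non-terminal>
    start_symbol_rule = Suppress("S") + assign + nonterminal

    # <name> ::= alternative | alternative | ...
    alternative = Group(OneOrMore(terminal | nonterminal)) + Optional(Suppress("|"))
    production_rule = Suppress(nonterminal + assign) + OneOrMore(alternative)

    # Comment 3: the sub-expressions stay local; only the root is returned.
    return production_rule | start_symbol_rule


grammar_rule = build_grammar_rule()
print(grammar_rule.parseString("<op> ::= + | - | / | *").asList())
print(grammar_rule.parseString("S ::= <expr>").asList())
```

      Inside the grammar class, the returned expression could be stored as self.grammarRule, with the parse actions attached to the local expressions before returning.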

      Very nice work overall, and interesting application.  Look at Seo Sanghyeon's EBNF parser in the pyparsing samples directory for some BNF parsing ideas.

      -- Paul

       
      • Mike Stallard

        Mike Stallard - 2006-02-27

        That's great, thanks very much for the help Paul!

        I'm quite new to pyparsing and Python in general, but so far I can't help but be impressed by the elegance and usability of the modules that are out there...I'm actually using the above code in a Python implementation of Grammatical Evolution (http://www.grammatical-evolution.org/), an automatic program generation system that I'm developing for the Final Year Project of my Bachelor's in CS.

        Thanks again!

        Mike

         
