Python parsing module / Discussion / Help/Open Discussion: Empty dicts?

Mike Stallard - 2006-02-26

Hi there,

First off, I'd like to say thank you very much for writing such a fantastic tool, which has made my life a heck of a lot easier :)

I've stumbled across something that seems like a bit of an oddity. The lists I generate from my code work just as I'd expect, except when I pprint them or compare them against plain lists: they seemingly contain empty dicts, and I cannot for the life of me figure out where those come from.

For example

Text that appears to parse as:
[['Sin'], ['Cos'], ['Tan'], ['Log']]

pprints as:
([(['Sin'], {}), (['Cos'], {}), (['Tan'], {}), (['Log'], {})], {})

And
[['X']]
pprints as:
([(['X'], {})], {})

Normally I wouldn't be bothered by it, but I'm trying to implement some unit testing on the code, and it kinda falls over with this.

I'll dump the code that I've written here...

==================

from pyparsing import Combine,ParseException,Forward, Group, Word, Literal, alphas, OneOrMore, Optional, alphanums, empty
import pprint

pp = pprint.PrettyPrinter(indent=4)

class grammar:
        termDict = dict()
        currentAssign = ""
        startSymbol = ""
        exprExpn = [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]

        def __handleProductionRuleLeft(self, s, loc, toks):
                self.currentAssign = toks[0]

        def __handleStartSymbol(self, s, loc, toks):
                self.startSymbol = toks[0]
                self.currentAssign = "S"

        def __handleProd(self, s, loc, toks):
                pass

        def __bnf_bnf(self):
                # we include strange symbols in terminals for brevity
                self.terminal = Word(alphas,alphanums+"-") ^ Literal("(") ^ Literal(")") ^ Literal("+") ^ Literal("-") ^ Literal("*") ^ Literal("/")
                self.nonterminal = Combine(Literal("<") + self.terminal + Literal(">"))

                # our BNF-like grammar sets up the start symbol with a rule as follows...
                # S ::= <starting-non-terminal>
                self.startSymbolRule = Literal("S").suppress() + Literal("::=").suppress() + self.nonterminal
                self.startSymbolRule.setParseAction(self.__handleStartSymbol)

                self.productionRuleLeft = self.nonterminal + Literal("::=").suppress()
                self.productionRuleLeft.setParseAction(self.__handleProductionRuleLeft)

                self.productionRule = self.productionRuleLeft.suppress() + (\
                                                                OneOrMore(\
                                                                        Group(OneOrMore(self.terminal ^ self.nonterminal)\
                                                                                + Optional(Literal("|").suppress())\
                                                                        )\
                                                                )\
                                                        ) + empty

                self.productionRule.setParseAction(self.__handleProd)

                self.grammarRule = self.productionRule ^ self.startSymbolRule

        def getExpansions(self,nonTerminal):
                return self.termDict[nonTerminal]

        def __init__(self,gramFile):
                self.__bnf_bnf()

                self.gramFile = open(gramFile,"r")

                for line in self.gramFile:
                        try:
                                currentTerms = self.grammarRule.parseString(line)
                                if self.currentAssign != "S":
                                        self.termDict[self.currentAssign] = currentTerms
                                        print(currentTerms)
                                        pp.pprint(currentTerms)

                        except ParseException as err:
                                print(err.line)
                                print(err)

asdf = grammar("gramm.bnf")

============= EOF ===============

And here is the file it is parsing:

S ::= <expr>
<expr> ::= <expr> <op> <expr> | ( <expr> <op> <expr> ) | <pre-op> ( <expr> ) | <var>
<op> ::= + | - | / | *
<pre-op> ::= Sin | Cos | Tan | Log
<var> ::= X

============= EOF =============

Could someone please explain to me what I'm doing wrong and how I could fix it (if possible!)

Many thanks,

Mike

- Mike Stallard - 2006-02-26
  
  Here's some extra output from doing a
  
  self.grammarRule.setDebug()
  
  ==============================
  
  Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
  Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> ['<expr>']
  Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
  Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]
  Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
  Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['+'], ['-'], ['/'], ['*']]
  Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
  Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['Sin'], ['Cos'], ['Tan'], ['Log']]
  Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
  Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['X']]
  
  ======================
  
  From looking further through this forum, I noticed that the problem I'm having looks kinda similar to the problems Jackey Sieka is/was having with asXML (http://sourceforge.net/forum/forum.php?thread_id=1427882&forum_id=337293) - could they be somehow related?
  
  - Mike Stallard - 2006-02-26
    
    Argh, "Jacek" not "Jackey"...apologies, it's been a long day ;)
    
- Paul McGuire - 2006-02-27
  
  Mike -
  
  One basic concept is that pyparsing returns not lists or dictionaries, but ParseResults. ParseResults are complex data structures that permit list-, dict-, and object-like access. A ParseResults looks like a list because its __str__ method outputs a list-like view, but it is much more than a list.
  
  For pprint to show the results properly, use the ParseResults asList() method. That is, instead of:
  
     results = bnf.parseString(data)
     pprint.pprint(results)
  
  use:
  
     results = bnf.parseString(data)
     pprint.pprint(results.asList())
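For the unit-testing case raised in the question, the same conversion can be applied before comparing against an expected value. The following is only a minimal sketch: `rule_body` is a hypothetical, simplified stand-in for the production-rule body in the posted grammar, not the original code.

```python
import pprint

from pyparsing import Group, OneOrMore, Optional, Suppress, Word, alphas

# Hypothetical stand-in for the right-hand side of a production rule:
# one or more alternatives, each optionally followed by a "|" separator.
alternative = Group(OneOrMore(Word(alphas)) + Optional(Suppress("|")))
rule_body = OneOrMore(alternative)

results = rule_body.parseString("Sin | Cos | Tan | Log")

# parseString returns a ParseResults object; asList() converts it
# (including the nested Group results) into plain nested Python lists.
as_plain_list = results.asList()
pprint.pprint(as_plain_list)

# Plain lists compare as expected, which is what a unit test needs.
expected = [["Sin"], ["Cos"], ["Tan"], ["Log"]]
assert as_plain_list == expected
```

Comparing the asList() output rather than the ParseResults object itself keeps the test independent of how ParseResults happens to print or compare.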
  
  Here are some other comments:
  1. You can shorten terminal's definition to:
          self.terminal = Word(alphas,alphanums+"-") | oneOf("( ) + - * /")
  
  2. I like your use of instance methods for parse actions. I believe this could actually be a thread-safe parser, *if* threads do not share grammar objects.
  
  3. In your method __bnf_bnf, it is probably unnecessary for every sub-expression (terminal, nonterminal, productionRule, etc.) to be kept as an instance variable of the grammar object. Only the root level grammarRule is needed for later reference, for invoking parseString.
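Comments 1 and 3 could be combined roughly as follows. This is a sketch under assumptions, not a drop-in replacement: the function name `build_grammar_rule` is hypothetical, the parse actions from the posted code are left out for brevity, and only the root expression is returned.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Optional,
                       Suppress, Word, alphas, alphanums, oneOf)


def build_grammar_rule():
    """Build the BNF-like grammar with local names; return only the root."""
    # Comment 1: oneOf replaces the chain of single-character Literals.
    terminal = Word(alphas, alphanums + "-") | oneOf("( ) + - * /")
    nonterminal = Combine(Literal("<") + terminal + Literal(">"))
    assign = Suppress("::=")

    # S ::= <starting-non-terminal>
    start_symbol_rule = Suppress("S") + assign + nonterminal

    # Each alternative is grouped; the "|" separators are dropped.
    alternative = Group(OneOrMore(terminal ^ nonterminal)
                        + Optional(Suppress("|")))
    production_rule = Suppress(nonterminal + assign) + OneOrMore(alternative)

    # Comment 3: sub-expressions stay local, only the root is exposed.
    return production_rule ^ start_symbol_rule


grammar_rule = build_grammar_rule()
print(grammar_rule.parseString("<op> ::= + | - | / | *").asList())
print(grammar_rule.parseString("S ::= <expr>").asList())
```

If the parse actions are needed, they can still be attached to the local sub-expressions inside the function before the root is returned.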
  
  Very nice work overall, and interesting application. Look at Seo Sanghyeon's EBNF parser in the pyparsing samples directory for some BNF parsing ideas.
  
  -- Paul
  
  - Mike Stallard - 2006-02-27
    
    That's great, thanks very much for the help Paul!
    
    I'm quite new to pyparsing and Python in general, but so far I can't help but be impressed by the elegance and usability of the modules that are out there...I'm actually using the above code in a Python implementation of Grammatical Evolution (http://www.grammatical-evolution.org/), an automatic program generation system that I'm developing for the Final Year Project of my Bachelor's in CS.
    
    Thanks again!
    
    Mike
    
