Hi there,
First off, I'd like to say thank you very much for writing such a fantastic tool, which has made my life a heck of a lot easier :)
I've stumbled across something that seems like a bit of an oddity. The lists I generate from my code work just as I'd expect, except when I pprint them or compare them to themselves - they seemingly contain empty dicts, and I cannot for the life of me figure out where those come from.
For example
Text that appears to parse as:
[['Sin'], ['Cos'], ['Tan'], ['Log']]
pprints as:
([(['Sin'], {}), (['Cos'], {}), (['Tan'], {}), (['Log'], {})], {})
And
[['X']]
pprints as:
([(['X'], {})], {})
Normally I wouldn't be bothered by it, but I'm trying to implement some unit testing on the code, and it kinda falls over with this.
I'll dump the code that I've written here...
==================
from pyparsing import Combine, ParseException, Forward, Group, Word, Literal, alphas, OneOrMore, Optional, alphanums, empty
import pprint

pp = pprint.PrettyPrinter(indent=4)

class grammar:
    termDict = dict()
    currentAssign = ""
    startSymbol = ""
    exprExpn = [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]

    def __handleProductionRuleLeft(self, s, loc, toks):
        self.currentAssign = toks[0]

    def __handleStartSymbol(self, s, loc, toks):
        self.startSymbol = toks[0]
        self.currentAssign = "S"

    def __handleProd(self, s, loc, toks):
        pass

    def __bnf_bnf(self):
        # we include strange symbols in terminals for brevity
        self.terminal = Word(alphas,alphanums+"-") ^ Literal("(") ^ Literal(")") ^ Literal("+") ^ Literal("-") ^ Literal("*") ^ Literal("/")
        self.nonterminal = Combine(Literal("<") + self.terminal + Literal(">"))
        # our BNF-like grammar sets up the start symbol with a rule as follows...
        # S ::= <starting-non-terminal>
        self.startSymbolRule = Literal("S").suppress() + Literal("::=").suppress() + self.nonterminal
        self.startSymbolRule.setParseAction(self.__handleStartSymbol)
        self.productionRuleLeft = self.nonterminal + Literal("::=").suppress()
        self.productionRuleLeft.setParseAction(self.__handleProductionRuleLeft)
        self.productionRule = self.productionRuleLeft.suppress() + (
            OneOrMore(
                Group(OneOrMore(self.terminal ^ self.nonterminal)
                      + Optional(Literal("|").suppress())
                )
            )
        ) + empty
        self.productionRule.setParseAction(self.__handleProd)
        self.grammarRule = self.productionRule ^ self.startSymbolRule

    def getExpansions(self,nonTerminal):
        return self.termDict[nonTerminal]

    def __init__(self,gramFile):
        self.__bnf_bnf()
        self.gramFile = open(gramFile,"r")
        for line in self.gramFile:
            try:
                currentTerms = self.grammarRule.parseString(line)
                if self.currentAssign != "S":
                    self.termDict[self.currentAssign] = currentTerms
                print(currentTerms)
                pp.pprint(currentTerms)
            except ParseException as err:
                print(err.line)
                print(err)

asdf = grammar("gramm.bnf")
============= EOF ===============
And here is the file it is parsing:
S ::= <expr>
<expr> ::= <expr> <op> <expr> | ( <expr> <op> <expr> ) | <pre-op> ( <expr> ) | <var>
<op> ::= + | - | / | *
<pre-op> ::= Sin | Cos | Tan | Log
<var> ::= X
============= EOF =============
Could someone please explain to me what I'm doing wrong and how I could fix it (if possible!)
Many thanks,
Mike
Here's some extra output from doing a
self.grammarRule.setDebug()
==============================
Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> ['<expr>']
Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['<expr>', '<op>', '<expr>'], ['(', '<expr>', '<op>', '<expr>', ')'], ['<pre-op>', '(', '<expr>', ')'], ['<var>']]
Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['+'], ['-'], ['/'], ['*']]
Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['Sin'], ['Cos'], ['Tan'], ['Log']]
Match {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} at loc 0 (1,1)
Matched {{Suppress:({Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"}) Suppress:("::=")}) {Group:({{{W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/" ^ Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}}... [Suppress:("|")]})}...} ^ {Suppress:("S") Suppress:("::=") Combine:({"<" {W:(abcd...,abcd...) ^ "(" ^ ")" ^ "+" ^ "-" ^ "*" ^ "/"} ">"})}} -> [['X']]
======================
From looking further through this forum, I noticed that the problem I'm having looks kinda similar to the problems Jackey Sieka is/was having with asXML (http://sourceforge.net/forum/forum.php?thread_id=1427882&forum_id=337293) - could they be somehow related?
Argh, "Jacek" not "Jackey"...apologies, it's been a long day ;)
Mike -
One basic concept is that pyparsing returns not lists or dictionaries, but ParseResults. ParseResults are complex data structures that permit list-, dict-, and object-like access. A ParseResults looks like a list because its __str__ method outputs a list-like view, but it is much more than a list.
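As a rough illustration of those three access styles, here is a minimal sketch. The tiny grammar and the results names `label` and `count` are made up for this example and are not part of Mike's code.

```python
from pyparsing import Word, alphas, nums

# Hypothetical grammar: a word followed by a number, each given a results name.
entry = Word(alphas)("label") + Word(nums)("count")
results = entry.parseString("width 42")

first_token = results[0]     # list-like: access by position
by_key = results["count"]    # dict-like: access by results name
by_attr = results.label      # object-like: results name as an attribute
shown = str(results)         # __str__ gives the list-like view: ['width', '42']

# Despite the list-like string form, the object is not a plain list.
is_plain_list = isinstance(results, list)
```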
For pprint to show properly, use the ParseResults asList() method. That is, instead of:
results = bnf.parseString(data)
pprint.pprint(results)
use:
results = bnf.parseString(data)
pprint.pprint(results.asList())
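The same conversion should also help with the unit-testing problem, since a test can then compare against ordinary nested lists. A minimal sketch, assuming a cut-down grammar that only handles the right-hand side of a rule (the names `alternative` and `rule` are hypothetical, not taken from the code above):

```python
from pyparsing import Group, OneOrMore, Optional, Suppress, Word, alphas

# Hypothetical mini-grammar: alternatives separated by "|", each one Grouped.
alternative = Group(OneOrMore(Word(alphas))) + Optional(Suppress("|"))
rule = OneOrMore(alternative)

results = rule.parseString("Sin | Cos | Tan | Log")

# asList() converts the whole structure, nested groups included, to plain lists,
# so it can be pprinted or compared in a test without the extra dicts showing up.
as_list = results.asList()
expected = [['Sin'], ['Cos'], ['Tan'], ['Log']]
matches_expected = (as_list == expected)
```

In a unittest test case this would become something like `self.assertEqual(results.asList(), expected)`.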
Here are some other comments:
1. You can shorten terminal's definition to:
self.terminal = Word(alphas,alphanums+"-") | oneOf("( ) + - * /")
2. I like your use of instance methods for parse actions. I believe this could actually be a thread-safe parser, *if* threads do not share grammar objects.
3. In your method __bnf_bnf, it is probably unnecessary for every sub-expression (terminal, nonterminal, productionRule, etc.) to be kept as an instance variable of the grammar object. Only the root level grammarRule is needed for later reference, for invoking parseString.
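Comments 1 and 3 could be sketched roughly as follows. This is a simplified illustration, not a drop-in replacement: `build_rule_parser` is a hypothetical helper, it leaves out the parse actions and the start-symbol rule, and it keeps the left-hand side in the results instead of suppressing it.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Optional,
                       Suppress, Word, alphas, alphanums, oneOf)

# Comment 1: the original "^" chain next to the shorter "|" / oneOf form.
long_terminal = (Word(alphas, alphanums + "-") ^ Literal("(") ^ Literal(")")
                 ^ Literal("+") ^ Literal("-") ^ Literal("*") ^ Literal("/"))
short_terminal = Word(alphas, alphanums + "-") | oneOf("( ) + - * /")

# Both forms should yield the same tokens for these sample terminals.
samples = ["pre-op", "(", ")", "+", "-", "*", "/"]
same_tokens = all(
    long_terminal.parseString(s).asList() == short_terminal.parseString(s).asList()
    for s in samples
)

# Comment 3: sub-expressions are plain locals; only the root expression is returned.
def build_rule_parser():
    terminal = Word(alphas, alphanums + "-") | oneOf("( ) + - * /")
    nonterminal = Combine(Literal("<") + terminal + Literal(">"))
    lhs = nonterminal + Suppress("::=")
    alternative = Group(OneOrMore(terminal | nonterminal)) + Optional(Suppress("|"))
    return lhs + OneOrMore(alternative)

rule_parser = build_rule_parser()
parsed = rule_parser.parseString("<op> ::= + | - | / | *").asList()
# parsed is ['<op>', ['+'], ['-'], ['/'], ['*']]
```

Keeping only the root expression around means each grammar object carries a single attribute to call parseString on, which also makes it easier to give each thread its own grammar object.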
Very nice work overall, and interesting application. Look at Seo Sanghyeon's EBNF parser in the pyparsing samples directory for some BNF parsing ideas.
-- Paul
That's great, thanks very much for the help Paul!
I'm quite new to pyparsing and Python in general, but so far I can't help but be impressed by the elegance and usability of the modules that are out there...I'm actually using the above code in a Python implementation of Grammatical Evolution (http://www.grammatical-evolution.org/), an automatic program generation system that I'm developing for the Final Year Project of my Bachelor's in CS.
Thanks again!
Mike