Re: [Pyparsing] use of Dict

Dict is not meant as "here is a dict entry with this particular keyword and
this value."  It means something closer to "here is a list of grouped entries
and values, to be returned as a dict; take the first item of each group as the
key, and the remaining items in each group as that key's value."  In your
case, a more likely definition would be:

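from pyparsing import Dict, Group, OneOrMore, Word, alphas, alphanums, nums, oneOf

keylabel = oneOf("hello world")
# each Group is one dict entry: a key followed by its value
p = Dict(OneOrMore(Group(keylabel + (Word(nums) | Word(alphas, alphanums)))))
results = p.parseString("hello abc world 2134")
print(list(results.keys()))
print(results.dump())
print(results.hello)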

The entries *must* be explicitly grouped, or else the tokens will just run
together and Dict won't know where values stop and the next key starts.
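
To make the grouping rule concrete, here is a small sketch (the grammar and
input string are made up for illustration, not taken from your code).  Each
Group yields one nested list; Dict uses the first item as the key and, when
more than one item remains, keeps the rest together as that key's value:

```python
from pyparsing import Dict, Group, OneOrMore, Word, alphas, nums

# Each entry is an alphabetic key followed by one or more numbers.
entry = Group(Word(alphas) + OneOrMore(Word(nums)))
table = Dict(OneOrMore(entry))

results = table.parseString("min 7 43 max 11 52")

# The groups are still visible as nested lists...
print(results.asList())
# ...and each key maps to the remaining items of its group.
print(results["min"].asList())
print(results["max"].asList())
```

Without the Group, there would be no boundary telling Dict that "43" ends the
first entry and "max" starts the next one.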

In a larger grammar, the Dict expression is usually given a results name
(say "dictVals") and then the entries in the dict can be referenced as
"dictVals.hello" or "dictVals['world']" (using the keys from your example).

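As a rough sketch of that usage, assuming a made-up enclosing grammar (the
"label" field and the colon are hypothetical, only "dictVals" and the keys
come from the discussion above):

```python
from pyparsing import Dict, Group, Literal, OneOrMore, Word, alphas, alphanums, nums, oneOf

keylabel = oneOf("hello world")
value = Word(nums) | Word(alphas, alphanums)
entries = Dict(OneOrMore(Group(keylabel + value)))

# Hypothetical enclosing grammar: a record label, a colon, then the dict entries.
record = (Word(alphas).setResultsName("label")
          + Literal(":").suppress()
          + entries.setResultsName("dictVals"))

result = record.parseString("greeting: hello abc world 2134")
print(result.label)
print(result.dictVals.hello)
print(result.dictVals["world"])
```

Naming the Dict keeps its keys in their own namespace, so they cannot collide
with other results names in the larger grammar.
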
I tried to simplify the use of Dict by providing the dictOf helper method.
It would change the above to:

keylabel = oneOf("hello world")
p = dictOf( keylabel, (Word(nums) | Word(alphas, alphanums)) )

dictOf is called with two expressions: the first matches the keys in the
dict, and the second matches the values.
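
A quick way to convince yourself that the two forms behave the same is to
parse the same string with both and compare the resulting dicts.  This is
only a sketch; the variable names are mine:

```python
from pyparsing import Dict, Group, OneOrMore, Word, alphas, alphanums, dictOf, nums, oneOf

keylabel = oneOf("hello world")
value = Word(nums) | Word(alphas, alphanums)

# Explicit form and helper form of the same grammar.
explicit = Dict(OneOrMore(Group(keylabel + value)))
helper = dictOf(keylabel, value)

text = "hello abc world 2134"
explicit_dict = explicit.parseString(text).asDict()
helper_dict = helper.parseString(text).asDict()

print(explicit_dict)
print(helper_dict)
```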

It is atypical (but not impossible) to have a list of known keywords that
would be keys.  In the dictExample.py script, which ships in the pyparsing
examples directory, the keys are labels in a table of data statistics: min,
max, etc.  These could have been hardcoded as oneOf("min max ave sdev"), but
I could just reference them as Word(alphas), since their placement in the
table was unambiguous.  The configParse.py example uses nested Dicts to
permit the values in an INI file to be referenced as
"config.section.subsection.subsubsection.etc".

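To give a feel for the nested case, here is a much-simplified sketch.  It is
not the text of configParse.py; the grammar, section names and settings are
invented for illustration, and it only handles one level of nesting:

```python
from pyparsing import Dict, Group, OneOrMore, Suppress, Word, alphanums, alphas

# Simplified, hypothetical INI-like grammar (not the text of configParse.py).
setting = Group(Word(alphas) + Suppress("=") + Word(alphanums))
section = Group(Suppress("[") + Word(alphas) + Suppress("]")
                + Dict(OneOrMore(setting)))
config_grammar = Dict(OneOrMore(section))

sample = """
[server]
host = alpha
port = 8080
[client]
retries = 3
"""

config = config_grammar.parseString(sample)
print(config.server.host)
print(config["server"]["port"])
print(config.client.retries)
```

The inner Dict turns each section's settings into keys, and the outer Dict
uses the section name as the key for that whole group.
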
-- Paul

Here is the text of dictExample.py - please download either the source or
docs distributions from SourceForge, to get the complete documentation and
examples directories (not included when using easy_install or the Windows
installer):

#
# dictExample.py
#
#  Illustration of using pyparsing's Dict class to process tabular data
#
# Copyright (c) 2003, Paul McGuire
#
from pyparsing import (Literal, Word, Group, Dict, ZeroOrMore, alphas, nums,
                       delimitedList)
import pprint

testData = """
+-------+------+------+------+------+------+------+------+------+
|       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |
+=======+======+======+======+======+======+======+======+======+
| min   |   7  |  43  |   7  |  15  |  82  |  98  |   1  |  37  |
| max   |  11  |  52  |  10  |  17  |  85  | 112  |   4  |  39  |
| ave   |   9  |  47  |   8  |  16  |  84  | 106  |   3  |  38  |
| sdev  |   1  |   3  |   1  |   1  |   1  |   3  |   1  |   1  |
+-------+------+------+------+------+------+------+------+------+
"""

# define grammar for datatable
heading = (Literal(
"+-------+------+------+------+------+------+------+------+------+") +
"|       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |" +
"+=======+======+======+======+======+======+======+======+======+").suppress()
vert = Literal("|").suppress()
number = Word(nums)
rowData = Group( vert + Word(alphas) + vert + delimitedList(number,"|") +
vert )
trailing = Literal(
"+-------+------+------+------+------+------+------+------+------+").suppress()

datatable = heading + Dict( ZeroOrMore(rowData) ) + trailing

# now parse data and print results
data = datatable.parseString(testData)
print(data)
pprint.pprint(data.asList())
print("data keys=", list(data.keys()))
print("data['min']=", data['min'])
print("data.max", data.max)