Re: [Pyparsing] more refinement but still lost


Sorry for my terse reply earlier - hit send too early!

Eric, you have definitely taken on an ambitious first project for pyparsing.
Writing BNFs takes some practice, but it is important to really get your
thoughts down about how the parser is supposed to work before getting mired
in Words, Groups, Forwards, etc. In your nested terms, let the recursion
in the BNF take care of nesting []'s: when you have LBRACK/RBRACK at two
different levels of your nesting, it's a sign you should rethink how you
have defined the contents of this group.
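A small sketch of that point, assuming contents are plain words of alphanumerics and '_' (no escapes) to keep it short; the names word, inner, outer, nested and the helper parses are illustrative only, not taken from your code. Writing LBRACK/RBRACK out at two levels caps the nesting at depth two, while a single rule that refers to itself through a Forward handles any depth:

```python
from pyparsing import (Forward, Group, ParseException, Suppress, Word,
                       ZeroOrMore, alphanums)

LBRACK, RBRACK = map(Suppress, "[]")
word = Word(alphanums + "_")  # simplified contents: no escapes, no punctuation

# brackets written out at two levels: can only ever match nesting depth <= 2
inner = Group(LBRACK + ZeroOrMore(word) + RBRACK)
outer = Group(LBRACK + ZeroOrMore(word | inner) + RBRACK)

# one level of brackets, defined in terms of itself via a Forward;
# the recursion takes care of any depth of nesting
nested = Forward()
nested <<= Group(LBRACK + ZeroOrMore(word | nested) + RBRACK)


def parses(expr, text):
    """Return True if expr matches the whole of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for text in ["[a [b]]", "[a [b [c]]]"]:
    print(text, "outer:", parses(outer, text), "nested:", parses(nested, text))
```

With pyparsing installed, outer accepts "[a [b]]" but rejects "[a [b [c]]]", while nested accepts both.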

Here's my earlier post, annotated with comments.

-- Paul

all_tests = {
       "test_1": "some plain text",
       "test_2": "[simple ]",
       "test_3": "[simple_text some plain text]",
       "test_4": "[onearg [one ]]",
       "test_5": "[twoarg [one ] [two ]]",
       "test_6": "[onearg_text [one some plain text]]",
       "test_7": "[twoarg_text [one ] [two some plain text arg]]",
       "test_8": "[nested_text some [not plain] text]",
       "test_9": "[nested_text [one text] some [not [very ] plain] text]",
       "test_10": r"[nested_text_escaped [one text] some [not [very ] plain] bracketed \[text\]]",
       "test_11": r"""[nested_text_escaped_indented 
                            [one text] some 
                            [not 
                                   [very ] 
                                   plain
                            ] 
                            bracked \[text\]
                     ]""",
}

# a simple BNF:
#
#   listExpr     ::= '[' listContent ']'
#   listContent  ::= (contentsWord | escapedChar | listExpr)*
#   contentsWord ::= printableCharacter+
#   escapedChar  ::= '\' printableCharacter
#
#
# Some notes:
# 1. listContent can be empty, so "[]" is a valid listExpr
# 2. contentsWord cannot contain '\', '[' or ']' characters, or
#    else we couldn't distinguish delimiters from contents, or 
#    detect escapes
#

from pyparsing import *

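# start with the basics
LBRACK,RBRACK = map(Suppress,"[]")
# an escaped character is a backslash followed by any printable character
escapedChar = Combine('\\' + oneOf(list(printables)))
# a contents word is a run of printables, excluding '\', '[' and ']'
contentsWord = Word(printables,excludeChars=r"\[]")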

# define a placeholder for a nested list, since we need to 
# reference it before it is fully defined
listExpr = Forward()

# the contents of a list are zero or more contents words, escaped
# characters, or nested lists
listContent = ZeroOrMore(contentsWord | escapedChar | listExpr)

# a list is a listContent enclosed in []'s - enclose
# in a Group so that pyparsing will maintain the nested structure
#
# since listExpr was already defined as a Forward, we use '<<=' to
# "inject" the definition into the already defined Forward
listExpr <<= Group(LBRACK + listContent + RBRACK)

# parse the test string - note that the results no longer contain 
# the parsed '[' and ']' characters, but they do retain the 
# nesting of the original string in nested lists
for name,testStr in all_tests.items():
    print(name, listContent.parseString(testStr).asList())

prints:

test_1 ['some', 'plain', 'text']
test_2 [['simple']]
test_3 [['simple_text', 'some', 'plain', 'text']]
test_4 [['onearg', ['one']]]
test_5 [['twoarg', ['one'], ['two']]]
test_6 [['onearg_text', ['one', 'some', 'plain', 'text']]]
test_7 [['twoarg_text', ['one'], ['two', 'some', 'plain', 'text', 'arg']]]
test_8 [['nested_text', 'some', ['not', 'plain'], 'text']]
test_9 [['nested_text', ['one', 'text'], 'some', ['not', ['very'], 'plain'], 'text']]
test_10 [['nested_text_escaped', ['one', 'text'], 'some', ['not', ['very'], 'plain'], 'bracketed', '\\[', 'text', '\\]']]
test_11 [['nested_text_escaped_indented', ['one', 'text'], 'some', ['not', ['very'], 'plain'], 'bracked', '\\[', 'text', '\\]']]
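One thing worth knowing about the test loop above: without parseAll=True, parseString just stops wherever the grammar stops matching, so unbalanced brackets can go unreported. Here's a sketch (the grammar is repeated so it runs on its own; the two bad inputs are made up for illustration) showing a stray ']' and an unclosed '[':

```python
from pyparsing import (Combine, Forward, Group, ParseException, Suppress, Word,
                       ZeroOrMore, oneOf, printables)

# same grammar as the post, repeated so this snippet runs on its own
LBRACK, RBRACK = map(Suppress, "[]")
escapedChar = Combine("\\" + oneOf(list(printables)))
contentsWord = Word(printables, excludeChars=r"\[]")
listExpr = Forward()
listContent = ZeroOrMore(contentsWord | escapedChar | listExpr)
listExpr <<= Group(LBRACK + listContent + RBRACK)

for text in ["[a]]", "[a"]:
    # without parseAll, parsing silently stops where the grammar stops matching
    print(repr(text), "->", listContent.parseString(text).asList())
    try:
        listContent.parseString(text, parseAll=True)
    except ParseException as err:
        # with parseAll=True, leftover unparsed input is reported as an error
        print("   parseAll=True raises ParseException:", err)
```

The missing ']' is the sneakier case: the listExpr alternative fails, ZeroOrMore happily matches nothing, and the result is an empty list. Passing parseAll=True in the test loop turns both cases into a ParseException.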

# pyparsing includes a shortcut, nestedExpr, to simplify defining nested
# structures like this
print(nestedExpr('[',']').parseString(all_tests['test_9']).asList())
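For comparison, a sketch that runs nestedExpr side by side with the hand-built grammar (repeated so the snippet runs alone; shortcut, plain and escaped are just illustrative names). My understanding is that nestedExpr's default content is whitespace-separated runs of characters other than the delimiters, with no notion of backslash escapes, so the two should agree on test_9-style input but not on the escaped brackets of test_10:

```python
from pyparsing import (Combine, Forward, Group, Suppress, Word, ZeroOrMore,
                       nestedExpr, oneOf, printables)

# the hand-built grammar from the post, repeated so this snippet runs alone
LBRACK, RBRACK = map(Suppress, "[]")
escapedChar = Combine("\\" + oneOf(list(printables)))
contentsWord = Word(printables, excludeChars=r"\[]")
listExpr = Forward()
listContent = ZeroOrMore(contentsWord | escapedChar | listExpr)
listExpr <<= Group(LBRACK + listContent + RBRACK)

# the built-in shortcut, using its default content and ignore expressions
shortcut = nestedExpr("[", "]")

plain = "[nested_text [one text] some [not [very ] plain] text]"
escaped = r"[nested_text_escaped [one text] bracketed \[text\]]"

for text in (plain, escaped):
    by_shortcut = shortcut.parseString(text).asList()
    by_hand = listContent.parseString(text).asList()
    print("same" if by_shortcut == by_hand else "different", by_shortcut)
```

So nestedExpr is a fine shortcut for plain nesting, but once escaped brackets are in play, the explicit escapedChar/contentsWord grammar is the one that keeps '\[' and '\]' as content.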