Anonymous - 2012-04-25

Thank you for creating pyparsing - it's really a lifesaver!  I'm just starting out with it and having a bit of a hard time trying to get it to do what I want.

I'm trying to parse words into "prefix", "stem" and "suffix". Prefixes and suffixes are optional and can be made up of several parts (e.g., a prefix could be "conjunction" + "definite article"). Here's the grammar:

from pyparsing import FollowedBy, Group, Optional, Or, SkipTo, StringEnd, oneOf

endOfString = StringEnd()
conjunction = oneOf("w f")
preposition = oneOf("l b")
def_art = oneOf("al l")
noun_prefix = Group( Optional(conjunction("conjunction")) + Optional(preposition("preposition"))
                     + Optional(def_art("article")))
noun_suffix = oneOf("y na k km w h ha hm") + FollowedBy(endOfString)
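poss_noun = Optional( Optional(conjunction) + Optional(preposition) )("prefixes") + \
             SkipTo(noun_suffix | endOfString)("stem") + \
             Optional(noun_suffix)("suffix")   # continuation missing from the post; inferred from the output below
def_noun =  Optional( Optional(conjunction) + Optional(preposition) + Optional(def_art) )("prefixes") + \
             SkipTo(endOfString)("stem")       # continuation missing from the post; inferred from the output below
noun = Or( [poss_noun, def_noun ] )("noun")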

My problem is that I'd like to get the maximum parse (i.e., breaking the word up into as many pieces as possible), not necessarily the longest result.  For instance, I have nouns defined as:

noun = Or( [def_noun, poss_noun ] )("noun")

in order to enforce a rule that if a noun has a definite article, it can't also have a possessive ending. The problem I'm having is that the parser matches whichever pattern comes first in the Or expression and doesn't seem to try the other one. Here's what it does (the parse I would have preferred is given as a comment to the right):

>>> noun = Or( [def_noun, poss_noun ] )("noun")
>>> for word in wordlist:
...     noun.parseString(word).asList()
['al', 'dar']            # correct
['b', 'al', 'blad']      # correct
['al', 'blad']           # correct
['b', 'ytw']             # b + yt + w
['b', 'al', 'Hq']        # correct
['l', 'bytw']            # l + byt + w
>>> noun = Or( [poss_noun, def_noun ] )("noun")
>>> for word in wordlist:
...     noun.parseString(word).asList()
['aldar']                # al + dar
['b', 'alblad']          # b + al + blad
['alblad']               # al + blad
['b', 'yt', 'w']         # correct
['b', 'alHq']            # b + al + Hq
['l', 'byt', 'w']        # correct

So it's matching whichever pattern comes first, rather than whichever pattern is the best match. What am I doing wrong?
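
A possible workaround, sketched under assumptions: as far as I can tell, Or chooses the alternative that consumes the most input, and since both alternatives here run to the end of the word via SkipTo, they tie and the first one listed wins. One way around that is to parse the word with each alternative separately and keep the result with the most tokens. The helper name most_pieces is hypothetical, and the poss_noun and def_noun definitions below are guesses at the grammar above, not the exact originals.

```python
from pyparsing import (FollowedBy, Optional, ParseException, SkipTo,
                       StringEnd, oneOf)

endOfString = StringEnd()
conjunction = oneOf("w f")
preposition = oneOf("l b")
def_art = oneOf("al l")
noun_suffix = oneOf("y na k km w h ha hm") + FollowedBy(endOfString)

# Assumed shapes of the two alternatives (reconstructed, not the originals).
poss_noun = (Optional(conjunction) + Optional(preposition)
             + SkipTo(noun_suffix | endOfString)("stem")
             + Optional(noun_suffix)("suffix"))
def_noun = (Optional(conjunction) + Optional(preposition) + Optional(def_art)
            + SkipTo(endOfString)("stem"))


def most_pieces(word, alternatives):
    """Parse `word` with every alternative and return the token list
    that has the most pieces (hypothetical helper, not a pyparsing API)."""
    candidates = []
    for expr in alternatives:
        try:
            candidates.append(expr.parseString(word, parseAll=True).asList())
        except ParseException:
            pass  # this alternative does not match the word at all
    # On a tie, max() keeps the earliest alternative in the list.
    return max(candidates, key=len) if candidates else None


for word in ["aldar", "balblad", "bytw", "lbytw"]:
    print(word, most_pieces(word, [def_noun, poss_noun]))
```

With this approach the order of the alternatives only matters when two parses have the same number of pieces. The cost is that every alternative is parsed in full for every word, which should be negligible for single words.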