[Pyparsing] parsing a simple Language
From: Duncan M. <dun...@gm...> - 2010-10-10 15:51:15
Hey all!

Paul, I don't think we've chatted since PyCon 2006! I've been using pyparsing on and off since then for various projects. I've even used it at work for conceptual modeling (we're working on a gesture language for multi-touch in Ubuntu).

However, I'm emailing for help about parsing syllabic rules in a natural language... Tibetan. I won't bore you with linguistic details, though: I've created a minimal example with a fake language below. Here are the rules:

 1. All syllables must start with s, S, b, or B.
 2. Syllables can be as short as one initial letter.
 3. If there are additional consonants in the syllable, they must be one of m, M, p, or P.
 4. Medial consonants may repeat multiple times.
 5. Vowels are optional.

Here was my first try at a grammar for these rules:

    init = Word('sSbB', max=1).setName("initial")
    med = Word('mMpP').setName("medial")
    vow = Word('aeiou', max=1).setName("vowel")
    syllables = Group(OneOrMore(Combine(
        init + ZeroOrMore(med) + Optional(vow)
        ))).setResultsName("syllables")

For most cases, this resulted in the desired parsing:

    syllables.parseString("sabmaSMpo").asList()
    [['sa', 'bma', 'SMpo']]

However, I discovered an edge case that wasn't covered. The following examples result in exceptions:

    syllables.parseString("sSma").asList()
    syllables.parseString("sbisi").asList()

Now, if I change the init definition to the following:

    init = oneOf('S s b B').setName("initial")

I get the desired results for everything. The two problem cases result in this:

    syllables.parseString("sSma").asList()
    [['s', 'Sma']]

    syllables.parseString("sbisi").asList()
    [['s', 'bi', 'si']]

So it seems to me that Word should *somehow* be able to do this, though obviously my use of max=1, and my hope that it would be enough, is naive ;-)

For the sake of consistency, I'd rather not have to join the list of initial characters with a space. Is there a way of accomplishing my goal with Word instead of oneOf?

Thanks!

d
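
For anyone who wants to check the expected results independently of pyparsing, the five rules can also be written as a regular expression. This is only a sketch using the standard library's re module, not the grammar from the post; the names SYLLABLE, TEXT, and split_syllables are made up for illustration.

```python
import re

# One syllable: an initial (rules 1-2), any number of medials (rules 3-4),
# and an optional vowel (rule 5).
SYLLABLE = re.compile(r"[sSbB][mMpP]*[aeiou]?")

# A whole input is one or more syllables with nothing left over.
TEXT = re.compile(r"(?:[sSbB][mMpP]*[aeiou]?)+")


def split_syllables(text):
    """Return the list of syllables in text, or raise ValueError."""
    if not TEXT.fullmatch(text):
        raise ValueError("not a sequence of valid syllables: %r" % text)
    return SYLLABLE.findall(text)


print(split_syllables("sabmaSMpo"))  # ['sa', 'bma', 'SMpo']
print(split_syllables("sSma"))       # ['s', 'Sma']
print(split_syllables("sbisi"))      # ['s', 'bi', 'si']
```

The three outputs match the results reported for the oneOf version of the grammar, including the two edge cases. On the pyparsing side, Word also accepts an exact= argument, which may be worth trying in place of max=1; that is untested in this sketch.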