pyparsing-users Mailing List for Python parsing module (Page 12)
Brought to you by:
ptmcg
From: Werner F. B. <wer...@fr...> - 2011-01-06 14:10:18
|
I am having some problems decoding these messages. The data comes in as an email message whose content type is declared as "Content-Type: text/plain", but it is really "Content-Type: text/plain; charset=windows-1252", so I read it in with:

    thisfile = codecs.open(regFile, "r", "windows-1252")

The parsing works fine except on lines like:

    address_name = Göran Petterson

which I parse with:

    alphanums = pyp.Word(pyp.alphanums)
    # address
    str_add_name = pyp.Literal("address_name =").suppress() + \
        alphanums + pyp.restOfLine
    add_name = str_add_name.setParseAction(self.str_add_nameAction)

But in str_add_nameAction I get:

    ([u'G', u'\xf6ran Petterson\r'], {})

The raw data at this point is "address_name = G\xf6ran Petterson". What am I doing wrong in all this? I tried using pyp.printables instead of alphanums, but with the same result. A tip would be very much appreciated.

Werner

P.S. Happy New Year to you all.
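[Note: the underlying issue here is that pyparsing's alphanums string contains only the ASCII letters and digits, so the Word match stops at the first accented character; pyparsing also ships an alphas8bit string of Latin-1 letters that can be added to the character set. A minimal sketch of the same character-class distinction, using only the standard library's re module rather than the poster's actual code:]

```python
import re

line = "address_name = G\xf6ran Petterson"

# An ASCII-only character class (what pyp.alphanums amounts to)
# stops at the accented letter:
ascii_name = re.match(r"address_name = ([a-zA-Z0-9]+)", line).group(1)
print(ascii_name)    # 'G'

# A Unicode-aware class (roughly alphanums + alphas8bit) does not:
full_name = re.match(r"address_name = (\w+)", line).group(1)
print(full_name)     # 'Göran'
```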
From: Andrea C. <an...@cd...> - 2010-12-26 05:25:38
|
Hi, I found pyparsing really easy to work with. Here is what I built on top of it: http://andreacensi.github.com/contracts/

In the meantime, I toyed around and changed some things in it. Perhaps some of this is helpful. The goal was to get better error messages ("closer" to the error; I hope you know what I mean). The following are the bits that I think are useful.

1) operatorPrecedence, modification 1

Around line 3579, there is:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = FollowedBy(lastExpr + opExpr + lastExpr) + \
                Group( lastExpr + OneOrMore( opExpr + lastExpr ) )

This seems wasteful and does not use "-" when it should. I modified it as such:

    elif arity == 2:
        if opExpr is not None:
            matchExpr = Group(lastExpr + FollowedBy(opExpr) +
                              OneOrMore(opExpr - lastExpr))

In this way, we advance the pointer past the opExpr. I think this is the right semantics for 99% of the cases. The exception is if the user is overloading the opExprs.

2) operatorPrecedence, modification 2

At the beginning, you have:

    lastExpr = baseExpr | ( Suppress('(') + ret + Suppress(')') )

I modified it using:

    opnames = ",".join(str(x) for x in allops)
    parenthesis = Suppress('(') + ret + \
        FollowedBy(NotAny(oneOf(allops))) - Suppress(')')
    lastExpr = parenthesis | baseExpr

Basically, once I have seen the opening parenthesis and a ret, if there isn't an operator next, we have to find the closing parenthesis.

These two together give me much better error messages (see in fixed width):

    line 1 >list(1,2,(tuple(str,a,(?)))
                                   ^
                                   | here or nearby

    line 1 >1+(3*2?)
                  ^
                  | here or nearby

You can find the whole function here: https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/pyparsing_utils.py (this is not the main branch)

3) Catching ambiguity in Or()

Because my grammar is context-dependent (meaning that 'x' might be parsed differently according to the context), I had several debugging pains, especially when I was trying to get rid of the Or() in favor of MatchFirst().
What I did was modify Or() such that it checks that, if two clauses can parse the string with the same number of characters, then they have to have the same ParseResults. (If that's not true, it's a disaster waiting to happen.) This involved adding __eq__ to ParseResults and then adding the following to Or. Where it says:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e

I modify it to:

    else:
        if loc2 > maxMatchLoc:
            maxMatchLoc = loc2
            maxMatchExp = e
        elif loc2 == maxMatchLoc:
            val1 = e._parse(instring, loc, True)
            val2 = maxMatchExp._parse(instring, loc, True)
            if not (val1 == val2):
                msg = ('Ambiguous syntax, I could match both '
                       '(and maybe more):\n- %s\n- %s\n.'
                       % (get_desc(e), get_desc(maxMatchExp)))
                msg += 'Their values are: \n'
                msg += '- {0!r}\n'.format(val1)
                msg += '- {0!r}\n'.format(val2)
                raise ParseFatalException(instring, loc, msg, self)

You can see this in https://github.com/AndreaCensi/contracts/blob/feature/better_error_messages/src/contracts/mypyparsing.py#L2546 (ignore the other stuff I changed in mypyparsing; I was "experimenting" to understand that business of Fatal vs non-Fatal exceptions).

Best,
Andrea

--
Andrea Censi
PhD student, Control & Dynamical Systems, Caltech
http://www.cds.caltech.edu/~andrea/
"Life is far too important to be taken seriously" (Oscar Wilde)
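[Note: the ambiguity check described above can be illustrated outside pyparsing. Below is a hypothetical longest-match chooser over two hand-written tokenizers (all names invented for this sketch, not pyparsing API): like Or(), it keeps the alternative that consumes the most characters, and like Andrea's patch, it fails loudly when two alternatives consume the same number of characters but produce different results.]

```python
# Each alternative returns (chars_consumed, parsed_value), or None on no match.

def parse_int(s):
    n = len(s) - len(s.lstrip("0123456789"))
    return (n, int(s[:n])) if n else None

def parse_word(s):
    n = len(s) - len(s.lstrip("abcdefghijklmnopqrstuvwxyz0123456789"))
    return (n, s[:n]) if n else None

def match_or(s, alternatives):
    best = None
    for alt in alternatives:
        res = alt(s)
        if res is None:
            continue
        if best is None or res[0] > best[0]:
            best = res                      # longest match wins, as in Or()
        elif res[0] == best[0] and res[1] != best[1]:
            # Same length, different results: the "disaster waiting to happen".
            raise ValueError("ambiguous: %r vs %r" % (res[1], best[1]))
    return best

print(match_or("42abc", [parse_int, parse_word]))  # (5, '42abc')
```

On the input "42", both alternatives consume two characters but yield 42 and "42" respectively, so this raises the ambiguity error instead of silently picking one.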
From: Helmut J. <jar...@ig...> - 2010-12-23 11:41:18
|
Hi, I'd like to write an ebuild on my Gentoo system for installing pyparsing from SVN. If I specify that pyparsing supports Python 3, my install scripts execute the file pyparsing_py3.py with Python 3.1. pyparsing_py3.py has an import of the builtin module which fails under Python 3. What am I missing? Thanks for a hint, Helmut.

--
Helmut Jarausch
Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
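[Note: the likely culprit is the Python 2/3 rename of the built-in namespace module: Python 2's __builtin__ became builtins in Python 3, so an import of the old name fails under 3.1. A common compatibility shim for code that must run on both, shown here as a general sketch rather than the actual pyparsing_py3.py fix:]

```python
try:
    import builtins               # Python 3 name of the built-in module
except ImportError:
    import __builtin__ as builtins  # Python 2 fallback

# Either way, the built-in functions are reachable under one name.
print(builtins.len("abc"))  # 3
```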
From: Michael F. <nob...@gm...> - 2010-11-15 16:39:48
|
Thanks! I've confirmed it works, even after suppressing 'get' and 'AT', and grouping the shopping list.

--Michael

On Sat, Nov 13, 2010 at 8:25 PM, Paul McGuire <pt...@au...> wrote:
> I don't think you've gone overboard here. Your BNF *will* be somewhat
> informal, but don't give up on it.
> [...]
From: Paul M. <pt...@au...> - 2010-11-14 03:25:48
|
I don't think you've gone overboard here. Your BNF *will* be somewhat informal, but don't give up on it.

Your basic form is:

    (get) (stuff) (at) (location)

Given that you have some entries where you just get stuff, or specify a location just based on a leading '@' symbol, this gets a little more complex (using []'s for optional parts):

    (get) (stuff) [[(at)] (location)]

(stuff) and (location) are going to be pretty unstructured, but fortunately, you're defining some specific forms for (get) and (at).

Your (stuff) can contain a list of items separated by commas, (and), or (, and), so I think you can define it as pretty open for (item), and then define a delimited list for the list of items. You'll need to specify the lookahead (as you already picked up in your posted pyparsing code) to avoid parsing "at" or "and" or a store label with a leading '@' as an item word. And using Keywords for your delimiting words is a good choice, to guard against accidentally reading the leading 'at' in 'athletic socks' as 'at', and the remaining 'hletic socks' as some kind of store.

So I'd informally write this as:

    shopping-list ::= get stuff [[at] location]
    get ::= "get" | "buy" | "pick up"
    at ::= "at" | "from"
    item ::= (~(and | at | '@') Word(alphas+'-'))...
    stuff ::= item [ (', and' | ',' | 'and') item ]...
    location ::= ['@'] Word(alphas)...

Rendered into pyparsing, it ends up very similar to your posted code:

    COMMA, AT = map(Literal, ',@')
    KW = CaselessKeyword
    get = KW('get') | KW('buy') | KW('pick') + KW('up') | KW('need')
    at = KW('at') | KW('from')
    location = Combine(Optional(AT) + OneOrMore(Word(alphas)), ' ',
                       adjacent=False)
    and_ = KW('and')
    itemdelim = COMMA + and_ | COMMA | and_
    item = Combine(OneOrMore(~(at | itemdelim | '@') + Word(alphas)), ' ',
                   adjacent=False)
    stuff = delimitedList(item, itemdelim)
    shoppingList = get + stuff("items") + Optional(Optional(at) +
                                                  location("location"))

And this seems to work OK for your posted tests.

Please try to avoid defining Literals with embedded spaces, and *especially* not with leading spaces (as you do with " and ") - pyparsing's default whitespace skipping will almost surely make your leading-whitespace literal unmatchable. Note how I defined "pick up" as an option for (get) as two joined keywords - this immunizes us against cases with extra whitespace between the two words, at very little cost.

Thanks for writing, and welcome to the World of Pyparsing! - :)

-- Paul
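[Note: the item-delimiter logic above (comma, "and", or ", and") can be sanity-checked on its own with the standard library. Below is a rough regex analogue of Paul's itemdelim, offered only as an illustration; the ", and" form is listed first so it is not split twice.]

```python
import re

# Rough regex analogue of: itemdelim = COMMA + and_ | COMMA | and_
itemdelim = re.compile(r",\s*and\s+|,\s*|\s+and\s+")

print(itemdelim.split("chocolate milk, eggs and bread"))
# ['chocolate milk', 'eggs', 'bread']

print(itemdelim.split("nails, claw hammer, and studs"))
# ['nails', 'claw hammer', 'studs']
```

Unlike Paul's Keyword-based grammar, a bare regex like this would also split an item whose text happens to contain the word "and" surrounded by spaces, which is exactly why Keywords (and the negative lookahead in item) matter in the real parser.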
From: Michael F. <nob...@gm...> - 2010-11-12 18:26:25
|
Hello all,

First, thanks for the wonderful programming API. I'm having a lot of fun with it. I have a question: I feel like I'm ruining the BNF expression when I overly massage the input. Should I stick to BNF or start tokenizing? Allow me to explain.

I've always been fascinated with interfaces which seem to understand human language, kind of like Gmail's "quick add" feature for its calendar. So, to learn how to do this better, I'm writing a little ditty to interpret simple commands into creating a shopping list. Here is a list of generic input I expect it to understand:

    #test grammar
    tests = """\
    buy chocolate milk, eggs and bread at store
    get eggs at @Vons
    need headache medicine and cough medicine from downtown drug store
    buy nails, claw hammer, and studs at @homedepot
    buy ragjoint at car parts store
    buy laptop at electronics store
    get candy @Sees
    get a clue""".splitlines()

My evil mashup is as follows...

    # define grammar
    KW = Keyword
    and_ = Literal(" and ")
    comma = Literal(",")
    at = KW("from") | KW("at")
    label = Literal("@") + Word(alphanums)
    buy = KW("buy") | KW("get") | KW("need")
    item = Group( OneOrMore( ~(at | and_) + (Word(alphas + "-"))) +
                  Suppress(Optional(comma))) | Suppress(Optional(and_))
    items = Group( OneOrMore( item ) )
    store = KW("store") | KW("market") | KW("grocery")
    storetype = Suppress(at) + Group(OneOrMore( Word(alphas)))
    storelabel = label
    storename = storetype | storelabel
    shlisti = Suppress(buy) + items + Optional( storename )

What I wish to do is interpret a sentence that starts with some synonym of "buy", interpret everything after that as a list of things to buy (accepting "and" as a synonym for a comma) until it reaches some synonym of "from", and then interpret everything after that as a store name. All I want to keep is the list of items and the storename.

What I wind up getting is as follows:

    buy chocolate milk, eggs and bread at store ->
        [[['chocolate', 'milk']]]
    get eggs at @Vons ->
        [[['eggs']]]
    need headache medicine and cough medicine from downtown drug store ->
        [[['headache', 'medicine', 'and', 'cough', 'medicine']],
         ['downtown', 'drug', 'store']]
    buy nails, claw hammer, and studs at @homedepot ->
        [[['nails']]]
    buy ragjoint at car parts store ->
        [[['ragjoint']], ['car', 'parts', 'store']]
    buy laptop at electronics store ->
        [[['laptop']], ['electronics', 'store']]
    get candy @Sees ->
        [[['candy']], '@', 'Sees']
    get a clue ->
        [[['a', 'clue']]]

Is there a way to express it in the nice BNF form, or did I already jump off that reservation with the grouping and suppressing?
From: Paul M. <pt...@au...> - 2010-10-10 23:02:47
|
> I didn't complicate my minimal example with it, but I've got results
> names set for initials, medials, and vowels.

Since the Regex option seems to be a likely path, you can use named capture groups in your re, and the Regex class will convert these to results names:

    syllable = Group(Regex(r"(?P<init>[sSbB])(?P<meds>[mMpP]*)(?P<vow>[aeiou]?)"))

Iterate over your syllables, and you can access the fields as in:

    res = syllables.parseString(t)
    for syl in res.syllables:
        print syl.init, syl.meds, syl.vow
        # - or -
        print "init: %(init)s meds:%(meds)s vow:%(vow)s" % syl

> Thanks again for your help and insights, Paul -- once again, pyparsing
> shines in all of its glory :-)
>
> d :)

Thanks! As I said, this will be very interesting to see how it plays out. Pyparsing is already being used in zhpy to support Chinese-language Python development in Python 2.x versions that pre-date support for Unicode identifiers.

I'd also like to hear sometime just how you got involved in this application in the first place (perhaps you've already captured this in a blog post - just send me the post).

Best regards,
-- Paul
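[Note: the named-group part of this suggestion can be tried with the standard library alone; the same pattern source that pyparsing's Regex class turns into results names is available from re as a plain dict:]

```python
import re

# The syllable pattern from Paul's reply, with named capture groups.
syllable = re.compile(r"(?P<init>[sSbB])(?P<meds>[mMpP]*)(?P<vow>[aeiou]?)")

m = syllable.match("SMpo")
print(m.groupdict())  # {'init': 'S', 'meds': 'Mp', 'vow': 'o'}
```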
From: Duncan M. <dun...@gm...> - 2010-10-10 21:50:06
|
On Sun, Oct 10, 2010 at 10:56 AM, Paul McGuire <pt...@au...> wrote:
> Duncan, my friend, so good to hear from you again! I'm glad pyparsing
> continues to be of some use to you. I must admit, you are the first I have
> heard of to be parsing Tibetan with pyparsing. I think I can propose a few
> alternative solutions for you.
>
> First of all, your immediate problem has to do with your use of 'max'.
> 'max = 1' means just that, 1 AND NO MORE!

Ah, I see. I had incorrectly interpreted that as "match only one initial, and if another initial is found, start parsing that as a new syllable."

> In your failing case, "sSmi", the leading 's' is followed by another 'S',
> which by definition of your init word is not allowed; you exceeded the
> maximum -> parser fail! Fortunately, the simplest remedy is to use the
> 'exact' argument instead of 'max':
>
>     init = Word('sSbB', exact=1).setName("initial")
>     med = Word('mMpP').setName("medial")
>     vow = Word('aeiou', exact=1).setName("vowel")
>
> 'exact' does not impose the same lookahead restriction that 'max' does.
>
> If your test case is close enough to your Tibetan application, you might
> try one of these other options. You can merge your initial and medial
> expressions into a single Word, since what you describe is exactly the
> same as the 2-argument constructor for Word. Breaking out the definition
> of syllable as:
>
>     syllable = Combine(
>         init + ZeroOrMore(med) + Optional(vow)
>     )
>     syllables = Group(OneOrMore(syllable)).setResultsName("syllables")
>
> the first two bits of your syllable can be merged into a single Word
> expression:
>
>     syllable = Combine(
>         Word('sSbB', 'mMpP') + Optional(vow)
>     )
>     syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

Hrm, I tried that, but wasn't able to figure out how to get at the parsed data for the medials. I need to be able to introspect the parsed data in order to perform various conversion operations (at a later time). I didn't complicate my minimal example with it, but I've got results names set for initials, medials, and vowels.

> Or if you can tolerate an even more liberal expression (which would match
> if vowels were mixed in with medials, and not just added to the end):
>
>     syllable = Word('sSbB', 'mMpPaeiouAEIOU')
>
> This will parse fairly quickly as well, since it is able to internally
> convert this entire thing into the single regex "[sSbB][mMpPaeiouAEIOU]*".

Ah, this is a great example -- thanks! Sadly, I can't use it, since the rules for vowels in Tibetan Unicode are strict about being at the end.

> If you still need the rigor of your original case (only a single potential
> vowel at the end of the syllable, not mixed in with medials), you might
> still try rolling your own Regex:
>
>     syllable = Regex(r"[sSbB][mMpP]*[aeiou]?")

Oh, this is very nice. I'm going to play with this some more. Thanks!

> I've found that for low-level tokens like words and numbers, using a Regex
> really outperforms "Combine(startWithThis + (somethingElse|anotherThing) +
> Optional(stillAnotherThing))"; while keeping the re's localized to just a
> simple building block pretty much keeps them from getting too out-of-hand.
> For instance, I've modified the fourFn.py example that ships with
> pyparsing to show the old style commented out, and a
> still-fairly-easy-to-follow regex replacement:
>
>     #~ fnumber = Combine( Word( "+-"+nums, nums ) +
>     #~                    Optional( point + Optional( Word( nums ) ) ) +
>     #~                    Optional( e + Word( "+-"+nums, nums ) ) )
>     fnumber = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
>     ident = Word(alphas, alphas+nums+"_$")
>
> If these syllabic constructs in Tibetan can be built up from single
> Unicode characters, then I think all of these suggestions are still valid,
> even down to the Regex idea.
>
> I'd be very interested to see more of your Tibetan parser, as things
> progress - good luck!

Once I get it hammered out, I'll reply with a single-file example :-) It's part of a library I'm creating to support advanced features in Tibetan software, but the grammar itself should lend itself nicely to an example.

Thanks again for your help and insights, Paul -- once again, pyparsing shines in all of its glory :-)

d
From: Paul M. <pt...@au...> - 2010-10-10 16:57:00
|
Duncan, my friend, so good to hear from you again! I'm glad pyparsing continues to be of some use to you. I must admit, you are the first I have heard of to be parsing Tibetan with pyparsing. I think I can propose a few alternative solutions for you.

First of all, your immediate problem has to do with your use of 'max'. 'max = 1' means just that, 1 AND NO MORE! In your failing case, "sSmi", the leading 's' is followed by another 'S', which by definition of your init word is not allowed; you exceeded the maximum -> parser fail! Fortunately, the simplest remedy is to use the 'exact' argument instead of 'max':

    init = Word('sSbB', exact=1).setName("initial")
    med = Word('mMpP').setName("medial")
    vow = Word('aeiou', exact=1).setName("vowel")

'exact' does not impose the same lookahead restriction that 'max' does.

If your test case is close enough to your Tibetan application, you might try one of these other options. You can merge your initial and medial expressions into a single Word, since what you describe is exactly the same as the 2-argument constructor for Word. Breaking out the definition of syllable as:

    syllable = Combine(
        init + ZeroOrMore(med) + Optional(vow)
    )
    syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

the first two bits of your syllable can be merged into a single Word expression:

    syllable = Combine(
        Word('sSbB', 'mMpP') + Optional(vow)
    )
    syllables = Group(OneOrMore(syllable)).setResultsName("syllables")

Or if you can tolerate an even more liberal expression (which would match if vowels were mixed in with medials, and not just added to the end):

    syllable = Word('sSbB', 'mMpPaeiouAEIOU')

This will parse fairly quickly as well, since it is able to internally convert this entire thing into the single regex "[sSbB][mMpPaeiouAEIOU]*".

If you still need the rigor of your original case (only a single potential vowel at the end of the syllable, not mixed in with medials), you might still try rolling your own Regex:

    syllable = Regex(r"[sSbB][mMpP]*[aeiou]?")

I've found that for low-level tokens like words and numbers, using a Regex really outperforms "Combine(startWithThis + (somethingElse|anotherThing) + Optional(stillAnotherThing))", while keeping the re's localized to just a simple building block pretty much keeps them from getting too out-of-hand. For instance, I've modified the fourFn.py example that ships with pyparsing to show the old style commented out, and a still-fairly-easy-to-follow regex replacement:

    #~ fnumber = Combine( Word( "+-"+nums, nums ) +
    #~                    Optional( point + Optional( Word( nums ) ) ) +
    #~                    Optional( e + Word( "+-"+nums, nums ) ) )
    fnumber = Regex(r"[+-]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
    ident = Word(alphas, alphas+nums+"_$")

If these syllabic constructs in Tibetan can be built up from single Unicode characters, then I think all of these suggestions are still valid, even down to the Regex idea.

I'd be very interested to see more of your Tibetan parser as things progress - good luck!

-- Paul
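[Note: the Regex suggestion above can be exercised directly with the stdlib re module; findall with the syllable pattern splits the test words from the original post exactly as desired, including the two edge cases that broke the max=1 grammar:]

```python
import re

syllable = re.compile(r"[sSbB][mMpP]*[aeiou]?")

print(syllable.findall("sabmaSMpo"))  # ['sa', 'bma', 'SMpo']
print(syllable.findall("sSma"))       # ['s', 'Sma']
print(syllable.findall("sbisi"))      # ['s', 'bi', 'si']
```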
From: Duncan M. <dun...@gm...> - 2010-10-10 15:51:15
|
Hey all! Paul, I think it's been since PyCon 2006 since we chatted last! I've been using pyparsing on and off since then for various projects. I've even used it at work for conceptual modeling (we're working on a gesture language for multi-touch in Ubuntu). However, I'm emailing for help about parsing syllabic rules in a natural language... Tibetan. I won't bore you with linguistic details, though: I've created a minimal example with a fake language below. Here are the rules:

1. All syllables must start with s, S, b, or B.
2. Syllables can be as short as one initial letter.
3. If there are additional consonants in the syllable, they must be one of m, M, p, or P.
4. Medial consonants may repeat multiple times.
5. Vowels are optional.

Here was my first try at a grammar for these rules:

    init = Word('sSbB', max=1).setName("initial")
    med = Word('mMpP').setName("medial")
    vow = Word('aeiou', max=1).setName("vowel")
    syllables = Group(OneOrMore(Combine(
        init + ZeroOrMore(med) + Optional(vow)
    ))).setResultsName("syllables")

For most cases, this resulted in the desired parsing:

    syllables.parseString("sabmaSMpo").asList()
    [['sa', 'bma', 'SMpo']]

However, I discovered an edge case that wasn't covered. The following examples result in exceptions:

    syllables.parseString("sSma").asList()
    syllables.parseString("sbisi").asList()

Now, if I change the init definition to the following:

    init = oneOf('S s b B').setName("initial")

I get the desired results for everything. The two problem cases result in this:

    syllables.parseString("sSma").asList()
    [['s', 'Sma']]
    syllables.parseString("sbisi").asList()
    [['s', 'bi', 'si']]

So it seems to me that Word should *somehow* be able to do this, though obviously my use of max=1 and the hope that this would do it is naive ;-) For the sake of consistency, I'd rather not have to join the list of initial characters with a space. Is there a way of accomplishing my goal with Word instead of oneOf?

Thanks!

d
From: Peter J. <ta...@hi...> - 2010-08-31 17:00:52
|
Hi -- I'd like to use the deltaTime.py script from the examples directory of the 1.5.5 release in my own project, but I want to make sure of the license before I do so. It doesn't explicitly state a license within, only that it is copyright Paul McGuire. Is it also covered under the MIT license? Thanks. pete |
From: Diez B. R. <de...@we...> - 2010-06-10 12:57:00
|
On Thursday, June 10, 2010 13:27:11 Thomas Jensen wrote:
> Dear PyParser Experts
>
> I am trying to scrape a lot of data from the European Parliament
> website for a research project. The first step is to create a list of
> all parliamentarians, however due to the many Eastern European names
> and the accents they use I get a lot of missing entries. Here is an
> example of what is giving me troubles (notice the accents at the end
> of the family name):

I would suggest you use BeautifulSoup for this instead of pyparsing. Pyparsing is great, but parsing HTML is a solved problem, and making your own HTML parsing robust actually requires a *lot* of effort.

Diez
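[Note: to illustrate this suggestion, the names can be pulled out of the title attributes with a real HTML parser instead of character-class matching; a minimal sketch using only the standard library's html.parser (BeautifulSoup would be even shorter), fed with the snippet from Thomas's post:]

```python
from html.parser import HTMLParser

class NameExtractor(HTMLParser):
    """Collect the title attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "title" in attrs:
                self.names.append(attrs["title"])

snippet = ('<td class="listcontentlight_left">'
           '<a href="/members/expert/alphaOrder/view.do?language=EN&id=28276"'
           ' title="ANDRIKIEN\u0116, Laima Liucija">'
           'ANDRIKIEN\u0116, Laima Liucija</a></td>')

p = NameExtractor()
p.feed(snippet)
print(p.names)  # ['ANDRIKIENĖ, Laima Liucija']
```

The accented 'Ė' comes through untouched because the HTML parser never cares what characters make up the attribute value.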
From: Thomas J. <tho...@eu...> - 2010-06-10 11:40:52
|
Dear PyParser Experts,

I am trying to scrape a lot of data from the European Parliament website for a research project. The first step is to create a list of all parliamentarians; however, due to the many Eastern European names and the accents they use, I get a lot of missing entries. Here is an example of what is giving me trouble (notice the accents at the end of the family name):

    <td class="listcontentlight_left">
    <a href="/members/expert/alphaOrder/view.do?language=EN&id=28276"
       title="ANDRIKIENĖ, Laima Liucija">ANDRIKIENĖ, Laima Liucija</a>
    <br/>
    Group of the European People's Party (Christian Democrats)
    <br/>
    </td>

Here is the url from which the html example is taken:

    http://www.europarl.europa.eu/members/expert/alphaOrder.do?letter=B&language=EN

So far I have been using pyparsing and the following code (I know about hyphens and so forth; this is just a test to see if I can get the name listed above):

    #parser_names
    name = Word(alphanums + alphas8bit)
    begin, end = map(Suppress, "><")
    names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end
    for name in names.searchString(page):
        print(name)

However this does not catch the name from the html above. Any advice on how to proceed?

Best,
Thomas

P.S.: Here is all the code I have so far:

    # -*- coding: utf-8 -*-
    import urllib.request
    from pyparsing_py3 import *

    page = urllib.request.urlopen("http://www.europarl.europa.eu/members/expert/alphaOrder.do?letter=B&language=EN")
    page = page.read().decode("utf8")

    #parser_names
    name = Word(alphanums + alphas8bit)
    begin, end = map(Suppress, "><")
    names = begin + ZeroOrMore(name) + "," + ZeroOrMore(name) + end
    for name in names.searchString(page):
        print(name)
From: Kevin <kcc...@gl...> - 2010-04-19 18:06:25
|
Hi all:

Been working with pyparsing for a few days now and need some help with the following. I have a file which is read on startup to either create, or create and populate, classes. The following would be a valid input file:

    Class1::Instance1
    Class1::Instance2
    Class1::Instance3
    {
        variable1 = TRUE
        variable2 = FALSE
        variable3 = { {"valuename1", 10}, {"valuename2", 20} }
        variable4 = { {10, "valuename3"} }
        variable5 = "variable5value"
    }
    Class1::Instance4
    Class2::Instance1

etc, etc. My parser looks as follows:

    def importFile_BNF():
        global importBNF
        # if importBNF:  # We've been here before. Don't do it again
        #     return importBNF

        # Literals not to be stored
        LCB, RCB, COMMA, DQT = map(Suppress, '{},"')
        DCLN = Suppress(Literal('::'))
        comment = Suppress(Literal('#') + Optional(restOfLine))
        assign = Suppress(Literal('='))
        insert = Suppress(Literal('+='))
        remove = Suppress(Literal('-='))
        operation = assign ^ insert ^ remove
        classIdentifier = Word(alphas, alphanums + '_')
        instanceIdentifier = Word(alphas, alphanums + '_-')
        name = DQT + instanceIdentifier + DQT
        order = Word(nums)
        driverEntry = LCB + name + order + RCB
        valueList = LCB + OneOrMore(driverEntry) + Optional(COMMA) + RCB
        statement = instanceIdentifier + assign + valueList
        identifier = Group(classIdentifier + DCLN + instanceIdentifier)
        stanzaBody = LCB + OneOrMore(statement) + RCB
        stanza = identifier + Optional(stanzaBody)
        stanza.setDebug()
        importBNF = ZeroOrMore(stanza)
        importBNF.setDefaultWhitespaceChars(' \t\r')
        importBNF.ignore(comment)
        importBNF.ignore(blankline)
        return importBNF

What happens when I run this is that when I reach the LCB on Class1::Instance3, I get an exception that says:

    Expected W:(abcd.....) (at char XXX), (line:YY, col ZZ)

Having the stanza body marked optional should allow the opening LCB to be parsed, but it appears that the parser is insisting that I only do 'definitions' and will not allow me to populate the class. Any help is greatly appreciated.

Kevin
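[Note: as an aside, the {"name", number} driver-entry shape in Kevin's input can be sanity-checked with a quick regex before debugging the full pyparsing grammar; a hypothetical helper, not part of Kevin's code:]

```python
import re

# Matches one {"name", number} driver entry, ignoring surrounding whitespace.
driver_entry = re.compile(r'\{\s*"([^"]+)"\s*,\s*(\d+)\s*\}')

line = 'variable3 = { {"valuename1", 10}, {"valuename2", 20} }'
print(driver_entry.findall(line))
# [('valuename1', '10'), ('valuename2', '20')]
```

The outer braces are skipped automatically, since the pattern requires a quote right after the opening brace.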
From: Diez B. R. <de...@we...> - 2010-04-07 20:51:57
|
Hi,

OK, I don't know why I didn't think of this in the first place - maybe some weird "you are using pyparsing, no need to bother with nitty-gritty regexes" feeling - but that's what helped, and it should have been obvious to me :)

    escapes = r"\\\\|\\\."
    IDENT = Regex(r"([a-zA-Z_-]|(%(escapes)s))([a-zA-Z0-9_-]|(%(escapes)s))*"
                  % dict(escapes=escapes))

I post this just for the record.

Diez

On 07.04.2010 at 15:57, Diez B. Roggisch wrote:
> Hi,
>
> I somehow lost the mail by Denis, so I quote it by hand here, hope
> that works:
> [...]
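[Note: Diez's pattern can be checked with the stdlib re module directly, since the same regex source works both in pyparsing's Regex class and in re:]

```python
import re

escapes = r"\\\\|\\\."   # an escaped backslash, or an escaped dot
IDENT = re.compile(r"([a-zA-Z_-]|(%(escapes)s))"
                   r"([a-zA-Z0-9_-]|(%(escapes)s))*" % dict(escapes=escapes))

# The escaped dot is now part of the identifier...
print(IDENT.match(r"bar\.baz").group())  # bar\.baz

# ...while an unescaped dot still terminates it.
print(IDENT.match("bar.baz").group())    # bar
```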
From: Diez B. R. <de...@we...> - 2010-04-07 13:02:47
Hi,

I somehow lost the mail by Denis, so I quote it by hand here, hope that works:

> (Not really sure about your intent.)

My intent is to simply parse a string like this:

div . class\.name

as

tag[div], class[class.name]

instead of

tag[div], class[class], class[name]

For this to happen, I need to special-case escape-codes beginning with \ so that they are *not* treated as an identifier followed by a dot, but instead always group the two characters "\." together.

> You seem to be using pyparsing features rather strangely.
> The 'Word' pattern type allows defining distinct patterns for start and (optional) following characters. Both are character _classes_. You could use it like:
>
> nameStartChar = ...
> nameFollowingChar = ...
> name = Word(nameStartChar,nameFollowingChar)
>
> If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.

I'm not sure what you mean by this, nor whether it helps me. I'll try to come up with a more concise example, here it is:

from pyparsing import *

nmstart = Word(srange(r"[\\_a-zA-Z]"))  # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[\\A-Z_a-z0-9]")))  # TODO: nonascii & escape

ident = nmstart + ZeroOrMore(name)

#ident = Word(srange(r"[_a-zA-Z]"), srange(r"[A-Z_a-z0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + ident, adjacent=True)  # TODO

DOT = Literal(".")
ASTERISK = Literal("*")

class_ = Combine(DOT + IDENT)
element_name = IDENT | ASTERISK

selector = (element_name + ZeroOrMore( class_ ) | OneOrMore( class_ ))

print selector.parseString(r"foo.bar")
print selector.parseString(r"foo.bar\baz")
print selector.parseString(r"foo.bar\.baz")

The result is

['foo', '.bar']
['foo', '.bar\\baz']
['foo', '.bar\\', '.baz']

So clearly the escape isn't making the second dot count as part of IDENT instead of a DOT. And for this to happen, I need a specific lexer rule like quotedString - I guess.

Diez
From: spir ☣ <den...@gm...> - 2010-04-06 10:47:34
On Fri, 2 Apr 2010 15:23:27 +0200
"Diez B. Roggisch" <de...@we...> wrote:

> Hi,
>
> it seems as if the ML strips attachments, so here comes the aforementioned example code inline:
>
> from pyparsing import *
>
> nmstart = Word(srange(r"[_a-zA-Z\\]"))  # |{nonascii}|{escape}
> name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]")))  # TODO: nonascii & escape
> #numlit = Word(srange("[0-9]"))
>
> MINUS = Literal("-")
> IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True)  # TODO

(Not really sure about your intent.)

You seem to be using pyparsing features rather strangely. The 'Word' pattern type allows defining distinct patterns for start and (optional) following characters. Both are character _classes_. You could use it like:

nameStartChar = ...
nameFollowingChar = ...
name = Word(nameStartChar,nameFollowingChar)

If you want to generalize name to include a dotted format, then rename the above to namePart and write a pattern including dots.

Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
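Denis's suggestion, written out as a runnable sketch. The concrete character sets are our own illustrative choice, not from the thread:

```python
from pyparsing import Word, alphas, alphanums

# Word(initChars, bodyChars): the first character must come from the
# first set, every following character from the second set.
name = Word(alphas + "_", alphanums + "_-")

print(name.parseString("foo-bar_1").asList())  # ['foo-bar_1']
```

Note that "-" appears only in the body set, so a name can contain a hyphen but cannot start with one.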
From: Diez B. R. <de...@we...> - 2010-04-02 13:23:35
Hi,

it seems as if the ML strips attachments, so here comes the aforementioned example code inline:

from pyparsing import *

nmstart = Word(srange(r"[_a-zA-Z\\]"))  # |{nonascii}|{escape}
name = OneOrMore(Word(srange(r"[A-Z_a-z0-9-\\]")))  # TODO: nonascii & escape
#numlit = Word(srange("[0-9]"))

MINUS = Literal("-")
IDENT = Combine(Optional(MINUS) + nmstart + ZeroOrMore(name), adjacent=True)  # TODO

print IDENT.parseString(r"foo\bar")
print IDENT.parseString(r"foo\.bar")

The output is

(cssprocessor)mac-dir:ablcssprocessor deets$ python /tmp/test.py
['foo\\bar']
['foo\\']

So you can see there is the whole "\.bar"-stuff missing.

Diez
From: Diez B. R. <de...@we...> - 2010-04-02 13:12:38
Hi,

I'm using pyparsing to parse CSS. Now I've encountered css-classes with dots in them for the first time - so I need to extend my identifier definition to encompass backslash escapes. This test-program illustrates my problem. Or at least some problem; for the same testcase, my real code throws an exception:

======================================================================
ERROR: Tests that various parts of the grammer parse
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/deets/projects/privat/TurboGears/ablcssprocessor/tests/test_parser.py", line 42, in test_subexpressions
    (r'foo\.bar', [r"foo\.bar"]),
  File "/Users/deets/projects/privat/TurboGears/ablcssprocessor/tests/test_parser.py", line 27, in parse
    result = g.parseString(test, True).asList()
  File "/Users/deets/.virtualenvs2.5/cssprocessor/lib/python2.5/site-packages/pyparsing-1.5.2-py2.5.egg/pyparsing.py", line 1076, in parseString
    raise exc
ParseException: Expected end of text (at char 4), (line:1, col:5)
----------------------------------------------------------------------

Any suggestions how to deal with this?

Thanks, Diez
From: spir ☣ <den...@gm...> - 2010-03-31 09:37:03
On Tue, 30 Mar 2010 22:43:11 +0200
Eike Welk <eik...@gm...> wrote:

> Hello Denis!
>
> On Tuesday March 30 2010 21:31:58 spir ☣ wrote:
> > Hello,
> >
> > This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
> > simpleName + ZeroOrMore(extension)
> > but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
> > i.e. in the case of "a.b.c":
> > ((a).b).c
>
> It is impossible to do what you want in an elegant way, you must reformat the parse result.

Right, I knew it in fact ;-) but was expecting kind of a miracle!

> I had a similar question some time ago, this is the simplified summary of Paul's answer.
>
> For code that gives you the right operator precedence look at Pyparsing's operatorPrecedence(...) or at the calculator example. It also gives you a parse result that can be relatively easily reformatted.
>
> By the way, if you really implement a programming language with Pyparsing use operatorPrecedence(...), it can parse nearly all of Python's expressions including function calls. [...]

I don't, in fact (I may even switch to Lua for several reasons). And there is no operator precedence in the language I'm parsing.

For possible interest: as of now, operators map to class methods, meaning e.g. "+" does not map to self.add(other) but to Number.add(numbers), and this scheme may (as of now, it does) mirror in a prefix notation syntax:
+(n1 n2 n3 ...)
instead of
n1 + n2 + n3 ...
Thus there is no operator precedence at all. (Anyway, I wanted no arbitrary precedence between operators of the same "kind" (arithmetics, logical...), so I would have imposed parens if I had not used prefix notation.) I have always found operators mapping to instance methods simply wrong (less wrong for unary operators, but still). So, this scheme brings me both advantages at once.

> If you are interested I can talk you through my still very big parser implementation, which can be found here (look at line 979):
> http://tinyurl.com/yc3gyqc
>
> Original URL, but Kmail messes this URL up:
> http://bazaar.launchpad.net/~vcs-imports/freeode/trunk/annotate/head:/freeode_py/freeode/simlparser.py

I have had a look, thank you. But your app is really big ;-) I'm exploring it superficially; it looks interesting and the implementation is really clean :-)

* custom nodes *

Just a side note you may find interesting, since you yield nodes of custom type, to represent chunks of source code, via parse actions: for my present app, I introduced a modification in my parsing library that allows specifying a class, instead of a func, as match (parse) action. In this case, instead of instantiating a standard node (parse result) from the match result, and then applying an action on it --that will possibly yield a custom node, as you do--, the matching method directly calls the class (with the same args as if it were a standard node), and returns it normally -- so that it will be inserted in the tree like any other node. Hem, seems I'm not clear, so the sketch is:

class KindOfPattern(Pattern):
    ...
    def _match(source):
        # (the source holds its current pos, after matching it holds the range)
        result = <try and match, else MatchFailure exception>
        # result is either a matched source snippet
        # or a sequence of child nodes
        action = self.action
        if isinstance(action, type):  # custom case
            node = action(self, result, source)
            # needed transformation, if any, should be done in __init__
        else:  # standard case
            node = Node(self, result, source)
            # node applies possible action itself, if not None
        return node

[A disadvantage is that I cannot use anymore the builtin match actions that reformat result nodes according to common needs (drop, extract sub result, join, flatten leaves, debug output...). So, I'll have to reintroduce the possibility to specify several actions (which I removed before, because one action can simply call others: "def action2(node): action1(node); ...").]

Every custom node is regarded (from a higher-level node's point of view) as a single / simple node, meaning it becomes a leaf whatever it represents. But it may be useful that custom nodes _act_ like composite ones when they conceptually are, by implementing indexing/iteration, treeview, whatever... Don't know if Paul likes this option and would like to introduce it in pyparsing.

An intermediate solution is to modify the lib itself so as to have only custom nodes ;-) I did this first, by making Node a subtype of my top Data type (so that the door is open to homoiconicity). But this is not very practical, since usually lower-level matches are well dealt with as standard nodes, before they become input data for higher-level ones.

Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
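As it happens, pyparsing already supports something close to this idea: a parse action can be any callable, including a class, in which case the class is instantiated from the matched tokens and the instance becomes the node in the parse result. A minimal sketch (the `Attribute` class and the tiny grammar are our own illustration, not code from the thread):

```python
from pyparsing import Word, alphas, delimitedList

class Attribute:
    """Custom node type built directly by the parser."""
    def __init__(self, tokens):
        self.parts = list(tokens)
    def __repr__(self):
        return "Attribute(%s)" % ".".join(self.parts)

name = Word(alphas + "_")
dotted = delimitedList(name, delim=".")
dotted.setParseAction(Attribute)  # a class used as the parse action

node = dotted.parseString("a.b.c")[0]
print(repr(node))  # Attribute(a.b.c)
```

From a higher-level expression's point of view, `node` is a single opaque token, much like the "custom node becomes a leaf" behavior described above.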
From: Eike W. <eik...@gm...> - 2010-03-30 20:43:20
Hello Denis!

On Tuesday March 30 2010 21:31:58 spir ☣ wrote:
> Hello,
>
> This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
> simpleName + ZeroOrMore(extension)
> but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
> i.e. in the case of "a.b.c":
> ((a).b).c

It is impossible to do what you want in an elegant way, you must reformat the parse result. I had a similar question some time ago; this is the simplified summary of Paul's answer.

For code that gives you the right operator precedence look at Pyparsing's operatorPrecedence(...) or at the calculator example. It also gives you a parse result that can be relatively easily reformatted.

By the way, if you really implement a programming language with Pyparsing, use operatorPrecedence(...): it can parse nearly all of Python's expressions including function calls. The only exception are the intertwined unary minus and exponentiation operators. You'll need hand crafted code for them. operatorPrecedence(...) has significantly reduced the size of my parser (I have 12 levels of precedence).

If you are interested I can talk you through my still very big parser implementation, which can be found here (look at line 979):
http://tinyurl.com/yc3gyqc

Original URL, but Kmail messes this URL up:
http://bazaar.launchpad.net/~vcs-imports/freeode/trunk/annotate/head:/freeode_py/freeode/simlparser.py

Eike.
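A minimal sketch of what operatorPrecedence(...) provides. In current pyparsing releases the same helper is named infixNotation; the tiny two-level grammar below is our own example:

```python
from pyparsing import Word, alphas, nums, infixNotation, opAssoc

operand = Word(alphas) | Word(nums)

# Levels are listed from highest to lowest precedence,
# so '*' binds tighter than '+'.
expr = infixNotation(operand, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.LEFT),
])

print(expr.parseString("a + b * 2").asList())
# [['a', '+', ['b', '*', '2']]]
```

The nested grouping in the result is exactly the structure one would otherwise have to rebuild by hand, which is why it reformats so easily.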
From: spir ☣ <den...@gm...> - 2010-03-30 19:32:10
Hello,

This time, that's me having a left-recursion issue. I'm trying to parse var names that can possibly refer to attributes, like "a.b.c". I can parse it as
simpleName + ZeroOrMore(extension)
but then I need to reformat the result recursively, to get the real semantics of: getattr(getattr(container, name), name)
i.e. in the case of "a.b.c":
((a).b).c

In PEG, I cannot find a way to avoid left-recursion:

simpleName : [a-zA-Z_] [a-zA-Z_0-9]*
attribute  : name '.' simpleName
name       : attribute / simpleName

and still get an expression of the recursive pattern directly. The issue is indeed that (unlike the way I wrote it above) the expression of an attribute is not nicely wrapped in delimiters.

[I really look for that because in my case any name ends up mapping to getattr(world, name), i.e. it refers to an attribute of <world>, the equivalent in the language I'm parsing of py globals().]

Is there any workaround?

Thanks for reading,
Denis
________________________________
vit esse estrany ☣
spir.wikidot.com
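The usual workaround is exactly the simpleName + ZeroOrMore(...) form already mentioned, plus a parse action that folds the flat token list to the left. A pyparsing sketch (the names and the tuple-based nesting are our own illustration of the ((a).b).c shape):

```python
from pyparsing import Word, alphas, alphanums, ZeroOrMore, Suppress

simpleName = Word(alphas + "_", alphanums + "_")
name = simpleName + ZeroOrMore(Suppress(".") + simpleName)

def fold_left(tokens):
    # 'a.b.c' -> (('a', 'b'), 'c'), mirroring getattr(getattr(a, 'b'), 'c')
    node = tokens[0]
    for part in tokens[1:]:
        node = (node, part)
    return [node]

name.setParseAction(fold_left)

print(name.parseString("a.b.c")[0])  # (('a', 'b'), 'c')
```

The grammar itself stays right-iterative (no left recursion); only the post-processing restores the left-associative semantics.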
From: Paul M. <pt...@au...> - 2010-03-28 06:41:56
I can see that you are just getting started with pyparsing.

First of all, nums is not intended to be used as a parsing expression the way you have used it. nums is a string defined in pyparsing to make it easy to create Words made up of digits and other characters.

Pyparsing allows you to mix strings and expressions so that you can easily build up parsers using the '+' and '|' operators. For instance, to match a Python comment that starts with a '#' character and goes to the end of the line, you can use:

comment = "#" + restOfLine

The '+' operator in pyparsing will automatically promote strings to a pyparsing Literal, so this is synonymous with:

comment = Literal("#") + restOfLine

but I think that the first version is a little easier to read.

You are using the string 'nums' that is defined by pyparsing, but I don't think the results are as you intended. Your statement:

decimal_digits << (nums | (decimal_digits + nums))

is like saying:

decimal_digits << ("0123456789" | (decimal_digits + "0123456789"))

which will get expanded to:

decimal_digits << (Literal("0123456789") | (decimal_digits + Literal("0123456789")))

Now, while your input string "876875.878" *does* contain numeric digits, it does not contain the exact sequence "0123456789", so that Literal will never match. So then pyparsing proceeds to the second alternative, which is

(decimal_digits + Literal("0123456789"))

So pyparsing recursively tries to match decimal_digits, which takes us back to the original expression, which fails, and so we recurse again... and again and again until we hit the recursion limit.

Let's start by correcting your implementation of a repetition of digits, to match the leading part of your test string. You can't just match the string defined by nums, since that will only match the literal sequence "0123456789". You want to match *any one of* these characters, one at a time. Conveniently, pyparsing includes a helper method named oneOf that will take a list of strings, or a single string of space-separated strings.

I suspect you are trying to follow a BNF definition for a real number, something like:

decimal_digits ::= digit | decimal_digits digit

This is a recursive definition that will recursively match a sequence of digits. To directly translate this to pyparsing would be similar to your attempt, but we'll correct it to use oneOf:

decimal_digits = Forward()
digit = oneOf(list(nums))
decimal_digits << (digit + decimal_digits | digit)

This is a common experience when trying to take a classical BNF definition and convert it directly to pyparsing. Your definition uses recursion to successively match the leading digits of your input string. BNF does not have syntax for repetition, so to define something like a list of elements, one must use:

list_of_items ::= item list_of_items | item

In pyparsing, you can simply define

list_of_items = OneOrMore(item)

So we could replace our complicated decimal_digits definition above with just:

digit = oneOf(list(nums))
decimal_digits = OneOrMore(digit)

But even better would be to use pyparsing's Word class, for which nums was intended in the first place:

decimal_digits = Word(nums)

Word takes one or two strings to specify a sequence of characters. If just one string of possible characters is given, then Word will match as many characters in that set as possible, like decimal_digits. If two strings are given, the first will be used as the set of valid *initial* characters, and the second will be used as the set of valid *body* characters. So you could see something like:

uppers = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lowers = "abcdefghijklmnopqrstuvwxyz"
capitalized_word = Word(uppers, lowers)

To match your full string, you would need something like this:

real_number = Combine(Optional("-") + Word(nums) + "." + Word(nums))

Please read some of the pyparsing documentation, and check out the examples on the pyparsing wiki.

Welcome to pyparsing!
-- Paul

> -----Original Message-----
> From: elekis [mailto:el...@gm...]
> Sent: Saturday, March 27, 2010 12:30 PM
> To: pyp...@li...
> Subject: [Pyparsing] RuntimeError: maximum recursion depth exceeded
>
> hi,
>
> I have an error that I can't find the solution to.
>
> I have the following recursive rule:
>
> decimal_digits = Forward()
> decimal_digits << (nums | (decimal_digits + nums))
> p = decimal_digits.parseString("876875.878")
>
> but I get a RuntimeError: maximum recursion depth exceeded.
>
> dunno why. any idea?
>
> thanks
>
> a++
>
> --
> http://twoji.deviantart.com/
> http://www.flickr.com/elekis
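Paul's final real_number suggestion, as a runnable snippet (the test values are our own):

```python
from pyparsing import Combine, Optional, Word, nums

# Combine joins the matched pieces into a single token and requires
# them to be adjacent in the input (no intervening whitespace).
real_number = Combine(Optional("-") + Word(nums) + "." + Word(nums))

print(real_number.parseString("876875.878").asList())  # ['876875.878']
print(real_number.parseString("-3.14").asList())       # ['-3.14']
```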
From: elekis <el...@gm...> - 2010-03-27 17:30:36
hi,

I have an error that I can't find the solution to.

I have the following recursive rule:

decimal_digits = Forward()
decimal_digits << (nums | (decimal_digits + nums))
p = decimal_digits.parseString("876875.878")

but I get a RuntimeError: maximum recursion depth exceeded.

dunno why. any idea?

thanks

a++

--
http://twoji.deviantart.com/
http://www.flickr.com/elekis
From: ThanhVu (V. N. <ngu...@gm...> - 2010-03-25 22:07:46
thanks for the explanation -- I think your code works pretty well

VN -

2010/3/25 spir ☣ <den...@gm...>:
> On Wed, 24 Mar 2010 15:57:34 -0600
> "ThanhVu (Vu) Nguyen" <ngu...@gm...> wrote:
>
>> Hi, I tried to generate this simple recursive rule that involves both
>> Forward and operatorPrecedence() and get errors about maximum
>> recursion depth exceeded.
>>
>> Thanks,
>>
>> def getRule_test():
>>     #rule exp = name | num | name[exp] | exp + exp | exp * exp
>>     name = Word(alphas)
>>     num = Word(nums)
>>
>>     exp = Forward()
>>     idx = name + '[' + exp + ']'
>>
>>     arith = operatorPrecedence(
>>         exp, [('*',2,opAssoc.LEFT),
>>               ('+',2,opAssoc.RIGHT)],)
>>
>>     exp << (arith|idx|name|num)  # works ok if take out arith
>>     return exp
>>
>> VN -
>
> Yes, the recursive term "exp" in your format appears on the left side of arith, and thus finally on the left side of itself. This cannot be matched since it launches an infinite recursive loop of calls to exp.match().
>
> More generally, you cannot write a pattern such as:
> p1 : p1 whatever
> But it can always be reformulated into something like:
> p2 : whatever p2
>
> Here, you need to distinguish between the levels of a non-recursive operand (inside arith) and of a whole exp. Operator precedence already involves the inherent recursivity of operations. Exp only needs to be recursive because it appears inside idx. (I guess.)
>
> Something like, maybe (untested!):
>
> #rule exp = name | num | name[exp] | exp + exp | exp * exp
> name = Word(alphas)
> num = Word(nums)
>
> exp = Forward()
> idx = name + '[' + exp + ']'
>
> operand = (idx|name|num)
> arith = operatorPrecedence(
>     operand, [('*',2,opAssoc.LEFT),
>               ('+',2,opAssoc.RIGHT)],)
>
> exp << (arith|operand)
> return exp
>
> Denis
> ________________________________
> vit esse estrany ☣
> spir.wikidot.com
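Denis's restructured grammar (posted untested) does avoid the infinite recursion. Here it is as a runnable sketch, using infixNotation, the current name for operatorPrecedence:

```python
from pyparsing import Word, alphas, nums, Forward, infixNotation, opAssoc

name = Word(alphas)
num = Word(nums)

exp = Forward()
idx = name + "[" + exp + "]"

# The operand level is non-recursive on its left edge: idx starts
# with a name (a terminal), so there is no left recursion.
operand = idx | name | num
arith = infixNotation(operand, [
    ("*", 2, opAssoc.LEFT),
    ("+", 2, opAssoc.RIGHT),
])
exp <<= arith

print(exp.parseString("a[i + 1] * 2").asList())
```

The only recursion left runs through idx, and it is guarded by the literal "[", so the parser always consumes input before re-entering exp.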