pyparsing-users Mailing List for Python parsing module (Page 18)
Brought to you by: ptmcg
From: Eike W. <eik...@gm...> - 2008-09-06 20:11:53
The parser generated by the function "operatorPrecedence" cannot correctly parse the power operator in conjunction with the unary minus operator. The Python reference says: "The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right." [http://docs.python.org/ref/power.html]

I think this can't be done with operatorPrecedence. Is there a more elegant (shorter) way to get the correct syntax than what I did in this example program? http://pastebin.com/f7eb6be8a

Maybe a parser that gets this right could be added to operatorPrecedence. The syntax could look like this, for example:

    expression = operatorPrecedence(u_expr,
        [OpPower(power='**', sign=oneOf('+ -')),
         (oneOf('* /'), 2, opAssoc.LEFT),
         (oneOf('+ -'), 2, opAssoc.LEFT),
        ])

I think an example like the little program I've posted here should be included in the examples, so that newbies get mathematical expressions right. For me, mathematical expressions are by far the trickiest part of writing the parser for my toy language.

Kind regards,
Eike
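[Editor's note] The binding rule Eike quotes falls out naturally from a hand-written recursive-descent grammar in which the exponent of '**' is itself a unary expression (so a unary minus *inside* the exponent binds tighter, while one to the *left* binds looser). The following is an illustrative, self-contained sketch in plain Python — no pyparsing; the tokenizer and function names are my own, and this is not the solution adopted on the list:

```python
import re

def tokenize(s):
    # numbers, '**', and single-character +/- operators
    return re.findall(r"\d+|\*\*|[-+]", s)

def evaluate(s):
    """Evaluate using Python's rule: '**' binds tighter than a unary
    minus on its left, but looser than one on its right."""
    tokens = tokenize(s)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def u_expr():            # u_expr := '-' u_expr | power
        nonlocal pos
        if peek() == '-':
            pos += 1
            return -u_expr()
        return power()

    def power():             # power := atom ['**' u_expr]
        nonlocal pos
        base = atom()
        if peek() == '**':
            pos += 1
            # the exponent is a u_expr, so it may start with a unary '-'
            return base ** u_expr()
        return base

    def atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return int(tok)

    return u_expr()

print(evaluate("-2**2"))   # -4, i.e. -(2**2), matching Python
print(evaluate("2**-1"))   # 0.5
```

Because the exponent recurses into `u_expr`, right-associativity of `**` (e.g. `2**3**2 == 2**(3**2)`) also falls out for free.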
From: Chiraj <som...@gm...> - 2008-09-05 15:37:09
Hi,

I'm fairly new to pyparsing and the mailing list. I have browsed through the email archives, but I can't seem to find how to parse statements in any order.

    #### Start Code ####
    from pyparsing import *

    Ident = Word(alphas, alphanums+"_")

    type_name = Ident.setResultsName("name")
    type = Group(Suppress(Keyword("type", caseless=True)) +
                 type_name).setResultsName("type")
    types = Group(ZeroOrMore(type)).setResultsName("types")

    type2_name = Ident.setResultsName("name")
    type2 = Group(Suppress(Keyword("type2", caseless=True)) +
                  type_name).setResultsName("type2")
    types2 = Group(ZeroOrMore(type2)).setResultsName("types2")

    content = Each([types, types2])
    grammar = And([content]).setResultsName("content")

    parser = grammar.parseString( testString )
    parser.asXML()
    ####

It successfully parses the following input:

    type IDL
    type ODL
    type2 Ika
    type2 Nfkb

and produces:

    <content>
      <types>
        <type>
          <name>IDL</name>
        </type>
        <type>
          <name>ODL</name>
        </type>
      </types>
      <types2>
        <type2>
          <name>IDL</name>
        </type2>
        <type2>
          <name>Nfkb</name>
        </type2>
      </types2>
    </content>

But it does not correctly parse the next input, which is basically the types swapped over:

    type2 Ika
    type2 Nfkb
    type IDL
    type ODL

It produces the following without errors; it misses the second set of types:

    <content>
      <types>
      </types>
      <types2>
        <type2>
          <name>IDL</name>
        </type2>
        <type2>
          <name>Nfkb</name>
        </type2>
      </types2>
    </content>

Ideally I'd like the parser to pick up both sets of types and place them in their respective collections (namely types and types2), e.g. it should be able to process the second input, or the following, and produce the first output or similar:

    type2 Ika
    type IDL
    type2 Nfkb
    type ODL

Any help would be greatly appreciated.

Regards,
Chiraj
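[Editor's note] `Each([types, types2])` matches each sub-expression exactly once, and a `ZeroOrMore` can "succeed" by matching nothing at the start of the input, which is why the second block is silently skipped. One common approach is to accept the statements interleaved (e.g. `ZeroOrMore(type | type2)`) and bucket the results afterwards. As a language-neutral illustration of that bucketing idea, here is a minimal plain-Python sketch (the function name and dict layout are mine, not part of the original grammar):

```python
def collect_types(text):
    """Bucket 'type NAME' / 'type2 NAME' statements, regardless of the
    order in which they are interleaved in the input."""
    result = {"types": [], "types2": []}
    tokens = text.split()
    i = 0
    while i + 1 < len(tokens):
        keyword = tokens[i].lower()
        if keyword == "type":
            result["types"].append(tokens[i + 1])
        elif keyword == "type2":
            result["types2"].append(tokens[i + 1])
        i += 2
    return result

# The "swapped over" input buckets the same as the original ordering:
print(collect_types("type2 Ika type2 Nfkb type IDL type ODL"))
```

The key point is that membership in a bucket is decided per statement, not by the position of a whole block, so any interleaving works.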
From: Paul M. <pt...@au...> - 2008-09-04 22:12:36
> Ciao
> Andreas
> ps: I've just started using pyparsing, yesterday. And today, I have a running
> parser for a file containing geometrical data. That's nice. I like pyparsing.

Fantastico - that's great!

BTW, this concept of a Forward() expression that gets dynamically defined at parse time is *not* a typical beginning usage of pyparsing - many people never find the need for it at all - so no need to be too hard on yourself. :)

-- Paul
From: Andreas M. <and...@gm...> - 2008-09-04 21:25:11
Paul McGuire wrote:
> Pyparsing includes a built-in helper method, called countedArray.

Oh, there it is. Thanks a lot.

> Look at the pyparsing source code to see how countedArray uses a Forward
> expression that gets its content defined within a parse action attached
> to the leading integer.

Yes, that's what I was looking for. I have no idea why I missed forward declarations in the documentation. It's all there.

Ciao
Andreas

ps: I've just started using pyparsing, yesterday. And today, I have a running parser for a file containing geometrical data. That's nice. I like pyparsing.
From: Paul M. <pt...@au...> - 2008-09-04 01:26:25
Andreas -

Pyparsing includes a built-in helper method, called countedArray. Here is how you use it:

    from pyparsing import *

    counted_list_of_words = countedArray(Word(alphas))

    tests = """\
    1 foo
    2 foo baz
    3 foo bar baz""".splitlines()

    for t in tests:
        print counted_list_of_words.parseString(t)[0]

Prints:

    ['foo']
    ['foo', 'baz']
    ['foo', 'bar', 'baz']

countedArray(expr) matches Word(nums) + expr + expr + ... ("n" times, where n is the leading integer). Look at the pyparsing source code to see how countedArray uses a Forward expression that gets its content defined within a parse action attached to the leading integer.

-- Paul

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Andreas Matthias
Sent: Wednesday, September 03, 2008 4:16 PM
To: pyp...@li...
Subject: [Pyparsing] Reading n words

Hello,

I'm trying to parse a string like '3 foo bar baz', where the number in the string specifies the number of the following words. Is there a more elegant way to achieve this than with the following code?

Ciao
Andreas

    from pyparsing import *

    def foo(st):
        p1 = Word(nums)
        n = p1.parseString(st)[0]
        p2 = Literal(n) + Word(alphas)*int(n)
        print p2.parseString(st)

    foo('1 foo')
    foo('2 foo baz')
    foo('3 foo bar baz')

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge. Build the coolest Linux based applications with Moblin SDK & win great prizes. Grand prize is a trip for two to an Open Source event anywhere in the world. http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Pyparsing-users mailing list
Pyp...@li...
https://lists.sourceforge.net/lists/listinfo/pyparsing-users
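[Editor's note] The idea behind countedArray — read a count, then let that count drive how much of the following input is consumed — is easy to see without pyparsing. A minimal stand-alone sketch (the names are mine; this is not pyparsing's implementation, which builds a Forward whose contents are set from a parse action on the leading integer):

```python
def counted_array(tokens):
    """Consume a leading integer n, then exactly n following items.
    Returns (items, remaining_tokens)."""
    n = int(tokens[0])
    if len(tokens) < 1 + n:
        raise ValueError("expected %d items after the count" % n)
    return tokens[1:1 + n], tokens[1 + n:]

for line in ["1 foo", "2 foo baz", "3 foo bar baz"]:
    items, rest = counted_array(line.split())
    print(items)
```

Run on the three test lines above, this prints `['foo']`, `['foo', 'baz']`, and `['foo', 'bar', 'baz']`, matching the countedArray output.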
From: Andreas M. <and...@gm...> - 2008-09-03 21:24:55
Hello,

I'm trying to parse a string like '3 foo bar baz', where the number in the string specifies the number of the following words. Is there a more elegant way to achieve this than with the following code?

Ciao
Andreas

    from pyparsing import *

    def foo(st):
        p1 = Word(nums)
        n = p1.parseString(st)[0]
        p2 = Literal(n) + Word(alphas)*int(n)
        print p2.parseString(st)

    foo('1 foo')
    foo('2 foo baz')
    foo('3 foo bar baz')
From: Gre7g L. <haf...@ya...> - 2008-08-28 05:42:48
alphas indicates you are only looking for letters, so Word(alphas) will only match the G in "G09". Try alphanums instead.

Gre7g

----- Original Message ----
From: Norbert Klamann <Nor...@pr...>
To: pyp...@li...
Sent: Wednesday, August 27, 2008 7:17:12 AM
Subject: [Pyparsing] Stumbling at the first step

Hello all,

I have a presumably trivial problem which vexes me nonetheless. I try:

    from pyparsing import *
    Word( alphas ).parseString("G09 " )

and get

    (['G'], {})

instead of

    ['G09']

I really do not understand this, and I am sure that I am missing something obvious. The version of pyparsing is the newest Windows install.

Thanks for listening
Norbert
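[Editor's note] The same behaviour is easy to reproduce with plain regular expressions, which may make the distinction clearer. Roughly speaking, Word(alphas) corresponds to the character class [A-Za-z]+ and Word(alphanums) to [A-Za-z0-9]+ (a rough analogy for illustration, not pyparsing's exact definition):

```python
import re

word_alphas = re.compile(r"[A-Za-z]+")        # rough analogue of Word(alphas)
word_alphanums = re.compile(r"[A-Za-z0-9]+")  # rough analogue of Word(alphanums)

print(word_alphas.match("G09 ").group())      # letters only: stops at the digit -> 'G'
print(word_alphanums.match("G09 ").group())   # letters and digits -> 'G09'
```

In both libraries, a "word" match simply stops at the first character outside the allowed set; it does not fail, so you get a shorter token rather than an error.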
From: Norbert K. <Nor...@pr...> - 2008-08-28 05:29:57
Hello all,

I have a presumably trivial problem which vexes me nonetheless. I try:

    from pyparsing import *
    Word( alphas ).parseString("G09 " )

and get

    (['G'], {})

instead of

    ['G09']

I really do not understand this, and I am sure that I am missing something obvious. The version of pyparsing is the newest Windows install.

Thanks for listening
Norbert
From: Andrew W. <and...@gm...> - 2008-07-08 20:15:20
Paul McGuire wrote:
> As Gre7g L already posted, enabling packrat parsing can be a big boost
> (~1000X) to performance when using operatorPrecedence.
>
> I just uploaded an example of parsing RE's
> (http://pyparsing.wikispaces.com/space/showimage/invRegex.py) that I thought
> was already on the Pyparsing wiki, so I'm sorry for not posting this sooner.
> This particular example is an RE inverter, returning a generator of all
> strings that would match the given RE (note: does not allow arbitrary
> repetition operators such as '*' or '+', otherwise it would just blow up),
> but it also uses opPrec. Parsing "(foo(bar))" takes about 1/4 of a second -
> still a long time, but I hope not ridiculously so. Maybe this example might
> shed some light on some alternative approaches to this problem.

That is what I used as a base for my parser (it needed a lot of modification, since this matching language is more like a hybrid of globs and regexps). It was already on the wiki.

Using a newer version of Python fixes the problem. I had actually tried enabling packrat parsing before, and it didn't seem to do anything. There seems to be some kind of bug that causes packrat parsing to break on Python 2.3 (it actually takes longer if it is enabled on 2.3). 2.4 and later don't seem to be affected.
From: Paul M. <pt...@au...> - 2008-07-08 08:18:26
As Gre7g L already posted, enabling packrat parsing can be a big boost (~1000X) to performance when using operatorPrecedence.

I just uploaded an example of parsing RE's (http://pyparsing.wikispaces.com/space/showimage/invRegex.py) that I thought was already on the Pyparsing wiki, so I'm sorry for not posting this sooner. This particular example is an RE inverter, returning a generator of all strings that would match the given RE (note: it does not allow arbitrary repetition operators such as '*' or '+', otherwise it would just blow up), but it also uses opPrec. Parsing "(foo(bar))" takes about 1/4 of a second - still a long time, but I hope not ridiculously so. Maybe this example might shed some light on some alternative approaches to this problem.

-- Paul

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Andrew Warkentin
Sent: Friday, July 04, 2008 10:43 PM
To: pyp...@li...
Subject: [Pyparsing] parsing an operator-precedence-based grammar with reasonable performance?

I am trying to write a parser for a regexp-like matching language using pyparsing. It can be implemented in terms of operator precedence. I tried using the operatorPrecedence function, but the performance is unacceptable for expressions containing nested parentheses. Parsing time appears to increase exponentially with the number of nested parentheses, and the parsing time for each set of nested parentheses appears to increase exponentially with the number of operators in the grammar. Even parsing something as simple as "(foo(bar))" with a grammar of 4 binary and 2 unary operators takes ridiculously long (about 2 minutes). Is there any way to parse operator-precedence-based grammars using pyparsing that doesn't increase exponentially in run time with the number of nested parentheses?
From: Gre7g L. <haf...@ya...> - 2008-07-06 16:32:09
You'll need to enable Packrat. It makes a huge difference, especially with operatorPrecedence.

Gre7g

----- Original Message ----
From: Andrew Warkentin <and...@gm...>
To: pyp...@li...
Sent: Friday, July 4, 2008 9:43:10 PM
Subject: [Pyparsing] parsing an operator-precedence-based grammar with reasonable performance?

I am trying to write a parser for a regexp-like matching language using pyparsing. It can be implemented in terms of operator precedence. I tried using the operatorPrecedence function, but the performance is unacceptable for expressions containing nested parentheses. Parsing time appears to increase exponentially with the number of nested parentheses, and the parsing time for each set of nested parentheses appears to increase exponentially with the number of operators in the grammar. Even parsing something as simple as "(foo(bar))" with a grammar of 4 binary and 2 unary operators takes ridiculously long (about 2 minutes). Is there any way to parse operator-precedence-based grammars using pyparsing that doesn't increase exponentially in run time with the number of nested parentheses?
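[Editor's note] Why packrat helps: operatorPrecedence tries several alternatives at the same input position, re-parsing the same sub-expression over and over as it backtracks; packrat parsing caches results keyed by (expression, location), so each sub-expression is parsed at most once. The effect can be sketched without pyparsing by memoizing a tiny backtracking parser (the toy grammar and call counters here are mine, purely for illustration):

```python
import functools

def parse(src, memoized):
    """Recognize expr := term '+' expr | term '-' expr | term and
    term := '(' expr ')' | digit, counting how often term() runs."""
    calls = {"term": 0}

    def term(i):
        calls["term"] += 1
        if i < len(src) and src[i] == "(":
            j = expr(i + 1)
            if j is not None and j < len(src) and src[j] == ")":
                return j + 1
            return None
        if i < len(src) and src[i].isdigit():
            return i + 1
        return None

    if memoized:
        # packrat-style: cache the result for each input position
        term = functools.lru_cache(maxsize=None)(term)

    def expr(i):
        for op in ("+", "-"):            # alternatives share the 'term' prefix
            j = term(i)
            if j is not None and j < len(src) and src[j] == op:
                k = expr(j + 1)
                if k is not None:
                    return k
        return term(i)                   # plain term, parsed yet again

    ok = expr(0) == len(src)
    return ok, calls["term"]

deep = "(" * 8 + "1" + ")" * 8
ok1, slow = parse(deep, memoized=False)  # term() re-runs exponentially often
ok2, fast = parse(deep, memoized=True)   # each position is parsed once
print(slow, fast)
```

Each `expr` retries `term` at the same position for every alternative, and `term` recurses for every level of parentheses, so the unmemoized call count grows roughly as 3^depth; with the cache it is linear in the input length.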
From: Andrew W. <and...@gm...> - 2008-07-05 03:43:32
I am trying to write a parser for a regexp-like matching language using pyparsing. It can be implemented in terms of operator precedence. I tried using the operatorPrecedence function, but the performance is unacceptable for expressions containing nested parentheses. Parsing time appears to increase exponentially with the number of nested parentheses, and the parsing time for each set of nested parentheses appears to increase exponentially with the number of operators in the grammar. Even parsing something as simple as "(foo(bar))" with a grammar of 4 binary and 2 unary operators takes ridiculously long (about 2 minutes). Is there any way to parse operator-precedence-based grammars using pyparsing that doesn't increase exponentially in run time with the number of nested parentheses?
From: Stefaan H. <ste...@gm...> - 2008-07-03 15:08:05
Hello Paul, and thank you for the swift reply!

Because this problem was rather urgent, I had already reworked the grammar to avoid using keepOriginalText, and that solved the problem in my case (but I also had to modify the client program that uses the parser...).

Your alternative to keepOriginalText looks interesting, and I will keep it in mind to play with the next time I am trying to parse something using pyparsing (which is quite often these days ;) ).

Best regards, and thanks again,
Stefaan.

On Thu, Jul 3, 2008 at 2:58 PM, Paul McGuire <pt...@au...> wrote:
> Stefaan -
>
> I suspect the problem is with the inspect module usage; I don't know how to
> work around that per se.
>
> However, I did come up with something interesting. I defined an Empty()
> with a parse action that just returns the current parse location. I then
> bracketed your remove_lines expression with one of these at the beginning
> and at the end, with appropriate results names. Then instead of attaching
> keepOriginalText, I attached a simple lambda that returns the slice of the
> input string from the given begin and end values. See below:
>
>     # a dummy expression that just returns the current parse location
>     get_cur_locn = p.Empty().setParseAction(lambda s,l,t: l)
>
>     remove_line = p.lineStart + p.Literal("<").suppress() + \
>         p.restOfLine.setResultsName("LineContents") + p.lineEnd
>
>     # put get_cur_locn exprs at front and back of remove_lines, with
>     # useful results names - (Combine is not really necessary, I don't think)
>     remove_lines = get_cur_locn("begin") + \
>         p.OneOrMore(remove_line) + \
>         get_cur_locn("end")
>
>     # now replace keepOriginalText with a simple string slice - should be
>     # faster, too!
>     remove_lines.setParseAction(lambda s,l,t: s[t.begin:t.end])
>
> HTH,
> -- Paul
From: Paul M. <pt...@au...> - 2008-07-03 12:58:56
Stefaan -

I suspect the problem is with the inspect module usage; I don't know how to work around that per se.

However, I did come up with something interesting. I defined an Empty() with a parse action that just returns the current parse location. I then bracketed your remove_lines expression with one of these at the beginning and at the end, with appropriate results names. Then instead of attaching keepOriginalText, I attached a simple lambda that returns the slice of the input string from the given begin and end values. See below:

    # a dummy expression that just returns the current parse location
    get_cur_locn = p.Empty().setParseAction(lambda s,l,t: l)

    remove_line = p.lineStart + p.Literal("<").suppress() + \
        p.restOfLine.setResultsName("LineContents") + p.lineEnd

    # put get_cur_locn exprs at front and back of remove_lines, with
    # useful results names - (Combine is not really necessary, I don't think)
    remove_lines = get_cur_locn("begin") + \
        p.OneOrMore(remove_line) + \
        get_cur_locn("end")

    # now replace keepOriginalText with a simple string slice - should be
    # faster, too!
    remove_lines.setParseAction(lambda s,l,t: s[t.begin:t.end])

HTH,
-- Paul
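[Editor's note] The "bracket the expression with location markers, then slice the input" trick is useful beyond pyparsing whenever you want the original text of a match rather than its processed tokens. A stdlib-only sketch of the same idea using a regex match span (the pattern and helper name are mine, modeled loosely on the remove_lines example in this thread):

```python
import re

def original_text_of_block(text):
    """Match a leading run of '< ...' lines and return its *original*
    text by slicing with the recorded begin/end offsets, instead of
    re-assembling it from tokens."""
    m = re.match(r"(?:< .*\n)+", text)
    if m is None:
        return None
    begin, end = m.span()    # the two recorded "parse locations"
    return text[begin:end]   # a simple string slice

print(repr(original_text_of_block("< line 1\n< line 2\nunrelated\n")))
```

Slicing the source string is both cheaper and more faithful than rebuilding the matched region from tokens, since suppressed delimiters and whitespace are preserved exactly.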
From: stefaan <Ste...@gm...> - 2008-07-03 11:20:05
Hello,

I have a problem with keepOriginalText: when I try to parse a grammar using the keepOriginalText parse action inside a cherrypy 2.2.1-based web application, getTokensEndLoc() fails and I get a traceback:

    File "C:\shi\webapp\comparefiles\comparefiles.py", line 273, in doCompare
      NormalDiffParser.remove_lines.parseString("< line 1" + os.linesep + "< line 2" + os.linesep)
    File "C:\shi\webapp\comparefiles\diffparser\pyparsing.py", line 981, in parseString
      loc, tokens = self._parse( instring, 0 )
    File "C:\shi\webapp\comparefiles\diffparser\pyparsing.py", line 886, in _parseNoCache
      tokens = fn( instring, tokensStart, retTokens )
    File "C:\shi\webapp\comparefiles\diffparser\pyparsing.py", line 3156, in keepOriginalText
      endloc = getTokensEndLoc()
    File "C:\shi\webapp\comparefiles\diffparser\pyparsing.py", line 3167, in getTokensEndLoc
      fstack = inspect.stack()
    File "D:\Python24\lib\inspect.py", line 819, in stack
      return getouterframes(sys._getframe(1), context)
    File "D:\Python24\lib\inspect.py", line 800, in getouterframes
      framelist.append((frame,) + getframeinfo(frame, context))
    File "D:\Python24\lib\inspect.py", line 769, in getframeinfo
      raise TypeError('arg is not a frame or traceback object')
    TypeError: arg is not a frame or traceback object

The grammar for remove_lines:

    import pyparsing as p

    remove_line = p.lineStart + p.Literal("<").suppress() + \
        p.restOfLine.setResultsName("LineContents") + p.lineEnd
    remove_lines = p.Combine(p.OneOrMore(remove_line))
    remove_lines.setParseAction(p.keepOriginalText)

The same program, with the same input, works flawlessly in a "normal" (i.e. not web application) script.
From: Stefaan H. <ste...@gm...> - 2008-06-25 08:07:29
Hello Paul,

Thanks a lot! I indeed had understood that ## caused problems in combination with the implicit whitespace parsing, and I seriously doubt I could have come up with the full solution myself...

As for mixing ^ and + -- I actually knew the difference -- but I was getting tired. (In maths and electronics, + usually means OR, which probably explains my confusion ;) )

It seems to work now! (Well, this part of my parsing problem at least -- but I will first try to continue myself.)

Best regards, and thanks again,
Stefaan.
From: Paul M. <pt...@au...> - 2008-06-25 01:01:38
Stefaan -

First off, '^' and '+' are not really interchangeable - '+' is used to indicate a succession of expressions that must occur in the given order. '^' indicates a list of alternatives, and that the parser should evaluate all of the alternatives and select the longest match. '|' is like '^', but short-cuts evaluation, stopping when the first alternative match is found. So replacing '+' with '^' will just make things worse.

Secondly, Word("string of whitespace characters") does not work, and I should think would give you a compiler warning. If you absolutely *must* parse for whitespace, use the pyparsing White() class. (But read on - you don't really need White().)

Overall, this *is* a mysterious parser, because you have a *lot* going on! Here was your expression for a list of columns the last time we spoke:

    list_of_cols = p.delimitedList(p.Regex(r"[^#\n\r]+"), "#")

And here is a sample table:

    table = """
    # NAME # col1 # col2 # col3 ## cola # colb #
    # Test1 # 1 # 2 # 3 ## a # b #
    # Test_2 # 4 # 5 # 6 ## c # d #
    """

You now want to add "optionality" to the entries in the table, so I've added another row with some blank cells:

    table = """
    # NAME # col1 # col2 # col3 ## cola # colb #
    # Test1 # 1 # 2 # 3 ## a # b #
    # Test_2 # 4 # 5 # 6 ## c # d #
    # Test_3 # 7 # 8 # ## # e #
    """

My first pass was to modify the elements of the delimited list, to indicate that list elements could be blank - up till now, this was easily done by wrapping the expression in a pyparsing Optional:

    list_of_cols = p.delimitedList(p.Optional(p.Regex(r"[^#\n\r]+")), "#")

But this results in the exception:

    pyparsing.ParseException: Expected "##" (at char 231), (line:6, col:5)

Why? Because now, your "##" table separator is being interpreted as two column separators with an empty cell.

So we need to expand our notion of a delimiter: we *only* want to accept '#' delimiters after first determining that the '#' is not the first character of a '##' table separator:

    list_of_cols = p.delimitedList(p.Optional(p.Regex(r"[^#\n\r]+")),
                                   ~p.Literal("##") + "#")

This now parses our table, but we lose track of the empty cells. I assume that the cell's presence is significant, so we add a default value to the definition of the Optional:

    list_of_cols = p.delimitedList(p.Optional(p.Regex(r"[^#\n\r]+"), default=""),
                                   ~p.Literal("##") + "#")

We are also not properly handling the newlines, since p.Optional is skipping over them as its default whitespace-skipping behavior. So let's use another negated lookahead to prevent matching a LineEnd() as part of the content of the delimited list:

    list_of_cols = p.delimitedList(~p.LineEnd() + p.Optional(p.Regex(r"[^#\n\r]+"), default=""),
                                   ~p.Literal("##") + "#")

This is probably enough for you to proceed. As a matter of style, I tend to group lists of things using a pyparsing Group:

    list_of_cols = p.Group(
        p.delimitedList(~p.LineEnd() + p.Optional(p.Regex(r"[^#\n\r]+"), default=""),
                        ~p.Literal("##") + "#"))

Tables of data aren't ordinarily this complicated to parse - it's just that in this case you've chosen/been given some tricky stumbling blocks due to the nature of your delimiting punctuation.

-- Paul
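[Editor's note] The "accept '#' only when it is not part of '##'" lookahead has a direct regular-expression counterpart, which may help readers splitting such rows outside pyparsing. A small illustrative sketch (the pattern and variable names are mine):

```python
import re

# Split on a single '#' only: the lookbehind/lookahead reject any '#'
# that is part of the '##' section separator -- the regex counterpart
# of ~Literal("##") + "#".
CELL_SEP = re.compile(r"(?<!#)#(?!#)")

row = "# Test_3 # 7 # 8 #  ##  # e #"
left, right = row.split("##")
print([cell.strip() for cell in CELL_SEP.split(left)])
print([cell.strip() for cell in CELL_SEP.split(right)])
```

Note that empty cells survive as empty strings in the split result, which is the regex analogue of giving the Optional a default value of "".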
From: stefaan.himpe <ste...@gm...> - 2008-06-24 21:19:31
Hello,

I have one follow-up question. The solution I had was able to parse tables with empty cells, but after incorporating your suggestions this no longer works. At first I expected it would be trivial to extend the parser to handle empty cells, but so far I haven't managed to get something working :(

I have tried to extend the list_of_cols definition in many ways, and I have tried to replace + with ^ in some places... I am still missing some fundamental insights into how pyparsing works to unravel this little mystery. I'd be really grateful for some input.

Best regards,
Stefaan.

To give an idea of some of my many attempts (replace list_of_cols in the earlier posted code):

    list_of_cols = p.delimitedList(p.Word(" \t") | p.Regex(r"[^#\n\r]+"), "#")

or

    list_of_cols = p.delimitedList(p.Regex(r"[^#\n\r]*"), "#")
From: Stefaan H. <ste...@gm...> - 2008-06-23 09:12:00
> Alternatively, you could also try tightening up your definition of
> list_of_cols, too, to match just integers on the left side of the table,
> and contiguous alphanumeric words on the right side of the table.

Hello, and thank you so much for your clarification.

Tightening up the definition of list_of_cols is not an option, however, as in the real-life application the cells can contain random characters/numbers/whitespace/... (templates for code generation).
From: Paul M. <pt...@au...> - 2008-06-23 05:44:13
|
Stefaan - You are correct, this has to do with whitespace skipping in pyparsing. The culprit turns out to be your loose definition of list_of_cols: list_of_cols = p.delimitedList(p.CharsNotIn("#\n\r"), "#") Pyparsing defaults *in most cases* to skipping whitespace before trying to match any expression. Whitespace skipping gets suppressed if you have wrapped code within a Combine, or have called leaveWhitespace, *OR* if you use CharsNotIn. CharsNotIn started out as a sort of AntiWord, in that you could define a Word composed of any characters *not* in the given set. When I created CharsNotIn, I decided that I would *not* automatically skip whitespace before matching one of these, since whitespace could conceivably be one of the the characters to be avoided, and if I skipped over it before matching, I would make a false positive. One alternative is to add "Empty()" (or the pyparsing constant "empty") to your expression of what can be found in a list of cols, as in the following: list_of_cols = p.delimitedList(p.empty+p.CharsNotIn("#\n\r"), "#") Empty() *does* advance past whitespace, consumes no actual characters, and always succeeds, so adding Empty() is a way to explicitly jump over some whitespace. Or you could use the Regex expression, also which skips over whitespace before matching, and use the re notation of "[^...]" replacing '...' with the characters to exclude from matching: list_of_cols = p.delimitedList(p.Regex(r"[^#\n\r]+"), "#") With the sample you sent, either of the options works, choose whichever you are more comfortable with. Alternatively, you could also try tightening up your definition of list_of_cols, too, to match just integers on the left side of the table, and contiguous alphanumeric words on the right side of the table. Best of luck, and keep on pyparsing! -- Paul -----Original Message----- From: pyp...@li... [mailto:pyp...@li...] On Behalf Of stefaan.himpe Sent: Saturday, June 21, 2008 10:09 AM To: pyp...@li... 
Subject: [Pyparsing] whitespace related question
|
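The three variants Paul describes can be compared side by side. A minimal sketch, assuming pyparsing is installed (delimitedList/parseString are the classic spellings used in this thread; current pyparsing releases still accept them):

```python
import pyparsing as pp

# CharsNotIn does not skip leading whitespace, so when a delimiter is
# followed by a newline the bare version stops matching early.
bare = pp.delimitedList(pp.CharsNotIn("#\n\r"), "#")
# Empty() advances past whitespace but consumes no characters.
padded = pp.delimitedList(pp.empty + pp.CharsNotIn("#\n\r"), "#")
# Regex skips leading whitespace like most pyparsing expressions.
regexed = pp.delimitedList(pp.Regex(r"[^#\n\r]+"), "#")

text = "one #\n two"
print(bare.parseString(text).asList())     # ['one '] - stops at the newline
print(padded.parseString(text).asList())   # ['one ', 'two']
print(regexed.parseString(text).asList())  # ['one ', 'two']
```

The bare CharsNotIn version silently gives up after the first item, which is exactly the "parsing stops" symptom in the original question.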
From: stefaan.himpe <ste...@gm...> - 2008-06-22 05:06:47
|
Hello list,

I am stumped by some unexpected behaviour. I want to parse tables of the following form:

    table = """
    # NAME # col1 # col2 # col3 ## cola # colb #
    # Test1 # 1 # 2 # 3 ## a # b #
    # Test_2 # 4 # 5 # 6 ## c # d #
    """

For this, I have specified a TableParser (code follows after this mail). At first sight, the TableParser does exactly what I want. But I found out that parsing stops if one of the table rows contains a space after the last "#", and I do not understand why. I expected the p.restOfLine to take care of this. This is with pyparsing 1.4.12. Any ideas?

Best regards,
Stefaan.

    import pyparsing as p

    identifier = p.Word(p.alphas + "_", p.alphas + p.nums + "_")
    col = p.Literal("#").suppress()
    list_of_cols = p.delimitedList(p.CharsNotIn("#\n\r"), "#")

    left_table_header = col + p.ZeroOrMore(identifier).setResultsName("TestColumnName") + col + \
                        list_of_cols.setResultsName("HeaderSetupDataColumns")
    right_table_header = list_of_cols.setResultsName("HeaderCheckDataColumns") + \
                         p.restOfLine.suppress()
    table_header = left_table_header.setResultsName("LeftTableHeader") + \
                   p.Literal("##").suppress() + \
                   right_table_header.setResultsName("RightTableHeader") + \
                   p.lineEnd.suppress()

    left_table_row = col + \
                     identifier.setResultsName("TestName") + \
                     col + \
                     list_of_cols.setResultsName("RowSetupDataColumns")
    right_table_row = list_of_cols.setResultsName("RowCheckDataColumns") + \
                      p.restOfLine.suppress()
    table_row = left_table_row.setResultsName("LeftTableRow") + \
                p.Literal("##").suppress() + \
                right_table_row.setResultsName("RightTableRow") + \
                p.lineEnd.suppress()

    TableParser = table_header + \
                  p.OneOrMore(p.Group(table_row)).setResultsName("Rows")
|
From: Paul M. <pt...@au...> - 2008-06-06 00:21:34
|
Ken -

Nice catch, and you were in the right *general* vicinity, but ultimately, name was not where the problem was. In fact, the problem was with type_. Here are the original definitions:

    TYPES = "Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave " \
            "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct"
    type_ = Combine(oneOf(TYPES, caseless=True) + Optional(".").suppress())
    name = ~numberSuffix + Word(alphas)

In parsing the street name, it is used like this:

    streetName = ( <blah blah... numbered street definition> |
                   Combine(OneOrMore(~type_ + name), joinString=" ", adjacent=False) )

That is, the street name is built up of one or more names, stopping when we reach a type_. You correctly found this to be a problem if the street name was "Main Drag", but I was confused why this would fail while another test, in which the name was "Deer Run", succeeded.

The answer is that "Drag" begins with "Dr", which matches one of the defined TYPES, and so the matching of words to compose streetName stops after reading "Main", assuming that the leading "Dr" of "Drag" is the street type "Dr". To illustrate other possible problem names, I added these tests:

    >>> p("100 Integrated Circuit Cir")
    name: Integrated Circuit, number: 100, type: Cir
    >>> p("100 Above Average IQ Ave.")
    name: Above Average IQ, number: 100, type: Ave
    >>> p("100 Big and Strong St.")
    name: Big and Strong, number: 100, type: St

To fix this, I modified type_ to enforce that after matching the TYPES, there should be no further word body characters - defined using ~Word(alphas):

    type_ = Combine(oneOf(TYPES, caseless=True) + ~Word(alphas) + Optional(".").suppress())

With this change (and reverting name back to its original form), all the new tests pass. I uploaded a new file to http://pyparsing.pastebin.com/m39133f55. (This also includes another bugfix that was separately reported: numberSuffix was missing "rd", as in "53rd St".)

I'll correct the example in the next release, and the online version on the pyparsing wiki. Also, thanks for the doctest example - I'll leave the tests in this form (especially since they are actual *tests* now!).

-- Paul

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Ken Kuhlman
Sent: Thursday, June 05, 2008 2:15 PM
To: pyp...@li...
Subject: [Pyparsing] negative lookahead problem
|
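The before/after behaviour of type_ is easy to reproduce in isolation. A small sketch, assuming pyparsing is installed; the TYPES string is copied from the example, and the rest of the address grammar is omitted:

```python
from pyparsing import Combine, Optional, ParseException, Word, alphas, oneOf

TYPES = ("Street St Boulevard Blvd Lane Ln Road Rd Avenue Ave "
         "Circle Cir Cove Cv Drive Dr Parkway Pkwy Court Ct")

# original definition: happily matches the leading "Dr" of "Drag"
loose = Combine(oneOf(TYPES, caseless=True) + Optional(".").suppress())

# fixed definition: ~Word(alphas) forbids letters right after the keyword
strict = Combine(oneOf(TYPES, caseless=True) + ~Word(alphas) +
                 Optional(".").suppress())

print(loose.parseString("Drag"))   # ['Dr'] - the false positive
try:
    strict.parseString("Drag")
except ParseException:
    print("strict correctly rejects 'Drag'")
print(strict.parseString("Dr."))   # ['Dr'] - real street types still match
```

The negative lookahead only has to guard the word boundary; oneOf already orders its alternatives so that "Drive" is tried before its prefix "Dr".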
From: Ken K. <ksk...@gm...> - 2008-06-05 19:15:04
|
In the example at http://pastebin.com/m8248134, I've taken the streetAddressParser.py example and added a failing test to show that the street name grammar is too naive. I've been trying to fix it using negative lookahead... is this the right general approach? My attempt causes pyparsing to loop endlessly -- any hints? I'm using version 1.5.0.

thanks!
-Ken |
From: Michael D. <md...@st...> - 2008-06-04 18:04:14
|
Looks fine to me. Certainly addresses my original issue, and then some.

Cheers,
Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
|
From: Paul M. <pt...@au...> - 2008-06-04 17:41:59
|
Mike -

Thanks for this submission, I see no reason why I wouldn't just drop this into the main pyparsing code - it seems to conditionalize around the presence/absence of xml.sax.saxutils very nicely.

My question is more about just how minimal/lame xml.sax.saxutils.escape actually seems to be. In the list of common HTML entities defined later in pyparsing.py, I also include a mapping for '"' to "&quot;", but xml...escape does not handle that case. There is also handling of an optional dict, which if provided calls __dict_replace, which is not implemented. I think I am less interested in a verbatim copy of xml...escape than I am in having one that does a decent job of escaping - I think maybe I am more picky about this code since it would actually become part of the pyparsing source.

So I think I will just discard importing and using xml.sax.saxutils.escape altogether, and replace it with xml_escape, which will be implemented as:

    def xml_escape(data):
        """Escape &, <, >, ", etc. in a string of data."""
        # ampersand must be replaced first
        from_symbols = '&><"'
        to_symbols = ['&' + s + ';' for s in "amp gt lt quot".split()]
        for from_, to_ in zip(from_symbols, to_symbols):
            data = data.replace(from_, to_)
        return data

This handles the 4 special entities defined in HTML 2.0 (http://www.w3.org/MarkUp/html-spec/html-spec_9.html#SEC9.7).

-- Paul

(On further review, I see that I was erroneously mapping '"' to the wrong entity instead of "&quot;" - I'll have that fix along with xml_escape posted to SVN shortly.)
|
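A quick sanity check of the function above (reproduced verbatim): because '&' is replaced first, the ampersands introduced by the later '&lt;', '&gt;', and '&quot;' replacements are not themselves double-escaped.

```python
def xml_escape(data):
    """Escape &, <, >, and " in a string of data."""
    # ampersand must be replaced first, or the '&' inside the
    # entities produced by later replacements would get re-escaped
    from_symbols = '&><"'
    to_symbols = ['&' + s + ';' for s in "amp gt lt quot".split()]
    for from_, to_ in zip(from_symbols, to_symbols):
        data = data.replace(from_, to_)
    return data

print(xml_escape('if a < b & b > c then print "done"'))
# if a &lt; b &amp; b &gt; c then print &quot;done&quot;
```

Note that, like xml.sax.saxutils.escape, this deliberately leaves the apostrophe (') alone; only the four HTML 2.0 special entities are handled.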