pyparsing-users Mailing List for Python parsing module (Page 11)
Brought to you by:
ptmcg
From: Paul M. <pt...@au...> - 2011-07-02 22:56:53

Instead of:

    field = '$' + Combine(Word(alphanums)).setResultsName('field_name')
    as_field = '$' + Combine(Word(alphanums)).setResultsName('new_field_name')

how about:

    anumExpr = Combine(Word(alphanums))
    field = '$' + anumExpr.setResultsName('field_name')
    as_field = '$' + anumExpr.setResultsName('new_field_name')

setResultsName is an unfortunate method name and I have regretted it almost since version 0.9. setResultsName is not really a mutator or 'setter' on an expression - it is really a copy method that makes a copy of the expression and then sets the return name behavior on the copy. You should read 'setResultsName' as 'copyAndReturnResultsWithTheName'.

The reason for this is that I wanted a single expression to be able to be used multiple times but with different semantic values in the overall parser. Let's say I was parsing a record with a criminal identifier, and various numbers of different types of crimes:

    ident = Word(alphanums)
    integer = Word(nums)
    criminalRec = (ident("id") + integer("mis_A") + integer("mis_B") +
                   integer("felony_A") + integer("felony_B") + integer("felony_C"))

Note that I'm using the new form of setResultsName, which makes this much easier to follow than:

    criminalRec = (ident.setResultsName("id") +
                   integer.setResultsName("mis_A") + integer.setResultsName("mis_B") +
                   integer.setResultsName("felony_A") + integer.setResultsName("felony_B") +
                   integer.setResultsName("felony_C"))

If setResultsName was just a setter, then these multiple references to integer would clash with each other. So setResultsName has to act on a copy of the expression, not on the expression itself.

So I would suggest this variation on your original parser:

    anumExpr = Combine(Word(alphanums))
    field = '$' + anumExpr('field_name')
    as_field = '$' + anumExpr('new_field_name')

For that matter, Combine is only useful when there are several expressions to combine into a single token - it is unnecessary here. Just do:

    anumExpr = Word(alphanums)

-- Paul

-----Original Message-----
From: cathal coffey [mailto:cof...@gm...]
Sent: Saturday, July 02, 2011 3:18 PM
To: pyp...@li...
Subject: [Pyparsing] setResultsName() not working as expected
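The copy behavior described above can be checked with a short sketch. It assumes a pyparsing release that still accepts the camelCase names used in this thread (`setResultsName`, `parseString`); the variable names are made up for illustration.

```python
from pyparsing import Word, alphanums

word = Word(alphanums)

# Calling setResultsName() and discarding the return value changes nothing:
# the name is attached to a copy, not to `word` itself.
word.setResultsName("field_name")
unnamed_result = word.parseString("abc")

# Keeping the returned copy is what makes the name take effect.
named = word.setResultsName("field_name")
named_result = named.parseString("abc")

# One base expression, reused under two different names via the call syntax.
field = "$" + word("field_name")
as_field = "$" + word("new_field_name")
field_result = field.parseString("$abc")
as_field_result = as_field.parseString("$xyz")

print("field_name" in unnamed_result)
print(named_result["field_name"])
print(field_result["field_name"], as_field_result["new_field_name"])
```

This is also why `field.setResultsName('field_name')` on its own line has no visible effect: the named copy is created and immediately thrown away.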
From: Paul M. <pt...@au...> - 2011-07-02 22:46:06

No, tokens is not a dict; it is a ParseResults, but I want it to look very much like a dict. Still, has_key() is a deprecated call on dict, and the preferred form is:

    if 'new_field_name' in tokens:

ParseResults will support this method. In general, has_key() should be discarded now, in favor of "in".

-- Paul

-----Original Message-----
From: cathal coffey [mailto:cof...@gm...]
Sent: Saturday, July 02, 2011 5:19 PM
To: pyp...@li...
Subject: [Pyparsing] tokens.has_key('abc') throws exception
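A minimal sketch of the `in` test on a ParseResults, assuming a small grammar in the spirit of the `$field as $new_field` examples from this thread. The grammar and the `describe` helper are hypothetical, not taken from the original poster's code.

```python
from pyparsing import Literal, Optional, Word, alphanums

name = Word(alphanums)
dollar = Literal("$").suppress()

# "$abc" alone, or "$abc as $xyz" with an optional rename part.
expr = dollar + name("field_name") + Optional(
    Literal("as").suppress() + dollar + name("new_field_name")
)


def describe(tokens):
    # `in` checks the results names, much like a key test on a dict.
    if "new_field_name" in tokens:
        return "rename %s -> %s" % (tokens["field_name"], tokens["new_field_name"])
    return "plain field %s" % tokens["field_name"]


print(describe(expr.parseString("$abc as $xyz")))
print(describe(expr.parseString("$abc")))
```

The same membership test works inside a parse action, since the `tokens` argument there is also a ParseResults.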
From: cathal c. <cof...@gm...> - 2011-07-02 22:23:59

I made a typo in my first email. The line

    field = '$'.suppress() + Combine(Word(alphanums))

should read

    field = Literal('$').suppress() + Combine(Word(alphanums))

However, this still doesn't solve my problem. Can you please explain why the two code snippets shown in the first email aren't equivalent?
From: cathal c. <cof...@gm...> - 2011-07-02 22:19:27

Hey Guys,

I am using setResultsName like this

    setResultsName('new_field_name')

then inside my parse action I want to do something like the below.

    if tokens.has_key('new_field_name'):
        #Do something
    else:
        #Do something else

However, this causes an exception to be thrown. I guess tokens is not a dict even though it appears to function like one. My not so elegant solution to avoid this problem is the below.

    try:
        new_field_name = tokens['new_field_name']
        #Do something
    except:
        #Do something else

I don't like this try -> except approach; does anyone have a more elegant solution?

Kind regards,
Cathal
From: cathal c. <cof...@gm...> - 2011-07-02 20:18:14

Guys,

I have the following but I don't like it because I am basically repeating code just to change the results name depending on the situation.

    field = '$' + Combine(Word(alphanums)).setResultsName('field_name')
    as_field = '$' + Combine(Word(alphanums)).setResultsName('new_field_name')

I would much prefer to do something like this.

    field = '$'.suppress() + Combine(Word(alphanums))
    as_field = field.copy()
    field.setResultsName('field_name')
    as_field.setResultsName('new_field_name')

The second however doesn't parse as expected. Why aren't the two of these equivalent?

Kind regards,
Cathal
From: cathal c. <cof...@gm...> - 2011-06-23 08:39:25

Ralph, thank you very much for the excellent explanation. It's working now.

Kind regards,
Cathal
From: Ralph C. <ra...@in...> - 2011-06-22 22:36:05

Hi Cathal,

> I have found something very weird. When I type the below into my
> python console I get the following error: ParseException: Expected ")"
> (at char 3), (line:1, col:4)
>
> from pyparsing import *
> function = Forward()
> function = '#' + Word(nums) + Group(Literal('(') + Optional(function)) + Literal(')')
> function.parseString('#0(#1())', True)

I get that too, and it seems correct.

> However, if I then re-enter everything except the line: function =
> Forward() it works as I initially expected it to.
>
> from pyparsing import *
> function = '#' + Word(nums) + Group(Literal('(') + Optional(function)) + Literal(')')
> function.parseString('#0(#1())', True)
> Out[6]: (['#', '0', (['(', '#', '1', (['('], {}), ')'], {}), ')'], {})

I get

    >>> from pyparsing import *
    >>> function = '#' + Word(nums) + Group(Literal('(') + Optional(function)) + Literal(')')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'function' is not defined
    >>>

which again seems correct. It seems you're entering these lines into the same Python interpreter in which you've already defined function. IOW, your two tests are not isolated; the second is altered by the leftovers of the first.

Again, printing function in both cases may have given a clue. First time,

    >>> print function
    {"#" W:(0123...) Group:({"(" [Forward: None]}) ")"}

second time,

    >>> print function
    {"#" W:(0123...) Group:({"(" [{"#" W:(0123...) Group:({"(" [Forward: None]}) ")"}]}) ")"}

So the second one still isn't right:

    >>> function.parseString('#0(#1(#2()))', True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/pymodules/python2.6/pyparsing.py", line 1076, in parseString
        raise exc
    pyparsing.ParseException: Expected ")" (at char 3), (line:1, col:4)
    >>>

You haven't got a recursive grammar, just one two levels deep.

> How do I make this work without having to enter the first sequence and
> then the second sequence of modified statements?

    from pyparsing import *
    f = Forward()
    function = '#' + Word(nums) + Group(Literal('(') + Optional(f)) + Literal(')')
    f << function
    function.parseString('#0(#1())', True)

You must keep a reference to the Forward() so that you can "fill it in" later. You can't do this if you throw it away by immediately re-assigning to function, hence my use of f. Note, the Forward is no longer None.

    >>> print function
    {"#" W:(0123...) Group:({"(" [Forward: {"#" W:(0123...) Group:({"(" [...]}) ")"}]}) ")"}

Cheers,
Ralph.
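A self-contained sketch of the same fix, parsing the three-level input that failed earlier in this thread. It uses the `<<=` operator instead of the `<<` shown above, on the assumption that a pyparsing release new enough to support it is installed; the grammar itself is the one from the thread.

```python
from pyparsing import Forward, Group, Literal, Optional, Word, nums

# Declare the placeholder first and keep the name bound to it.
function = Forward()

# Fill the placeholder in; the right-hand side may refer to `function` itself.
function <<= (
    "#" + Word(nums) + Group(Literal("(") + Optional(function)) + Literal(")")
)

# The second argument asks for the whole string to be consumed.
result = function.parseString("#0(#1(#2()))", True)
print(result.asList())
```

Because the name `function` is never rebound to a plain expression, every nesting level refers back to the same Forward, so the grammar is recursive rather than a fixed number of levels deep.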
From: cathal c. <cof...@gm...> - 2011-06-22 21:14:12

Hi everyone,

I have found something very weird. When I type the below into my python console I get the following error: ParseException: Expected ")" (at char 3), (line:1, col:4)

    from pyparsing import *
    function = Forward()
    function = '#' + Word(nums) + Group(Literal('(') + Optional(function)) + Literal(')')
    function.parseString('#0(#1())', True)

However, if I then re-enter everything except the line: function = Forward() it works as I initially expected it to.

    from pyparsing import *
    function = '#' + Word(nums) + Group(Literal('(') + Optional(function)) + Literal(')')
    function.parseString('#0(#1())', True)
    Out[6]: (['#', '0', (['(', '#', '1', (['('], {}), ')'], {}), ')'], {})

How do I make this work without having to enter the first sequence and then the second sequence of modified statements?

Kind regards,
Cathal
From: Ralph C. <ra...@in...> - 2011-06-22 10:49:44

Hi,

Paul McGuire wrote:
> > from pyparsing import Word, alphas
> > g = "$" + "as" + Word(alphas)
> > g.parseString('$ as abc')
> > ParseException: Expected "$as" (at char 0), (line:1, col:1)
>
> Here's your definition of g
>
> >>> g = "$" + "as" + Word(alphas)
> >>> print g
> {"$as" W:(abcd...)}
>
> "$" + "as" gets interpreted by Python as the addition of two strings,
> resulting in "$as".

I guess yet another way, keyword issue aside, is to alter the associativity with parentheses?

    >>> g = "$" + ("as" + Word(alphas))
    >>> print g
    {"$" {"as" W:(abcd...)}}
    >>> g.parseString('$ as abc')
    (['$', 'as', 'abc'], {})
    >>>

Cheers,
Ralph.
From: Paul M. <pt...@au...> - 2011-06-22 00:56:11

Eike -

Good point! It isn't necessary to make '$' the Literal expression, it is just as good (and if using Keyword, probably more appropriate) to make 'as' the expression using the Keyword class.

-- Paul

-----Original Message-----
From: Eike Welk [mailto:eik...@gm...]
Sent: Tuesday, June 21, 2011 7:24 PM
To: pyp...@li...
Subject: Re: [Pyparsing] White space
From: Eike W. <eik...@gm...> - 2011-06-22 00:24:22

Hello Cathal!

I'm trying to answer your question without testing anything. So it might be completely wrong.

On Wednesday 22.06.2011 00:34:49 cathal coffey wrote:
> Hello,
>
> I have a quick question. Shouldn't the below grammar accept the given
> string?
>
> from pyparsing import Word, alphas
> g = "$" + "as" + Word(alphas)
> g.parseString('$ as abc')
> ParseException: Expected "$as" (at char 0), (line:1, col:1)

It is the order in which the ``+`` operators are applied in the first statement that fools you. The left ``+`` operator is applied first so that two strings are added::

    "$" + "as"

Therefore your first statement is equivalent to::

    g = "$as" + Word(alphas)

You get the desired result with::

    g = Literal("$") + "as" + Word(alphas)

Look also at ``Keyword``. You probably want::

    from pyparsing import Word, alphas, Keyword
    Kw = Keyword
    g = "$" + Kw("as") + Word(alphas)

The string ``"$"`` is now implicitly converted to ``Literal("$")`` because to apply the left ``+`` operator, Python calls the method ``Keyword.__radd__``. This method converts strings to literals.

For the behavior of Python's operators look at the official (but somewhat cryptic) explanation:
http://docs.python.org/reference/datamodel.html#emulating-numeric-types

> I want the grammar to accept $ followed by: any number of spaces
> followed by: as followed by: any number of spaces followed by: any
> string of characters.

HTH,
Eike.
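The operator mechanics described above can be reproduced without pyparsing at all. The `Piece` class below is a hypothetical toy stand-in for a parser element, written only to show when Python falls back to `__radd__`; it is not how pyparsing itself is implemented.

```python
class Piece:
    """Toy stand-in for a parser element (hypothetical, not pyparsing)."""

    def __init__(self, *parts):
        self.parts = list(parts)

    def __add__(self, other):
        # Piece + Piece, or Piece + str
        other_parts = other.parts if isinstance(other, Piece) else [other]
        return Piece(*(self.parts + other_parts))

    def __radd__(self, other):
        # Reached for `str + Piece`: str.__add__ gives up on a non-str operand,
        # so Python tries the right operand's __radd__.
        return Piece(other, *self.parts)


word = Piece("WORD")

a = "$" + "as" + word         # evaluated as ("$" + "as") + word
b = Piece("$") + "as" + word  # first operand is already a Piece
c = "$" + ("as" + word)       # parentheses force "as" + word first

print(a.parts)
print(b.parts)
print(c.parts)
```

In the first expression the two strings are joined before any `Piece` is involved, which mirrors the `"$as"` literal reported in the ParseException.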
From: Paul M. <pt...@au...> - 2011-06-22 00:16:50

Here's your definition of g

    >>> g = "$" + "as" + Word(alphas)
    >>> print g
    {"$as" W:(abcd...)}

"$" + "as" gets interpreted by Python as the addition of two strings, resulting in "$as". Force the leading "$" to be a pyparsing Literal, and then you'll get a better parser:

    >>> g = Literal("$") + "as" + Word(alphas)
    >>> print g
    {{"$" "as"} W:(abcd...)}

-- Paul

-----Original Message-----
From: cathal coffey [mailto:cof...@gm...]
Sent: Tuesday, June 21, 2011 5:35 PM
To: pyp...@li...
Subject: [Pyparsing] White space
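As a quick check of the two definitions of `g` discussed in this thread, the sketch below runs both against a few inputs. The `accepts` helper and the names `broken` and `fixed` are made up for illustration, and the camelCase `parseString` is assumed to be available in the installed pyparsing.

```python
from pyparsing import Literal, ParseException, Word, alphas

broken = "$" + "as" + Word(alphas)          # Python joins the strings first: "$as"
fixed = Literal("$") + "as" + Word(alphas)  # three separate expressions


def accepts(grammar, text):
    """Return True if `grammar` matches at the start of `text`."""
    try:
        grammar.parseString(text)
        return True
    except ParseException:
        return False


for text in ("$as abc", "$ as abc", "$   as   abc"):
    print(repr(text), accepts(broken, text), accepts(fixed, text))
```

The `broken` grammar only matches when `$as` appears with no space in between, while `fixed` skips whitespace between each of its three pieces.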
From: cathal c. <cof...@gm...> - 2011-06-21 22:35:16

Hello,

I have a quick question. Shouldn't the below grammar accept the given string?

    from pyparsing import Word, alphas
    g = "$" + "as" + Word(alphas)
    g.parseString('$ as abc')
    ParseException: Expected "$as" (at char 0), (line:1, col:1)

I want the grammar to accept $ followed by: any number of spaces followed by: as followed by: any number of spaces followed by: any string of characters.

The reason I am confused is that the following grammar accepts regardless of white space... why doesn't the above?

    g = "as" + Word(alphas)
    g.parseString(' as abc ')
    Out: (['as', 'abc'], {})

Kind regards,
Cathal
From: Paul M. <pt...@au...> - 2011-05-31 03:46:39

Luke -

I tripped over this same bug doing HTML tag stripping. The latest pyparsing in SVN has a version of _flatten that uses a lot less recursion. Also, a slightly improved version of transformString, which is smart enough to leave out empty strings from the list of strings to be joined. The new _flatten only recurses if there is a nesting of parsed structures:

    def _flatten(L):
        ret = []
        for i in L:
            if isinstance(i,list):
                ret.extend(_flatten(i))
            else:
                ret.append(i)
        return ret

-- Paul

-----Original Message-----
From: Luke Campagnola [mailto:lca...@em...]
Sent: Monday, May 30, 2011 10:28 AM
To: pyp...@li...
Subject: [Pyparsing] recursion in _flatten after version 1.5.2
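To see why the rewritten `_flatten` helps, here is a small comparison in plain Python 3. `flatten_recursive` is a reconstruction of the older head/tail style based only on the two lines visible in the traceback (the empty-list base case is an assumption), and `flatten_iterative` has the same shape as the version quoted above.

```python
import sys


def flatten_recursive(L):
    # Reconstructed from the traceback; recurses once per list element.
    if type(L) is not list:
        return [L]
    if L == []:
        return L  # assumed base case, not visible in the traceback
    return flatten_recursive(L[0]) + flatten_recursive(L[1:])


def flatten_iterative(L):
    # Recurses only when an item is itself a list.
    ret = []
    for item in L:
        if isinstance(item, list):
            ret.extend(flatten_iterative(item))
        else:
            ret.append(item)
    return ret


# A flat list slightly longer than the interpreter's recursion limit.
long_flat = list(range(sys.getrecursionlimit() + 100))

try:
    flatten_recursive(long_flat)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

print(recursive_ok)
print(len(flatten_iterative(long_flat)) == len(long_flat))
print(flatten_iterative([1, [2, [3, 4]], 5]))
```

With the head/tail version, the recursion depth grows with the length of the list, which fits a long source file producing many output fragments in transformString; with the loop, depth grows only with nesting.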
From: Luke C. <lca...@em...> - 2011-05-30 15:28:17

Howdy,

I have a script that I use to strip comments from C files. The code fails for any pyparsing version later than 1.5.2. Backtrace looks like this:

      File "CParser.py", line 300, in removeComments
        self.files[file] = (quotedString | cStyleComment.suppress() | cplusplusLineComment.suppress()).transformString(text)
      File "c:\python27\lib\site-packages\pyparsing.py", line 1166, in transformString
        return "".join(map(_ustr,_flatten(out)))
      File "c:\python27\lib\site-packages\pyparsing.py", line 3190, in _flatten
        return _flatten(L[0]) + _flatten(L[1:])
      File "c:\python27\lib\site-packages\pyparsing.py", line 3190, in _flatten
        return _flatten(L[0]) + _flatten(L[1:])
      . . .
      File "c:\python27\lib\site-packages\pyparsing.py", line 3188, in _flatten
        if type(L) is not list: return [L]
    RuntimeError: maximum recursion depth exceeded while calling a Python object

If I remember correctly, 'quotedString' and 'cStyleComment' are provided by pyparsing, and I have provided this definition:

    cplusplusLineComment = Literal("//") + restOfLine

Any hints?

Luke
From: Adam L. <ad...@su...> - 2011-05-10 16:45:39

Having a hard time getting to grips with pyparsing, especially as related to syslog parsing. Using the following line as a sample, how would you parse the delimited list of key=value pairs? I've got the date/time, hostname, syslog tag parts without problem but I'm unsure how to use the delimitedList operator. I'm using the following:

    equals = Literal('=').setName('equals')
    key_value = Word(printables) + equals + Word(printables)
    delimitedList(OneOrMore(key_value)) + Optional(Word(printables))

    2011-05-08T04:07:18-04:00 host01 postfix/qmgr[12757]: 5A7D92478EE: from=<7ob...@gs...>, size=13908, nrcpt=1 (queue active)

I get the following error:

    Expected equals (at char 104), (line:1, col:105)
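One possible way to approach the key=value part is sketched below; it is not an answer from the list. One likely cause of the error is that `Word(printables)` is greedy and also consumes the `=` and `,` characters, so the key never stops before the equals sign. The sketch assumes keys are purely alphabetic and values contain no commas or spaces, and the address in `tail` is a placeholder because the real one is elided in the post.

```python
from pyparsing import CharsNotIn, Group, Suppress, Word, alphas, delimitedList

key = Word(alphas)                  # stops before "="
value = CharsNotIn(", ")            # stops at the comma or space ending a value
pair = Group(key + Suppress("=") + value)
pairs = delimitedList(pair)         # pairs separated by commas

# Hypothetical tail of a syslog line; only the key=value section is parsed.
tail = "from=<user@example.com>, size=13908, nrcpt=1 (queue active)"

parsed = dict(pairs.parseString(tail).asList())
print(parsed)
```

The trailing `(queue active)` is simply left unparsed here, since `parseString` does not require the whole input to match unless asked to.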
From: Duncan M. <dun...@gm...> - 2011-04-26 04:33:25

Hey all,

I've started a new project to explore interactive fiction in Python... using PyParsing :-) It's nothing really new yet -- based on Paul's PyCon 2006 talk. You can check it out here:

https://launchpad.net/myriad-worlds

Right now, it's using YAML to provide storyline data (such that it is). I'm very curious to see if YAML is up to the task of defining object- or connection-oriented narratives (to be honest, I don't see why it shouldn't).

Paul, since revision 1 (currently at revision 13) of this was a straight-up copy of your adventure engine, I'd like to respect whatever vision of licensing you had for it. I've set it as BSD for now; let me know your preference, and I'll update it.

There's also a nascent wiki page up for IF resources in Ubuntu (with several Python IF projects/links present as well):

https://wiki.ubuntu.com/InteractiveFiction

Feel free to update it, if IF and Ubuntu are your cups of tea.

Thanks!

d
From: Paul M. <pt...@au...> - 2011-03-22 02:55:45

(Here is the link: http://oreilly.com/store/dd-jpn.html)

Attention Pyparsing Community!

Have you been using pyparsing, and thinking about whether or not to get the O'Reilly e-book, "Getting Started With Pyparsing"? Now is the time!

This Tuesday, March 22, O'Reilly will be offering *all* of their O'Reilly, No Starch Press, and Tidbits e-books and videos at 50% off, AND donating the proceeds to Japanese Red Cross for earthquake and tsunami disaster relief. With such a generous effort on their part, I will also be donating my author's share as well.

So you can get your official copy of GSWP for 1/2 the regular price AND all the money goes to the Japanese Red Cross to help those affected by the recent earthquake and tsunami. What's not to like?!

The offer runs for just 1 day, beginning at 12:01am PT Tuesday, March 22. I hope you get a chance to take O'Reilly up on their offer, not just for GSWP but all of their e-books and videos!

Regards,
-- Paul
From: Paul M. <pt...@au...> - 2011-03-22 02:45:22

Attention Pyparsing Community!

Have you been using pyparsing, and thinking about whether or not to get the O'Reilly e-book, "Getting Started With Pyparsing"? Now is the time!

This Tuesday, March 22, O'Reilly will be offering *all* of their O'Reilly, No Starch Press, and Tidbits e-books and videos at 50% off, AND donating the proceeds to Japanese Red Cross for earthquake and tsunami disaster relief. With such a generous effort on their part, I will also be donating my author's share as well.

So you can get your official copy of GSWP for 1/2 the regular price AND all the money goes to the Japanese Red Cross to help those affected by the recent earthquake and tsunami. What's not to like?!

The offer runs for just 1 day, beginning at 12:01am PT Tuesday, March 22. I hope you get a chance to take O'Reilly up on their offer, not just for GSWP but all of their e-books and videos!

Regards,
-- Paul
From: Michael D. <md...@gm...> - 2011-03-16 16:05:09

Sorry about that. I've put the patch up here:

https://gist.github.com/869225

Mike

On Mon, Mar 14, 2011 at 12:50 AM, Paul McGuire <pt...@au...> wrote:
> This sounds like some terrific work, thanks! Unfortunately I got no
> attachment on your e-mail, could you paste it to someplace publicly
> accessible, maybe pastebin.com? I've got some other changes queued up for
> a next release, but this would be great to get included.
>
> Please write back when you've got your code posted.
>
> Thanks!
> -- Paul
>
> -----Original Message-----
> From: Michael Droettboom [mailto:md...@gm...]
> Sent: Friday, March 11, 2011 1:11 PM
> To: pyp...@li...
> Subject: [Pyparsing] Patch to fix memory leaks with Python 3.x

--
Michael Droettboom
http://www.droettboom.com/
From: Paul M. <pt...@au...> - 2011-03-14 04:50:43
|
This sounds like some terrific work, thanks! Unfortunately I got no attachment on your e-mail. Could you paste it someplace publicly accessible, maybe pastebin.com? I've got some other changes queued up for a next release, but this would be great to get included. Please write back when you've got your code posted.

Thanks!
-- Paul

-----Original Message-----
From: Michael Droettboom [mailto:md...@gm...]
Sent: Friday, March 11, 2011 1:11 PM
To: pyp...@li...
Subject: [Pyparsing] Patch to fix memory leaks with Python 3.x
|
From: Michael D. <md...@gm...> - 2011-03-11 19:20:13
|
We are in the process of porting matplotlib to Python 3.x. Matplotlib uses pyparsing to parse a TeX-like mini-language for math expressions. A bunch of hard-working folks at the Cape Town PUG noticed that memory was leaking like crazy whenever this functionality was being used. On further investigation, it very confusingly turns out it was leaking stack frames, so even objects that never touched the pyparsing-based parser were getting leaked.

This seems to be centered around the change in Python 3.x where exception objects contain a member "__traceback__" containing the full traceback of the exception. This means that an exception object that is referenced outside of an except block will create a cyclical reference with the local stack frame it lives in. For example, in code like:

    try:
        do_something()
    except Exception as exc:
        my_exc = exc  # my_exc will live beyond the except block
    return my_exc

This creates a cyclical reference from my_exc -> my_exc.__traceback__ -> local stack frame -> my_exc.

Having cyclical references means that any local variable *anywhere in the stack* of the thrown exception will not be freed until the garbage collector feels enough pressure to do so. When those objects include C-extensions that allocate memory on the heap (as is the case in matplotlib), the garbage collector doesn't know enough about those objects to start freeing soon enough, and memory usage quickly grows unmanageable.

See this warning in the "porting to Python 3" guide:

http://docs.python.org/py3k/howto/pyporting.html#capturing-the-currently-raised-exception

See also the "Open Issue" section of PEP 3134:

http://www.python.org/dev/peps/pep-3134/

This causes a lot of headaches when storing and passing around exceptions for later use, as pyparsing does routinely.

I have attached a patch against SVN that seems to resolve these reference leaks -- at least the ones that are exercised by matplotlib's math parser. The changes fall into a number of categories:

1) Remove use of sys.exc_info(). In Python 3, the exception variable (that is, the "exc" in "except Exception as exc") is automatically unbound upon leaving the except block. The same is not true of the result of sys.exc_info(), and if the exception object leaves the except block it requires special care to avoid creating a cyclical reference with the frame. It's not required, but it does simplify the code a lot to simply use "except Exception as exc" where it applies.

2) By storing the "myException" object in ParserElement objects, it was creating a cyclical reference between the exception object and the ParserElement object. This was not much of a problem in Python 2.x, but in Python 3.x, since exception objects pull in all the baggage from the traceback, the memory wastage is considerable. I fixed this case by simply creating exception objects when they are raised, and not maintaining a myException member. I don't know why the myException member existed in the first place (performance considerations perhaps?), so I don't know if there are downsides to this change. An alternative might be to store a weak reference to the ParserElement inside of the exception object -- but that creates a user-visible API change to the exception object.

3) When exception objects do need to exist outside of the except block, the traceback should be removed from the exception object, using "exc.__traceback__ = None". There are a few examples of this, such as storing exceptions in the parser cache (in _parseCache). By deleting the traceback, it is basically restored to the behavior of the old Python 2.x code, which, by using sys.exc_info(), was storing the exception only and not the traceback payload.

Thanks again for pyparsing -- it has been invaluable on our project. I hope this patch will benefit others making the transition to Python 3.

Cheers,
Mike

--
Michael Droettboom
http://www.droettboom.com/
|
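For readers who want to see the effect described in the message above, here is a minimal sketch, assuming CPython (where objects are freed by reference counting as soon as nothing points at them). It is not taken from the patch. The names Payload, keep_exception and payload_outlives_call are hypothetical and exist only for this illustration. The sketch keeps an exception past its except block and checks whether a local variable in the same frame survives the function's return, with and without "exc.__traceback__ = None".

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object held in a local variable."""


def keep_exception(clear_traceback):
    """Raise, keep the exception past the except block, and return a weak
    reference to a local so the caller can tell whether the frame was freed."""
    payload = Payload()
    payload_ref = weakref.ref(payload)
    try:
        raise ValueError("parse failed")
    except ValueError as exc:
        if clear_traceback:
            exc.__traceback__ = None  # cut the exception -> traceback -> frame link
        kept = exc  # 'exc' is unbound after the block, 'kept' is not
    return payload_ref


def payload_outlives_call(clear_traceback):
    """Return True if 'payload' is still alive right after keep_exception returns."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop the cycle collector from masking the effect
    try:
        return keep_exception(clear_traceback)() is not None
    finally:
        gc.collect()  # break any cycle left behind
        if was_enabled:
            gc.enable()


print("traceback kept:   ", payload_outlives_call(False))
print("traceback cleared:", payload_outlives_call(True))
```

With the traceback left in place, the frame and everything in it should stay alive until the cycle collector runs. With the traceback cleared, plain reference counting should be enough. Other Python implementations manage memory differently, so the result may not carry over.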
From: Werner F. B. <wer...@fr...> - 2011-01-07 09:07:28
|
Paul and Eike,

Thanks for your pointers.

On 07/01/2011 02:18, Paul McGuire wrote:
> I'm not super-keen on your variable naming (using alphanums as a Word
> expression, overloading the alphanums string defined in pyparsing)

As I import pyparsing as pyp it didn't cause me any problems, but you are right, it is changed.

> , but let's go with it. The alphanums string in pyparsing is purely 7-bit ASCII
> characters. As a first pass, try changing this to add the alphas8bit string:
>
>     alphanums = Word(pyp.alphanums + pyp.alphas8bit)
>
> This should handle your posted question.

That did it, but I will probably go with the approach below.

Thanks
Werner

> If you need to handle more of the Unicode set (beyond chr(256)), then you'll
> need to use these definitions:
>
> >>> alphas = u''.join(unichr(c) for c in range(65536) if unichr(c).isalpha())
> >>> len(alphas)
> 47672
> >>> nums = u''.join(unichr(c) for c in range(65536) if unichr(c).isdigit())
> >>> len(nums)
> 404
>
> So if you go to embracing all Unicode strings, there are actually over 400
> characters that are considered to be numeric digits. But I think alphas8bit
> should carry you along for a while.
>
> -- Paul
|
From: Paul M. <pt...@au...> - 2011-01-07 01:34:32
|
I'm not super-keen on your variable naming (using alphanums as a Word expression, overloading the alphanums string defined in pyparsing), but let's go with it. The alphanums string in pyparsing is purely 7-bit ASCII characters. As a first pass, try changing this to add the alphas8bit string:

    alphanums = Word(pyp.alphanums + pyp.alphas8bit)

This should handle your posted question.

If you need to handle more of the Unicode set (beyond chr(256)), then you'll need to use these definitions:

    >>> alphas = u''.join(unichr(c) for c in range(65536) if unichr(c).isalpha())
    >>> len(alphas)
    47672
    >>> nums = u''.join(unichr(c) for c in range(65536) if unichr(c).isdigit())
    >>> len(nums)
    404

So if you go to embracing all Unicode strings, there are actually over 400 characters that are considered to be numeric digits. But I think alphas8bit should carry you along for a while.

-- Paul

-----Original Message-----
From: Werner F. Bruhin [mailto:wer...@fr...]
Sent: Thursday, January 06, 2011 5:45 AM
To: pyp...@li...
Subject: [Pyparsing] PayPal IPN message parsing

I am having some problems decoding these messages.

The data comes in as an email message with a defined content type as "Content-Type: text/plain", however it is really Content-Type: text/plain; charset="windows-1252", so I read it in with

    thisfile = codecs.open(regFile, "r", "windows-1252")

The parsing works fine except on things like:

    address_name = Göran Petterson

Which I parse with:

    alphanums = pyp.Word(pyp.alphanums)

    # address
    str_add_name = pyp.Literal("address_name =").suppress() +\
        alphanums + pyp.restOfLine
    add_name = str_add_name.setParseAction(self.str_add_nameAction)

But I get in str_add_nameAction:

    ([u'G', u'\xf6ran Petterson\r'], {})

The raw data at this point is "address_name = G\xf6ran Petterson"

What am I doing wrong in all this?

I tried using pyp.printables instead of alphanums but with the same result.

A tip would be very much appreciated.

Werner

P.S.
Happy New Year to you all.
|
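The interactive session in the message above is Python 2 (unichr, u'' literals). As a rough sketch only, the same character sets could be built in Python 3 as follows, using nothing but the standard library. The names BMP, unicode_alphas and unicode_nums are made up for this example. The counts depend on the Unicode database shipped with the interpreter, so they may not match the 47672 and 404 quoted above.

```python
import string

# Python 3 spelling of the session above: chr() replaces unichr(), and str is
# already Unicode. Range 0x10000 covers the Basic Multilingual Plane only.
BMP = range(0x10000)

unicode_alphas = "".join(chr(c) for c in BMP if chr(c).isalpha())
unicode_nums = "".join(chr(c) for c in BMP if chr(c).isdigit())

# The counts depend on the Unicode database bundled with the interpreter,
# so they are printed rather than hard-coded here.
print(len(unicode_alphas), len(unicode_nums))

# "ö" is missing from the 7-bit ASCII letters but present in the Unicode set.
print("ö" in string.ascii_letters, "ö" in unicode_alphas)
```

Under the assumption that pyparsing's Word accepts any string of allowed characters, the resulting strings would be passed to it in place of pyp.alphanums, in the same way as the alphas8bit suggestion.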
From: Eike W. <eik...@gm...> - 2011-01-06 18:41:04
|
Hello Werner!

On Thursday 06.01.2011 12:44:53 Werner F. Bruhin wrote:
> I am having some problems decoding these messages.
>
> The data comes in as an email message with a defined content type as
> "Content-Type: text/plain", however it is really Content-Type:
> text/plain; charset="windows-1252", so I read it in with
>
> thisfile = codecs.open(regFile, "r", "windows-1252").

I think this is correct. You convert the file from "windows-1252" to Unicode prior to parsing.

You must write constants as `u"Göran"`. You should IMHO also encode your program's source code with UTF-8 and have the following as the first line:

    # -*- coding: utf-8 -*-

IMHO IPython has additional Unicode problems. This confused me while I was writing this e-mail; maybe something similar is happening on your computer too.

> The parsing works fine except on things like:
>
> address_name = Göran Petterson
>
> Which I parse with:
> alphanums = pyp.Word(pyp.alphanums)
>
> # address
> str_add_name = pyp.Literal("address_name =").suppress() +\
>     alphanums + pyp.restOfLine
> add_name = str_add_name.setParseAction(self.str_add_nameAction)
>
> But I get in str_add_nameAction:
> ([u'G', u'\xf6ran Petterson\r'], {})

`pyp.alphanums` is a string, and it does not contain the character "ö". See:

    In [1]: import pyparsing as pyp

    In [2]: pyp.alphanums
    Out[2]: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

I think getting a suitable parser with the least typing would be something like:

    alphanums = pyp.CharsNotIn(" ,.")

or

    str_add_name = pyp.Literal("address_name =").suppress() +\
        pyp.restOfLine

And keep in mind that foreigners write their names in funny ways. Older Germans, for example, frequently have forenames with hyphens, like "Karl-Heinz" or "Franz-Josef".

> The raw data at this point is "address_name = G\xf6ran Petterson"

The code for "ö" in windows-1252 and in Unicode is F6. I think this is correct. It is repr(u"Göran Petterson")

http://en.wikipedia.org/wiki/Windows-1252
http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF

All the best,
Eike.
|
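The two observations in the reply above (byte F6 decodes to "ö", and an ASCII-only character set stops matching right after the "G") can be checked without pyparsing. The following is only a sketch in Python 3. The helper leading_run is a hypothetical stand-in for the greedy match of a character-set token and is not part of pyparsing.

```python
import string

ASCII_ALPHANUMS = string.ascii_letters + string.digits


def leading_run(text, allowed):
    """Return the longest prefix of text made only of characters in allowed.

    Hypothetical stand-in for the greedy match a character-set token
    performs; it is not pyparsing code.
    """
    end = 0
    while end < len(text) and text[end] in allowed:
        end += 1
    return text[:end]


raw = b"G\xf6ran Petterson"        # bytes as they would sit in a windows-1252 file
name = raw.decode("windows-1252")  # byte 0xF6 decodes to U+00F6, "ö"

print(repr(leading_run(name, ASCII_ALPHANUMS)))        # stops before "ö"
print(repr(leading_run(name, ASCII_ALPHANUMS + "ö")))  # runs up to the space
```

This mirrors the split seen in the parse action, where the first token ends at "G" and the rest of the line picks up everything from "ö" onwards.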