pyparsing-users Mailing List for Python parsing module (Page 16)
Brought to you by:
ptmcg
From: Paul M. <pt...@au...> - 2009-06-11 03:42:50
|
This question comes up every so often. Please see this writeup on the Discussion page of the pyparsing wiki:

http://pyparsing.wikispaces.com/message/view/home/1089023

In addition to the solution posted there, you could try one of these options instead of using Combine:

Use a parse action to join the parsed tokens with an intervening ' ' character:

    entry = vType+Word(alphanums)+":"+regs+";"
    entry.setParseAction( lambda tokens: ' '.join(tokens) )

Or use the recent originalTextFor helper method:

    entry = originalTextFor( vType+Word(alphanums)+":"+regs+";" )

Write back if that doesn't answer your question. (Also, check out the C struct parser on the wiki on the http://pyparsing.wikispaces.com/UnderDevelopment page.)

-- Paul

> -----Original Message-----
> From: Sebastian Schoellhammer [mailto:ssc...@gm...]
> Sent: Wednesday, June 10, 2009 10:01 PM
> To: pyp...@li...
> Subject: [Pyparsing] Combine problem
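The difference between the two uses of Combine can be sketched as follows. By default Combine requires its pieces to be adjacent in the input, so it works for "TEXCOORD5" but fails once the pieces are separated by spaces. This is a minimal sketch, assuming a reasonably recent pyparsing where the camelCase names used in this thread are still available; `make_entry`, `strict_ok`, `joined_result`, and `original` are illustrative names, not part of the original post.

```python
from pyparsing import (Combine, Optional, ParseException, Word,
                       alphanums, nums, originalTextFor)

entryString = "float3 tan : TEXCOORD5;"

def make_entry():
    # Hypothetical helper: rebuilds the entry expression from the question.
    regs = Combine("TEXCOORD" + Word(nums)) | "SV_Position"
    vType = Combine("float" + Optional(Word(nums)))
    return vType + Word(alphanums) + ":" + regs + ";"

# 1. Combine around the whole entry: the pieces are separated by spaces,
#    and Combine (adjacent=True by default) does not skip whitespace.
try:
    Combine(make_entry()).parseString(entryString)
    strict_ok = True
except ParseException:
    strict_ok = False

# 2. Parse action that joins the individual tokens with a single space.
joined = make_entry().setParseAction(lambda tokens: " ".join(tokens))
joined_result = joined.parseString(entryString)[0]

# 3. originalTextFor returns the matched slice of the input unchanged.
original = originalTextFor(make_entry()).parseString(entryString)[0]

print(strict_ok)
print(joined_result)
print(original)
```

Note that the two working options are not identical: joining tokens normalizes the spacing (and puts a space before the ";"), while originalTextFor preserves the input text as written.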
From: Sebastian S. <ssc...@gm...> - 2009-06-11 03:01:30
|
Hello there!

This is my first post to this list - first of all thanks for this awesome module, I could not live without it these days!

My goal is to parse the following structure:

    s = """
    struct VS_OUTPUT {
    float4 pos : SV_Position;
    float2 UV : TEXCOORD0;
    float3 vVec : TEXCOORD1;
    float3 lVec : TEXCOORD2;
    float3 nor : TEXCOORD3;
    float4 col : TEXCOORD4;
    float3 tan : TEXCOORD5;
    float4 lpos : TEXCOORD6;
    float depth : TEXCOORD7;
    float2 dof : TEXCOORD8;
    float2 VelocityUV : TEXCOORD9;
    };"""

    entryString = "float3 tan : TEXCOORD5;"

    def readStruct(s):
        header = Literal("struct VS_OUTPUT")+"{"

        regs = Combine("TEXCOORD"+Word(nums)) | "SV_Position"
        vType = Combine("float"+Optional(Word(nums)))
        entry = vType+Word(alphanums)+":"+regs+";"
        struct = header + OneOrMore(entry) + "};"

        print entry.parseString(entryString)

This is working as is - but when I change

    entry = vType+Word(alphanums)+":"+regs+";"

to

    entry = Combine(vType+Word(alphanums)+":"+regs+";")

to make this into a single string, I get a ParseException - it's not a big problem but I just don't understand why it doesn't work in this case :)
Somehow I can't see the difference to where I use Combine in the lines above, where it seems just fine.

Maybe I'm missing something really simple.. thanks for looking!

seb

--
Sebastian Schoellhammer

Sr. Technical Artist
Square Enix LTD
www.square-enix.com
From: Paul M. <pt...@au...> - 2009-06-10 19:17:54
|
Asim -

I would work on this like they dig a tunnel - from both directions. First I would write out what my DSL syntax would be. Then I would work backwards and write out how this would be implemented in Python. The purpose of the pyparsing DSL converter is to find your DSL syntax, and then create the related Python code.

Let's say your application was something about RGB colors, and A and B are going to be colors which, when added together, result in a third color whose RGB values are the respective sums of the R, G, and B values of A and B. Let's say this is your color syntax:

    A = < 100 10 50 >
    B = < 0 0 0 >
    C = A+B

The purpose of the DSL converter is to convert the "<>" syntax into a Python-compatible API call (probably a constructor call to a Color class). The "C=A+B" line probably doesn't need DSL handling; your Python Color class can define the __add__() function to do the proper R, G, and B summing.

So all your DSL converter in this case would need to do would be to change:

    A = < 100 10 50 >

To:

    A = Color(100, 10, 50)

In pyparsing this is a very simple thing:

    LT,GT = map(Suppress,"<>")
    integer = Word(nums)
    colorDef = LT + integer*3 + GT
    colorDef.setParseAction(lambda tokens : "Color(%s, %s, %s)" % tuple(tokens))

Then use colorDef.transformString() to convert the source read from the imported file into valid Python, and then use that to compile into the executable module.

Remember, you are generating a Python script from your DSL script, so you don't have to worry about "carrying values forward from one line to the next" - when your generated code gets compiled to Python bytecode, it will do this for you.

Don't try to re-invent Python with your DSL. You are *much* better off using your DSL to *augment* the Python syntax.

Thanks for reading the article!

-- Paul

> -----Original Message-----
> From: Asim Malik [mailto:as...@ho...]
> Sent: Wednesday, June 10, 2009 1:58 PM
> To: pyp...@li...
> Subject: [Pyparsing] DSL using pyparsing
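The whole round trip described above (DSL text in, Python source out, then compile and run) might look roughly like this. It is only a sketch under stated assumptions: the `Color` class, its `rgb` attribute, and the `namespace` dictionary are hypothetical stand-ins for whatever API the application really exposes, and the DSL source is given as a string rather than read from an imported file.

```python
from pyparsing import Suppress, Word, nums

class Color:
    """Hypothetical application class; adds colors channel by channel."""
    def __init__(self, r, g, b):
        self.rgb = (r, g, b)

    def __add__(self, other):
        return Color(*(x + y for x, y in zip(self.rgb, other.rgb)))

# The converter from the message: "< r g b >" becomes "Color(r, g, b)".
LT, GT = map(Suppress, "<>")
integer = Word(nums)
colorDef = LT + integer * 3 + GT
colorDef.setParseAction(lambda tokens: "Color(%s, %s, %s)" % tuple(tokens))

dsl_source = """A = < 100 10 50 >
B = < 0 0 0 >
C = A+B
"""

# transformString rewrites only the matched "<...>" spans; the rest of the
# text (assignments, "C = A+B") passes through untouched.
python_source = colorDef.transformString(dsl_source)

# Compile and run the generated Python; plain Python semantics carry the
# values of A and B forward to the line that computes C.
namespace = {"Color": Color}
exec(compile(python_source, "<dsl>", "exec"), namespace)

print(python_source)
print(namespace["C"].rgb)
```

Executing generated source with exec is only appropriate when the DSL scripts come from a trusted author, since anything that is not rewritten is run as ordinary Python.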
From: Asim M. <as...@ho...> - 2009-06-10 18:57:48
|
OK, I am looking at building a DSL using pyparsing and am relatively new to Python. Initially I am looking at something like this:

    A = {build some list, this calls some underlying python api and creates say a class A}
    B = {build another list, this creates another python class B }
    C = A+B

Should I parse this as a single script or parse each statement? If I parse each statement, how can I pass variables that were defined in the previous statement? As the complexity of the domain increases, it might be the case that the construction of A depends on other statements. What I am looking for is the best approach.

Thanks
From: Donn I. <don...@gm...> - 2009-06-07 16:19:41
|
Paul McGuire wrote:
> Keep plugging!

I am not sure what's more magical: pyparsing or your regular ability to nail a problem first time. Your code is dead-on, but I will have to spend some time to savvy it. Thanks once again.

Regards,
\d

BTW - my little project that uses pyparsing (with your solutions, of course) has finally been released and is on https://savannah.nongnu.org/projects/things/
You helped me a few times in 2007/2008. Thanks for the code and the help.
From: Paul M. <pt...@au...> - 2009-06-07 15:35:25
|
Donn -

The thing you are lacking is the use of pyparsing's lookahead class, FollowedBy. In this case, you want to detect the difference between a single (which I have redefined as just "simple | quoted", letting the query expression take care of the repetition) and a field identifier. For that, you want a negative lookahead of ~FollowedBy(":").

    single = simple | quoted
    COLON = PP.Suppress(':')
    fieldValue = single + ~PP.FollowedBy(COLON)
    field = single + COLON + PP.Group(PP.OneOrMore(fieldValue))

    phrase = fieldValue
    phrases = PP.ZeroOrMore(phrase)
    query = PP.Optional(phrases) + PP.ZeroOrMore(field)

To add the dict-style access to the fields, add some results names, and use the Dict class to auto-define results names for each field name.

    query = PP.Optional(phrases)("phrases") + PP.Dict(PP.ZeroOrMore(PP.Group(field)))("fields")

Keep plugging!

-- Paul

> -----Original Message-----
> From: Donn Ingle [mailto:don...@gm...]
> Sent: Sunday, June 07, 2009 3:46 AM
> To: pyp...@li...
> Subject: [Pyparsing] Choppping up search terms
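A runnable variant of the lookahead grammar, checked against two of the test strings from the question, could look like the sketch below. It assumes a recent pyparsing with the camelCase names still present, and it departs from the message in two small ways that are my own choices: dblQuotedString is copied before a parse action is attached (so the module-level expression is not modified), and Dict is used without the extra "phrases"/"fields" results names to keep the access pattern simple.

```python
import pyparsing as PP

simple = PP.Word(PP.alphas)
quoted = PP.dblQuotedString.copy().setParseAction(PP.removeQuotes)
single = simple | quoted
COLON = PP.Suppress(":")

# A value is a word or quoted string that is NOT followed by a colon;
# a word that IS followed by a colon starts the next field instead.
fieldValue = single + ~PP.FollowedBy(COLON)
field = single + COLON + PP.Group(PP.OneOrMore(fieldValue))

phrases = PP.ZeroOrMore(fieldValue)
# Dict turns each Group(field) into a key (field name) -> values entry.
query = phrases + PP.Dict(PP.ZeroOrMore(PP.Group(field)))

first = query.parseString("WTF Huh aField: aValue", parseAll=True)
second = query.parseString(
    'aField : "aValue blah" bloop AnotherField: two', parseAll=True)

print(first.asList())
print(second.asList())
print(list(second["aField"]), list(second["AnotherField"]))
```

The free-standing phrases come back as plain tokens at the front of the result, and each field as a nested `[name, [values...]]` group that is also reachable by name.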
From: Donn I. <don...@gm...> - 2009-06-07 08:46:09
|
Hello pyparsers.

As much as pyparsing astounds my small mind, I have yet to write my own 'grammar' without having to ask for help! And here I am again. I hope someone will have mercy on me.

I am adding a search function to Fonty Python and this starts with chopping up a string into tokens. I have a simple set of rules:

    phrase1 phrase2 field: value1 value2 "value three" field2: value1 etc

Any word on its own, if it's NOT after a fieldname, is a phrase alone.
Anything after a fieldname is a sub-phrase of that field, unless it's another fieldname.

I am just not getting anything like what I am looking for.

\d

My hacking and surfing and re-hacking have ended here:

    import pyparsing as PP #testing with vers 1.4.8 and 1.5.0

    simple = PP.Word(PP.alphas)
    quoted = PP.dblQuotedString.setParseAction(PP.removeQuotes)
    single = PP.OneOrMore(simple | quoted)
    special = PP.Combine(PP.Word(PP.alphas) + ":") + single

    #term = special | single
    query = PP.OneOrMore( special | single )# + PP.StringEnd() <- buggy

    tests=[
    u"WTF Huh aField: aValue",
    # want: [u'WTF', u'Huh', {u'aField:':[u'aValue']}]
    u'aField : "aValue blah" bloop AnotherField: two',
    # want: [{u'aField:':[u'aValue blah',u'bloop']},{u'AnotherField:':u'two'}]
    u"aField: Someval AnotherField: two"
    # want : [{u'aField:':[u'SomeVal']},{u'AnotherField:':u'two'}]
    ]

    for test in tests:
        print test
        try:
            tokens=query.parseString(test)
            print tokens
        except:
            print "BUG"
From: Gustavo N. <me...@gu...> - 2009-05-15 14:37:45
|
Bonjour, Denis !

spir said:
> Had a look and find it really interesting (booleano).
> Reminds me of a project about customizing computer languages (PL, wiki,
> etc), including allowing various natural languages. This should be a kind
> of layer (possibly implemented in an editor) between the user and the
> standard computer language. The main issue was that key words (not
> necessarily reserved words) may well be free words for another user/natural
> language.
>
> E.g. in your example
> - Castilian: autor == "David TMX" y álbum.año >= 2008
> - English: author == "David TMX" and album.year >= 2008
> - French: auteur == «David TMX» et album.année >= 2008
> what happens if logical ('and' '==' '>=', probably 'not' 'or'), or maybe
> even key ('author' 'album' 'year'), tokens are used with another sense or
> context in another user's dialect? Do you need to protect all possible
> variants of words having a special meaning in your language? (Even if it
> was possible, then user-level choices are impossible).

Developers using Booleano should define the variable and function names beforehand; users cannot define variables or functions, just re-use those provided by the application. So, when the developer passes the variables and functions valid in the expressions, Booleano checks that their names aren't reserved words in the grammar (and variable, function and operator names are all case-insensitive).

Also, there won't be just one grammar to parse all the expressions. There will be one grammar per localization, so you could only use the French grammar to parse French expressions (not English or Spanish expressions). This way, name collisions are avoided.

> > > Buena suerte, y mucho gusto!
> >
> > ¡Lo mismo digo! ;-)
> >
> > Thank you! =)
>
> Bona sort, i molt plaer! (català)

OK, now I ran out of ideas 'cause I don't know how to say so in another language ;-)

Salut !

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: spir <den...@fr...> - 2009-05-15 12:49:41
|
On Thu, 14 May 2009 21:53:40 +0200, Gustavo Narea <me...@gu...> wrote:

> I'm working on a package called PyACL, which as the name implies,
> implements Access Control Lists in Python (and repoze.what 2 will use it a
> lot). But one of the things that I was missing was the way to allow system
> administrators to filter the access rules easily, so I started working on
> this generic Pyparsing-based library which I'll announce here as soon as
> it's usable: https://launchpad.net/booleano

Had a look and find it really interesting (booleano).
Reminds me of a project about customizing computer languages (PL, wiki, etc), including allowing various natural languages. This should be a kind of layer (possibly implemented in an editor) between the user and the standard computer language. The main issue was that key words (not necessarily reserved words) may well be free words for another user/natural language.

E.g. in your example
- Castilian: autor == "David TMX" y álbum.año >= 2008
- English: author == "David TMX" and album.year >= 2008
- French: auteur == «David TMX» et album.année >= 2008
what happens if logical ('and' '==' '>=', probably 'not' 'or'), or maybe even key ('author' 'album' 'year'), tokens are used with another sense or context in another user's dialect? Do you need to protect all possible variants of words having a special meaning in your language? (Even if it was possible, then user-level choices are impossible).

> > Buena suerte, y mucho gusto!
>
> ¡Lo mismo digo! ;-)
>
> Thank you! =)

Bona sort, i molt plaer! (català)

Denis
------
la vita e estrany
From: Gustavo N. <me...@gu...> - 2009-05-14 23:02:32
|
Yes, it worked that way!

Thank you very much once again, Denis and Paul! :)

- Gustavo.

Paul said:
> Yes, Denis has another approach. You can use parse actions as a way to add
> validation logic like Denis suggests. Here is a validating parse action
> that you could attach to variable to ensure that it contains at least one
> non-digit.
>
>     def mustHaveAtLeastOneNonDigit(tokens):
>         if all(c.isdigit() for c in tokens[0]):
>             raise ParseException("variable must have at least one non-digit")
>     variable.setParseAction(mustHaveAtLeastOneNonDigit)
>
> Parse actions can serve a number of uses. They can also be chained so that
> multiple actions or validations can be invoked:
>
>     ip_part = Word(nums)
>     convertToInt = lambda tokens: int(tokens[0])
>     def validateRange(tokens):
>         if not 0 <= tokens[0] < 256:
>             raise ParseException("value must be in range 0-255")
>     ip_part.setParseAction(convertToInt, validateRange)
>     # or:
>     # ip_part.setParseAction(convertToInt)
>     # ip_part.addParseAction(validateRange)
>     ip_addr = Combine(ip_part + ('.'+ip_part)*3 )
>
>     print ip_addr.parseString("192.168.0.255")
>     print ip_addr.parseString("123.456.789.000")

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Gustavo N. <me...@gu...> - 2009-05-14 22:30:50
|
Paul said:
> Ah, with that said, let me then suggest this as a starting point for
> implementing operand (assuming that variables can not start with a digit):
>
>     LBRACE = Suppress('{')
>     RBRACE = Suppress('}')
>     operand = Forward()
>     number = #...(as you have defined it in your original code)
>     string_ = quotedString.setParseAction(removeQuotes)
>     variable = #...(use your Unicode definition of choice)
>     set_ = Group(LBRACE + delimitedList(operand) + RBRACE)
>     operand << (number | string_ | variable | set_)
>
> delimitedList takes care of the repetition with intervening comma
> delimiters. Group packages the result in its own list, so that recursive set
> definitions will maintain their nesting properly. The set-enclosing braces
> are suppressed from the output - they are useful during parsing, but
> unnecessary once the tokens have been grouped.
>
> This is the canonical form for defining a recursive expression like your
> operand, mas o menos. Now you are free to include operand in other
> expressions, or even operatorPrecedence.

Thank you very much for that!

I was going to talk about that, because the way I implemented it was more complex and operand.validate() raised an exception. But this fixes the problem.

Thanks once again!

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Paul M. <pt...@au...> - 2009-05-14 20:18:11
|
> I said that an operand could be a variable or a number to simplify things
> given that the problem was between numbers and variables. But it's
> actually more complex than that: It could be a quoted string or a set (in
> the form "{element1, element2, ...}" where each element can be a number,
> variable, quoted string or even another set) too:
>
>     operand = number | string | variable | set
>

Ah, with that said, let me then suggest this as a starting point for implementing operand (assuming that variables can *not* start with a digit):

    LBRACE = Suppress('{')
    RBRACE = Suppress('}')
    operand = Forward()
    number = #...(as you have defined it in your original code)
    string_ = quotedString.setParseAction(removeQuotes)
    variable = #...(use your Unicode definition of choice)
    set_ = Group(LBRACE + delimitedList(operand) + RBRACE)
    operand << (number | string_ | variable | set_)

delimitedList takes care of the repetition with intervening comma delimiters. Group packages the result in its own list, so that recursive set definitions will maintain their nesting properly. The set-enclosing braces are suppressed from the output - they are useful during parsing, but unnecessary once the tokens have been grouped.

This is the canonical form for defining a recursive expression like your operand, mas o menos. Now you are free to include operand in other expressions, or even operatorPrecedence.

-- Paul
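To see the nesting that Group and Forward produce, the skeleton above can be filled in and run on a small nested set. This is a sketch, not the grammar from the original code: the `number` and `variable` definitions are simplified ASCII-only stand-ins that I chose for illustration, quotedString is copied before the parse action is attached, and `<<=` is used in place of `<<` (both assign the Forward's contents in current pyparsing).

```python
from pyparsing import (Forward, Group, Regex, Suppress, Word, alphanums,
                       alphas, delimitedList, quotedString, removeQuotes)

LBRACE = Suppress("{")
RBRACE = Suppress("}")

operand = Forward()  # placeholder so set_ can refer to operand recursively
number = Regex(r"\d+(\.\d+)?")                  # stand-in definition
string_ = quotedString.copy().setParseAction(removeQuotes)
variable = Word(alphas + "_", alphanums + "_")  # stand-in, no leading digit
set_ = Group(LBRACE + delimitedList(operand) + RBRACE)
operand <<= number | string_ | variable | set_

result = operand.parseString('{1, "two", three, {4.5, five}}', parseAll=True)
nested = result.asList()
print(nested)
```

Each set becomes its own sub-list, so the inner set stays nested inside the outer one, and the braces and commas never appear in the output.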
From: Gustavo N. <me...@gu...> - 2009-05-14 19:54:00
|
Paul said:
> > Hello, everybody.
> >
> > First of all, I wanted to thank you for this awesome package. I'm having
> > fun with it. :)
>
> Well, well, my friend, so we meet again! I'm pleased to see you have been
> bitten by the pyparsing bug. :)

Hello, Paul! Good to see you here :)

> > How can I fix this?
>
> In general, I think this is why variable names in most computing languages
> I know do *not* permit the name to begin with a number. But you are the
> language designer, so I will show you how to do this in pyparsing.
>
> Two suggestions, not sure if I have a preference:
> 1. use "operand = number ^ variable" instead of "operand = number |
> variable". '|' returns MatchFirst expressions, which return, well, the
> first matching expression. '^' returns Or expressions, which return
> *longest* match of all the alternative expressions. Think of the '^' as a
> little set of dividers, measuring the returned values of all the
> expressions, and picking the longest. '^' is not a cure-all, though, and
> can cause infinite run-time recursion in self-referencing grammars (those
> that include operatorPrecedence or Forward expressions).

I use both operatorPrecedence and Forward :/

> 2. As you say, invert operand to "operand = variable | number", and then
> attach a parse action to variable that first tries to evaluate the result
> as a number. In your current parser, you may eventually attach a parse
> action to number, something like this:
>
>     number.setParseAction(lambda tokens: float(tokens[0]))
>
> so that at post-parse time, the returned string has already been converted
> to a float. So instead, attaching something like this to variable
> (untested):
>
>     def numOrVar(tokens):
>         try:
>             return float(tokens[0])
>         except ValueError:
>             pass
>     variable.setParseAction(numOrVar)
>
> Now you don't even need the alternation, since as you observed, variable
> will also match "22", so just define "operand = variable".
>
> You could also try this for defining variable:
>
>     variable = Word(unicode(alphanums+'_'))
>
> or
>
>     variable = Word(unicode(alphanums+alphas8bit+'_'))
>
> or to absolutely cover all bases (for 2-byte Unicode, anyway):
>
>     allUnicodeAlphas = u''.join(c for c in map(unichr,range(65536)) if c.isalpha())
>     allUnicodeNums = u''.join(c for c in map(unichr,range(65536)) if c.isdigit())
>     variable = Word(allUnicodeAlphas + allUnicodeNums + u'_')
>
> (It's surprising how many Unicode digits there are besides '0'-'9'.)

I said that an operand could be a variable or a number to simplify things given that the problem was between numbers and variables. But it's actually more complex than that: It could be a quoted string or a set (in the form "{element1, element2, ...}" where each element can be a number, variable, quoted string or even another set) too:

    operand = number | string | variable | set

Therefore setting a parse action for the whole operand wouldn't be desirable; I'd rather set it in the types individually -- especially to be able to test them separately too.

Sorry for not pointing this out.

> BTW, this definition of decimals:
>
>     decimals = Optional(decimal_sep + OneOrMore(Word(nums)))
>
> includes some unnecessary repetition. It should be sufficient to write:
>
>     decimals = Optional(decimal_sep + Word(nums))
>
> Unless I misunderstood your intent here.

Thank you so much! I thought I had to set the quantifier explicitly.

> So, will we see some pyparsing sneak into a repoze package one of these
> days, perhaps some sort of authorization rights syntax, hmmm?

You guessed right! :)

I'm working on a package called PyACL, which as the name implies, implements Access Control Lists in Python (and repoze.what 2 will use it a lot). But one of the things that I was missing was the way to allow system administrators to filter the access rules easily, so I started working on this generic Pyparsing-based library which I'll announce here as soon as it's usable: https://launchpad.net/booleano

> Buena suerte, y mucho gusto!

¡Lo mismo digo! ;-)

Thank you! =)

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Gustavo N. <me...@gu...> - 2009-05-14 18:04:16
|
Bonjour, Denis.

spir said:
> The issue is that your variables can start like a number (the reason why in
> most PLs var names cannot start with a digit). So that:
> * using (number | variable) number masks variable
> * using the opposite number is eaten by variable
>
> You should use the common pattern for a variable, requiring letter or '_'
> at start: variable = Regex("[a-zA-Z_]\w*", re.UNICODE) (untested with
> unicode)

Yes, I was aware of that limitation with regular expressions, but I thought there was a way to work around that with Pyparsing.

> (also beware that \w includes digits and '_')

Oops, you're right. Thanks!

> so that number and variable are mutually exclusive.
>
> You could also add a lookahead for !(letter | '_') trailing after the
> definition of number, but then your definition of variable is unclear. It
> should be required that a variable has at least one non-digit char. Which
> is uneasy ;-)

Yep, that's not a good solution either.

Well, I'll have to make sure variables don't start with a number. :/

Merci beaucoup!

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Paul M. <pt...@au...> - 2009-05-14 17:57:16
|
> -----Original Message-----
> From: spir [mailto:den...@fr...]
> You could also add a lookahead for !(letter | '_') trailing after the
> definition of number, but then your definition of variable is unclear. It
> should be required that a variable has at least one non-digit char. Which
> is uneasy ;-)
>

Yes, Denis has another approach. You can use parse actions as a way to add validation logic like Denis suggests. Here is a validating parse action that you could attach to variable to ensure that it contains at least one non-digit.

    def mustHaveAtLeastOneNonDigit(tokens):
        if all(c.isdigit() for c in tokens[0]):
            raise ParseException("variable must have at least one non-digit")
    variable.setParseAction(mustHaveAtLeastOneNonDigit)

Parse actions can serve a number of uses. They can also be chained so that multiple actions or validations can be invoked:

    ip_part = Word(nums)
    convertToInt = lambda tokens: int(tokens[0])
    def validateRange(tokens):
        if not 0 <= tokens[0] < 256:
            raise ParseException("value must be in range 0-255")
    ip_part.setParseAction(convertToInt, validateRange)
    # or:
    # ip_part.setParseAction(convertToInt)
    # ip_part.addParseAction(validateRange)
    ip_addr = Combine(ip_part + ('.'+ip_part)*3 )

    print ip_addr.parseString("192.168.0.255")
    print ip_addr.parseString("123.456.789.000")

-- Paul
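The chained convert-then-validate pattern can be exercised as in the sketch below. One deliberate difference from the message, flagged here as my own choice: the validating action takes the full `(s, loc, tokens)` signature and passes the input string and location to ParseException along with the message, which is the constructor's usual `(pstr, loc, msg)` argument order. The names `convert_to_int`, `validate_range`, `good`, and `bad_accepted` are illustrative.

```python
from pyparsing import Combine, ParseException, Word, nums

def convert_to_int(tokens):
    # First action: replace the matched digit string with an int.
    return int(tokens[0])

def validate_range(s, loc, tokens):
    # Second action: runs on the already-converted int from the first one.
    if not 0 <= tokens[0] < 256:
        raise ParseException(s, loc, "value must be in range 0-255")

ip_part = Word(nums)
ip_part.setParseAction(convert_to_int, validate_range)
ip_addr = Combine(ip_part + ("." + ip_part) * 3)

good = ip_addr.parseString("192.168.0.255")[0]

try:
    ip_addr.parseString("123.456.789.000")
    bad_accepted = True
except ParseException:
    # 456 fails validate_range, so the whole address is rejected.
    bad_accepted = False

print(good, bad_accepted)
```

Raising ParseException from a parse action makes the expression fail at that point exactly as if the text had not matched, so range checks and similar semantic rules can live alongside the grammar.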
From: Paul M. <pt...@au...> - 2009-05-14 17:47:33
|
> -----Original Message-----
> From: Gustavo Narea [mailto:me...@gu...]
> Sent: Thursday, May 14, 2009 12:02 PM
> To: pyp...@li...
> Subject: [Pyparsing] How to distinguish a variable from a integer
>
> Hello, everybody.
>
> First of all, I wanted to thank you for this awesome package. I'm having
> fun with it. :)

Well, well, my friend, so we meet again! I'm pleased to see you have been bitten by the pyparsing bug. :)

> How can I fix this?

In general, I think this is why variable names in most computing languages I know do *not* permit the name to begin with a number. But you are the language designer, so I will show you how to do this in pyparsing.

Two suggestions, not sure if I have a preference:

1. Use "operand = number ^ variable" instead of "operand = number | variable". '|' returns MatchFirst expressions, which return, well, the first matching expression. '^' returns Or expressions, which return the *longest* match of all the alternative expressions. Think of the '^' as a little set of dividers, measuring the returned values of all the expressions, and picking the longest. '^' is not a cure-all, though, and can cause infinite run-time recursion in self-referencing grammars (those that include operatorPrecedence or Forward expressions).

2. As you say, invert operand to "operand = variable | number", and then attach a parse action to variable that first tries to evaluate the result as a number. In your current parser, you may eventually attach a parse action to number, something like this:

    number.setParseAction(lambda tokens: float(tokens[0]))

so that at post-parse time, the returned string has already been converted to a float. So instead, attach something like this to variable (untested):

    def numOrVar(tokens):
        try:
            return float(tokens[0])
        except ValueError:
            pass
    variable.setParseAction(numOrVar)

Now you don't even need the alternation, since as you observed, variable will also match "22", so just define "operand = variable".

You could also try this for defining variable:

    variable = Word(unicode(alphanums+'_'))

or

    variable = Word(unicode(alphanums+alphas8bit+'_'))

or to absolutely cover all bases (for 2-byte Unicode, anyway):

    allUnicodeAlphas = u''.join(c for c in map(unichr,range(65536)) if c.isalpha())
    allUnicodeNums = u''.join(c for c in map(unichr,range(65536)) if c.isdigit())
    variable = Word(allUnicodeAlphas + allUnicodeNums + u'_')

(It's surprising how many Unicode digits there are besides '0'-'9'.)

BTW, this definition of decimals:

    decimals = Optional(decimal_sep + OneOrMore(Word(nums)))

includes some unnecessary repetition. It should be sufficient to write:

    decimals = Optional(decimal_sep + Word(nums))

Unless I misunderstood your intent here.

So, will we see some pyparsing sneak into a repoze package one of these days, perhaps some sort of authorization rights syntax, hmmm?

Good luck, and nice to meet you!

-- Paul
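A minimal sketch of the two suggestions in this reply, assuming a pyparsing release that still accepts the camelCase names used in this thread (parseString, setParseAction). The names first_match, longest_match and num_or_var are made up for illustration, and the number grammar is the simplified one from the end of the message.

```python
from pyparsing import Combine, Optional, Regex, Word, nums

# Number: digits with an optional fractional part, converted to float.
number = Combine(Word(nums) + Optional("." + Word(nums)))
number.setParseAction(lambda tokens: float(tokens[0]))

# Variable: any run of word characters, so it may start with a digit.
variable = Regex(r"\w+")

# Suggestion 1: '|' builds a MatchFirst, '^' builds an Or (longest match).
first_match = number | variable
longest_match = number ^ variable

print(first_match.parseString("1st_variable")[0])    # number wins, matching only "1"
print(longest_match.parseString("1st_variable")[0])  # variable wins because it is longer
print(longest_match.parseString("22")[0])            # equal lengths: number is listed first


# Suggestion 2: one expression whose parse action tries float() first.
def num_or_var(tokens):
    try:
        return float(tokens[0])
    except ValueError:
        return None  # returning None leaves the matched token unchanged


# A fresh Regex, so the parse action is not attached to `variable` above.
operand = Regex(r"\w+").setParseAction(num_or_var)

print(operand.parseString("22")[0])
print(operand.parseString("1st_variable")[0])
```

One caveat with the second approach: float() also accepts strings such as "inf", "nan" and "1e5", so identifiers spelled that way would come back as numbers. A stricter check inside the parse action may be needed if that matters.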
From: spir <den...@fr...> - 2009-05-14 17:22:08
|
On Thu, 14 May 2009 19:01:42 +0200, Gustavo Narea <me...@gu...> wrote:

> Hello, everybody.
>
> First of all, I wanted to thank you for this awesome package. I'm having
> fun with it. :)
>
> I've read O'Reilly's shortcut on Pyparsing, but still I can't find an
> answer to this:
>
> One of the components of the grammar I'm defining is an operand. Operands
> can be a number or a variable. A variable is a string made up of word
> characters (in any language), numbers (in any language/culture) and/or a
> spacing character (underscores by default).
>
> I'm using the following:
> """
> import re
> from pyparsing import *
>
> # Defining the numbers:
> decimal_sep = Literal(".")
> decimals = Optional(decimal_sep + OneOrMore(Word(nums)))
> number = Combine(Word(nums) + decimals)
>
> # Defining the variables:
> variable = Regex("[\w\d_]+", re.UNICODE)
>
> # Finally, let's define the operand:
> operand = number | variable
> """
>
> The operand above works perfectly with the following expressions:
>     hello -> variable
>     23 -> number
>     hello_world -> variable
>
> But it doesn't support variables which begin with a number (e.g.,
> "1st_variable"). I get the following exception all the time:
> >>> from varnums import *
> >>> operand.parseString("1st_variable")
> (['1'], {})
> >>> operand.parseString("1st_variable", parseAll=True)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/gustavo/System/pyenvs/booleano/lib/python2.6/site-packages/pyparsing-1.5.2-py2.6.egg/pyparsing.py", line 1076, in parseString
>     raise exc
> pyparsing.ParseException: Expected end of text (at char 1), (line:1, col:2)
>
> I know I can invert the definition of the operand (i.e., "operand =
> variable | number"), but then strings like "22" will be matched as
> variables (not numbers).
>
> How can I fix this?
>
> Thanks in advance.

The issue is that your variables can start like a number (which is why variable names cannot start with a digit in most programming languages). As a result:
* with (number | variable), number masks variable
* with the opposite order, number is eaten by variable

You should use the common pattern for a variable, requiring a letter or '_' at the start:

    variable = Regex("[a-zA-Z_]\w*", re.UNICODE)

(untested with Unicode; also beware that \w includes digits and '_')

so that number and variable are mutually exclusive.

You could also add a negative lookahead for (letter | '_') right after the definition of number, but then your definition of variable becomes unclear: it would have to require that a variable contains at least one non-digit character. Which is not easy ;-)

Denis
------
la vita e estrany
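The "mutually exclusive" idea can be sketched as follows. This is an illustration rather than the poster's tested code: the helper classify is hypothetical, and the leading character class `[^\W\d]` (any word character that is not a decimal digit) is an assumption swapped in for `[a-zA-Z_]` so that non-ASCII letters may also start a name.

```python
from pyparsing import ParseException, Regex, StringEnd

# A number starts with a digit; a variable never does.
number = Regex(r"\d+(\.\d+)?")
variable = Regex(r"[^\W\d]\w*")


def classify(text):
    """Return 'number', 'variable', or None if neither matches the whole text."""
    for kind, expr in (("number", number), ("variable", variable)):
        try:
            (expr + StringEnd()).parseString(text)
            return kind
        except ParseException:
            pass
    return None


for sample in ("22", "3.5", "hello_world", "1st_variable"):
    print(sample, "->", classify(sample))
```

With these definitions the order of the alternatives no longer matters, but "1st_variable" is rejected outright, which is the trade-off discussed in this reply.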
From: Gustavo N. <me...@gu...> - 2009-05-14 17:02:03
|
Hello, everybody.

First of all, I wanted to thank you for this awesome package. I'm having fun with it. :)

I've read O'Reilly's shortcut on Pyparsing, but still I can't find an answer to this:

One of the components of the grammar I'm defining is an operand. Operands can be a number or a variable. A variable is a string made up of word characters (in any language), numbers (in any language/culture) and/or a spacing character (underscores by default).

I'm using the following:
"""
import re
from pyparsing import *

# Defining the numbers:
decimal_sep = Literal(".")
decimals = Optional(decimal_sep + OneOrMore(Word(nums)))
number = Combine(Word(nums) + decimals)

# Defining the variables:
variable = Regex("[\w\d_]+", re.UNICODE)

# Finally, let's define the operand:
operand = number | variable
"""

The operand above works perfectly with the following expressions:
    hello -> variable
    23 -> number
    hello_world -> variable

But it doesn't support variables which begin with a number (e.g., "1st_variable"). I get the following exception all the time:
>>> from varnums import *
>>> operand.parseString("1st_variable")
(['1'], {})
>>> operand.parseString("1st_variable", parseAll=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gustavo/System/pyenvs/booleano/lib/python2.6/site-packages/pyparsing-1.5.2-py2.6.egg/pyparsing.py", line 1076, in parseString
    raise exc
pyparsing.ParseException: Expected end of text (at char 1), (line:1, col:2)

I know I can invert the definition of the operand (i.e., "operand = variable | number"), but then strings like "22" will be matched as variables (not numbers).

How can I fix this?

Thanks in advance.

--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Paul M. <pt...@au...> - 2009-02-26 14:58:28
|
You should start by trying to tune up your definition of term, since this expression gets used a *lot* internally to the operatorPrecedence code. Here are some comments/questions/suggestions on cleaning up term:

1. term includes 2 references to numericRange, why?

2. variable() returns an expression that is a MatchFirst of 10 different characters. Just have this method return Regex("[0-9#]"), it will evaluate much faster.

3. numericRange tests for signedNumbers, and then for unsigned numbers. But your unsigned numbers would match the signedNumber expression, so the second alternative test will never match. Also, signedNumber the way you have defined it would match a single "-" character, probably not desired. Try this:

    signedNumber = Optional('-') + Word(nums)  # or even just Regex(r"-?\d+")
    numericRange = (
        (lbrack + Literal('#') +
         (signedNumber | '*').setResultsName('min') +
         Suppress(':') +
         (signedNumber | '*').setResultsName('max') +
         rbrack)
    )

(I also removed the Combine - you might want Group instead.)

4. I streamlined repetition a bit from:

    repetition = (
        ( plus + lbrace + Word(nums).setResultsName("count") + rbrace ) |
        ( plus + lbrace + Word(nums).setResultsName("minCount")+","+
          Word(nums).setResultsName("maxCount") + rbrace ) |
        plus
    )

to:

    repetition = plus + Optional( lbrace +
        ( ( Word(nums).setResultsName("minCount")+","+
            Word(nums).setResultsName("maxCount") ) |
          Word(nums).setResultsName("count") ) +
        rbrace )

Which could look a little nicer as:

    repetition = plus + Optional( lbrace +
        ( ( Word(nums)("minCount")+","+ Word(nums)("maxCount") ) |
          Word(nums)("count") ) +
        rbrace )

(runs no faster, but I find it a little easier to read). Since repetition is the first precedence level, it gets used a lot, so any streamlining here helps.

5. space = OneOrMore(White())
Really? I doubt you are matching any of these at the moment, since you aren't taking any steps to disable pyparsing's default behavior of skipping whitespace. But as the first alternative in the list of expressions in term, you are testing for it *many* times.

6. You might be able to reorder the options in term based on the likelihood of occurrence in the input text. Since this is a MatchFirst, testing for more common options ahead of rarer ones will shortcut the rest of the tests, with a performance win.

You might also define:

    integer = Word(nums)

And then use integer in all your related expressions, instead of repeating Word(nums) all the time - this will make your code a little easier to read, and the packratting will be a little more efficient, too.

I also have some comments on operatorPrecedence itself, but I'll wait until you have gotten term to run a bit better before delving into oP. Just one note, instead of (in your list of precedence definitions):

    (Empty(), 2, opAssoc.LEFT, self.handleSequence),

Try:

    (None, 2, opAssoc.LEFT, self.handleSequence),

-- Paul
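Points 3 and 4 above can be tried in isolation with a small sketch like the one below. The definitions of plus, lbrace and rbrace are assumptions (the original grammar is not shown in the thread), and the snake_case name signed_number is illustrative.

```python
from pyparsing import Literal, Optional, ParseException, Regex, Suppress, Word, nums

integer = Word(nums)

# Assumed punctuation; the poster's own definitions are not shown.
plus = Literal("+")
lbrace, rbrace = Suppress("{"), Suppress("}")

# The sign is optional but at least one digit is required, so "-" alone fails.
signed_number = Regex(r"-?\d+")

# "+" optionally followed by "{min,max}" or "{count}".
repetition = plus + Optional(
    lbrace
    + ((integer("minCount") + "," + integer("maxCount")) | integer("count"))
    + rbrace
)

ranged = repetition.parseString("+{2,5}")
print(ranged["minCount"], ranged["maxCount"])

exact = repetition.parseString("+{3}")
print(exact["count"])

try:
    signed_number.parseString("-")
except ParseException as exc:
    print("rejected:", exc)
```

Putting the two-number form first matters here: with a MatchFirst, the shorter "count" alternative would otherwise match the leading digits of "{2,5}" and then fail on the comma.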
From: spir <den...@fr...> - 2009-02-26 07:50:12
|
On Wed, 25 Feb 2009 23:14:44 -0700, Andrew Warkentin <and...@gm...> wrote:

> I am writing a proxy (http://xuproxy.sourceforge.net/) that includes
> header and content filters that use a regexp-like matching language
> (basically a hybrid of globs and regexps). The parser for this matching
> language is implemented using pyparsing (the actual matching code does
> not use pyparsing, only the parser for the patterns themselves).
> Currently, it uses the operatorPrecedence function (actually, a custom
> version of it, modified to deal with certain features of the matching
> language). There are two problems with this, though. The first is that
> it is slow (up to a second or more on complex patterns), even with
> packrat parsing enabled. The second is that very complex patterns cause
> the maximum recursion depth to be exceeded. Is there a way to parse a
> regexp-like language with pyparsing that is faster and involves less
> recursion than operatorPrecedence?

Maybe this is a stupid suggestion, but anyway ;-)

Have you considered simply dropping operator precedence from your language, in favor of sequential semantics plus (grouping)? I had a similar problem and this worked fine for me (actually I just did not wish to cope with operator precedence; I had no performance issue), because:
* Many expressions did not involve operators of different precedence levels.
* Many were logically commutative, meaning one could rewrite "a | b c" to "b c | a" when "b c" cannot be a prefix of "a".
* I often simply wrote intermediate (sub)expressions so that composite ones hold higher-level non-terminals in which there is no more precedence:
      p = a b | c d
  ==>
      p1 = a b
      p2 = c d
      p = p1 | p2
* In other cases writing () was not such a burden.

I may be wrong, but I think operator precedence is inherently time-consuming precisely because it involves recursion. As recursion cannot happen at the same location in the input string (otherwise you get an infinite loop, which is why left recursion is not possible), packrat memoizing will not help with this aspect of the issue. As I understand it, it can help only in a case such as:
    a b | a c
Then when b fails, a will not be evaluated twice. (Please tell me if I'm wrong.)

Denis
------
la vita e estrany
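The "a b | a c" case can be illustrated without pyparsing at all. The sketch below is a hand-written toy, not pyparsing's implementation: it memoizes the result of rule a per input position (the core idea of packrat parsing) and counts how often the rule body really runs. In pyparsing itself, memoization is switched on with ParserElement.enablePackrat().

```python
from functools import lru_cache

TEXT = "xxxxc"
calls = {"a": 0}


@lru_cache(maxsize=None)
def match_a(pos):
    """Rule a: one or more 'x'. Returns the end position, or None on failure."""
    calls["a"] += 1  # counts real evaluations, not cache hits
    end = pos
    while end < len(TEXT) and TEXT[end] == "x":
        end += 1
    return end if end > pos else None


def match_char(ch, pos):
    """Match a single character at pos; propagates an earlier failure (None)."""
    if pos is not None and pos < len(TEXT) and TEXT[pos] == ch:
        return pos + 1
    return None


def match_p(pos):
    """Rule p = a 'b' | a 'c', trying the alternatives in order."""
    for ch in ("b", "c"):
        end = match_char(ch, match_a(pos))
        if end is not None:
            return end
    return None


print(match_p(0))   # end position of the whole match
print(calls["a"])   # how many times rule a was actually evaluated
```

The first alternative fails on 'b', and the second alternative reuses the cached result for a at position 0 instead of rescanning the run of 'x' characters.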
From: Andrew W. <and...@gm...> - 2009-02-26 06:12:42
|
I am writing a proxy (http://xuproxy.sourceforge.net/) that includes header and content filters that use a regexp-like matching language (basically a hybrid of globs and regexps). The parser for this matching language is implemented using pyparsing (the actual matching code does not use pyparsing, only the parser for the patterns themselves). Currently, it uses the operatorPrecedence function (actually, a custom version of it, modified to deal with certain features of the matching language).

There are two problems with this, though. The first is that it is slow (up to a second or more on complex patterns), even with packrat parsing enabled. The second is that very complex patterns cause the maximum recursion depth to be exceeded. Is there a way to parse a regexp-like language with pyparsing that is faster and involves less recursion than operatorPrecedence?

The source for the parser module can be seen at http://freehg.org/u/andreww591/xuproxy/file/tip/lib/xuproxy/proxo_filter/matching_parser.py
From: Gre7g L. <haf...@ya...> - 2009-01-16 00:33:59
|
From: dikshie <di...@gm...>
To: pyp...@li...
Sent: Thursday, January 15, 2009 4:23:44 PM
Subject: [Pyparsing] ip address

> in_file = open(sys.argv[1], "r")
> integer = Word( nums )
> ipAddress = delimitedList( integer, ".", combine=True )

This will get any number of integers... even one. You want four specifically.

> ip = ipAddress.parseString(line)

line? What's line? You're getting an unhelpful error here because you did a "from pyparsing import *". That's a bad plan. Never do an "import *".

> print ip

Try this instead:

    in_file = open(sys.argv[1], "r").read()
    integer = Word( nums )
    ipAddress = Combine(integer + "." + integer + "." + integer + "." + integer)
    ip = ipAddress.searchString(in_file)
    print ip

Gre7g
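To make the difference concrete, here is a small sketch run against a made-up log sample (the sample text and the names loose, loose_hits and ip_hits are assumptions for illustration). It uses explicit imports rather than "import *"; pyparsing exports its own line() helper, which is likely why the star import hid the undefined name in the original script.

```python
from pyparsing import Combine, Word, ZeroOrMore, nums

integer = Word(nums)

# Roughly what delimitedList(integer, ".", combine=True) accepts:
# one integer followed by any number of ".integer" groups, including none.
loose = Combine(integer + ZeroOrMore("." + integer))

# Exactly four dot-separated integers.
ip_address = Combine(integer + "." + integer + "." + integer + "." + integer)

# Made-up sample standing in for the irregular log file.
log_text = """\
***********start************
192.168.0.1 connected
00000000000000
10.0.0.42 disconnected
"""

# searchString scans the whole text and returns every match it finds.
loose_hits = [tokens[0] for tokens in loose.searchString(log_text)]
ip_hits = [tokens[0] for tokens in ip_address.searchString(log_text)]

print(loose_hits)  # also picks up the run of zeros
print(ip_hits)     # only the dotted quads
```

Neither expression checks that each part is in the range 0-255; that would need a parse action or a stricter pattern.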
From: dikshie <di...@gm...> - 2009-01-15 23:23:48
|
hi,

how can I parse IP addresses from an "irregular" log file? For example, my log file might start with '***************' or '***********start************' or '00000000000000', followed by an IP address.

I tried to use:

    in_file = open(sys.argv[1], "r")
    integer = Word( nums )
    ipAddress = delimitedList( integer, ".", combine=True )
    ip = ipAddress.parseString(line)
    print ip

but it fails.

regards,
--
-dikshie-
From: Orestis M. <or...@or...> - 2008-12-14 15:27:35
|
Hello,

I'm trying to use pyparsing to do syntax highlighting for a toy editor I'm writing. I'm trying to duplicate Vim's syntax, which is simple enough not to be mind-boggling, but complicated enough to be interesting.

Here's one example:

    syn keyword pythonStatement def class nextgroup=pythonFunction skipwhite
    syn match pythonFunction "[a-zA-Z_][a-zA-Z0-9_]*" contained

This says that 'def' and 'class' are keywords belonging to the group 'pythonStatement'. When they are encountered, you should first try the 'pythonFunction' match (before trying everything else). The 'pythonFunction' isn't ever tried standalone because it's 'contained'. I ignore 'skipwhite' for now.

I've tried to duplicate this by:

    def pos(s, loc, tokens):
        actual_loc = s.index(tokens[0], loc)
        return (actual_loc, len(tokens[0]) + actual_loc, tokens)

    def contains(expr):
        expr.setParseAction(pos)
        def parse_contained_expr(s, loc, tok):
            substr = s[loc+len(tok[0]):]
            print expr.parseString(substr)
            # change the tokens list here.
        return parse_contained_expr

    pythonFunction = Regex("[a-zA-Z_][a-zA-Z0-9_]*") + WordEnd(alphanums + '_').suppress()
    def_class = set('def class'.split())
    def_class = Or(map(Keyword, def_class)).setParseAction(contains(pythonFunction), pos)

    test = """\
    def something():
        pass
    """
    print def_class.parseString(test)

produces:

    [(1, 10, (['something'], {}))]
    [(0, 3, (['def'], {}))]

I have the following problems with this approach:

1) I have to make 'match' elements single-line only. I want to handle lines by myself. I tried doing 'SkipTo(LineEnd(), include=False)' but it still goes to the next line, and it also gives me the '():'

2) I want to keep track of the locations of the tokens. Effectively, at the end of the parsing I would like to have a marked-up version of the input string, with the types of the tokens. My pos function is an attempt at that, but it doesn't have any global state, so the second location (of the contained expression) is based on the chopped input string.

3) It seems very complicated and fiddly, I'm sure I'm doing something wrong. I wonder if I need another level of abstraction for this (like a scanner object that keeps track of the locations, keeping the parse actions simple) or another approach. Adding another level of containment seems a nightmare.

Many thanks for your help!

Orestis
--
or...@or...
http://orestis.gr/
From: Paul M. <pt...@au...> - 2008-12-03 11:59:47
|
No, there is no limit that would cause pyparsing to stop at line 2 character 15. Are you using searchString or scanString to read through the source text? Maybe if you posted a bit more code, it would help to understand what your program is doing.

-- Paul

-----Original Message-----
From: Boštjan Jerko [mailto:ml...@ja...]
Sent: Wednesday, December 03, 2008 1:14 AM
To: pyp...@li...
Subject: [Pyparsing] parsing page

Hello!

I'd like to parse a page by checking whether it contains a search word, after which comes a known syntax. Let me explain:

<a lot of text with unknown length with possible line breaks>
<searched word>
<known syntax to parse>
.....

If I try to use Word(alphas), it stops at line 2 character 15 (is there a limit?). Should I just use Python's index method?

Hope I was clear with the question/explanation.

Boštjan
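For the situation described in the quoted question, scanString is one way to skip the leading text of unknown length. The page layout, the marker word "Temperature" and the "value unit" syntax below are all invented for illustration, since the original page was not posted. Note that Word(alphas) on its own matches a single run of letters, so it stops at the first character that is not a letter.

```python
from pyparsing import Keyword, Suppress, Word, alphas, nums

# Hypothetical known syntax: the marker word, a colon, a number and a unit.
marker = Keyword("Temperature")
reading = marker + Suppress(":") + Word(nums)("value") + Word(alphas)("unit")

# Made-up page: free text of unknown length, then the part we care about.
page = """\
Lots of unrelated text, spread
over several lines, of unknown length.
Temperature: 21 C
More trailing text.
"""

# scanString yields (tokens, start, end) for each match found in the text,
# where start and end are offsets into the original string.
matches = [
    (tokens["value"], tokens["unit"], start, end)
    for tokens, start, end in reading.scanString(page)
]

for value, unit, start, end in matches:
    print(value, unit, repr(page[start:end]))
```

searchString would return just the tokens; scanString is the better fit when the match offsets are needed as well.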