pyparsing-users Mailing List for Python parsing module (Page 27)
Brought to you by:
ptmcg
From: Jean-Paul C. <ex...@di...> - 2007-05-25 13:41:58

On Fri, 25 May 2007 01:20:14 -0500, Paul McGuire <pa...@al...> wrote:
> [snip]
>
> Please don't focus on the fact that [] is implemented using __getitem__, or
> that [] implies that we are "getting" anything - does this notation seem
> like a reasonable shortcut, in place of the following?
>
> stats = "AVE =" + realNum.setResultsName("average") + "STD. DEV. =" + realNum.setResultsName("stdDevn") + \
>     "MIN =" + realNum.setResultsName("min") + "MAX =" + realNum.setResultsName("max")
>
> Or, if you really want to think of this as a "getting" kind of operation,
> you could interpret this notation as indicating that realNum["average"] is
> getting for us a special form of realNum that names its returned tokens
> "average". For that matter, it also mirrors the dict-style format for
> retrieval of the data:
>
> results = stats.parseString( inputData )
> print results["average"]

The advantage of setResultsName over __getitem__ is what it suggests to a reader solely through its name. Throwing that away makes result naming a more expensive feature, since it becomes exceedingly likely to confuse a reader who isn't already familiar with the API. Perhaps worse, it's much more difficult to search for the API documentation for __getitem__ than it is to search for the API documentation for setResultsName.

The suggested `_` method is just as bad, as far as obviousness goes, but at least it wins out in that searching for its documentation is easier.

Ultimately, setResultsName doesn't bother me at all. Typing out identifiers isn't anywhere near the biggest expenditure of my time. I'd much rather have a clearly named method than an overloaded operator.

Jean-Paul
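To make the trade-off concrete, here is a minimal runnable sketch of the setResultsName style being defended, modeled on Paul's statistics example; the input line is invented for illustration:

```python
from pyparsing import Combine, Optional, Word, nums

realNum = Combine(Optional("-") + Word(nums) + "." + Word(nums))
stats = ("AVE =" + realNum.setResultsName("average")
         + "MIN =" + realNum.setResultsName("min")
         + "MAX =" + realNum.setResultsName("max"))

# The explicit method name documents itself at the definition site,
# and retrieval mirrors dict-style access on the parsed results.
results = stats.parseString("AVE = 4.5 MIN = 1.0 MAX = 9.2")
print(results["average"])  # -> 4.5
```

For what it's worth, later pyparsing releases settled on the callable shortcut `expr("name")` rather than brackets, keeping setResultsName available for those who prefer the explicit form.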
From: Ralph C. <ra...@in...> - 2007-05-25 11:12:13

Hi Paul,

> I don't want this to be part of the constructor.

OK, I see your point.

> Please don't focus on the fact that [] is implemented using
> __getitem__, or that [] implies that we are "getting" anything

I was just trying to point out that overloading should try and stick to similar meanings. People used to reading Python have a brain that's hard-wired to think [] is indexing for a read, just like they think + is for adding, focused on it or not.

What did you think of the realNum._('min') idea? Only two more characters, and what it does looks like how it reads, i.e. it's calling an attribute function. I can read the bracket notation, I just think it would muddy the Pythonness of PyParsing.

Cheers, Ralph.
From: Paul M. <pa...@al...> - 2007-05-25 06:20:20

I don't want this to be part of the constructor. The point of setResultsName is to take a generic pattern (something like integer), and use it in several different places in the grammar, so that the tokens returned from each place have a different name. So a better example than my personal info example might be a parser for a statistical summary report:

    realNum = Combine(Optional("-") + Word(nums) + "." + Word(nums))
    stats = "AVE =" + realNum["average"] + "STD. DEV. =" + realNum["stdDevn"] + \
            "MIN =" + realNum["min"] + "MAX =" + realNum["max"]

Please don't focus on the fact that [] is implemented using __getitem__, or that [] implies that we are "getting" anything - does this notation seem like a reasonable shortcut, in place of the following?

    stats = "AVE =" + realNum.setResultsName("average") + "STD. DEV. =" + realNum.setResultsName("stdDevn") + \
            "MIN =" + realNum.setResultsName("min") + "MAX =" + realNum.setResultsName("max")

Or, if you really want to think of this as a "getting" kind of operation, you could interpret this notation as indicating that realNum["average"] is getting for us a special form of realNum that names its returned tokens "average". For that matter, it also mirrors the dict-style format for retrieval of the data:

    results = stats.parseString( inputData )
    print results["average"]

-- Paul

_______________________________________________
Pyparsing-users mailing list
Pyp...@li...
https://lists.sourceforge.net/lists/listinfo/pyparsing-users
From: Corrin L. <Cor...@da...> - 2007-05-24 23:44:52

A tiny modification of Ralph's suggestion makes the most sense to me:

    userdata = Word(alphas, res = "name") + Word(nums+"-", res = "socsecno")
From: Ralph C. <ra...@in...> - 2007-05-24 23:38:55

Hi Paul,

> So how about adding a shortcut for setResultsName, using getitem? With this
> short cut, this code:
>
> userdata = Word(alphas).setResultsName("name") +
>     Word(nums+"-").setResultsName("socsecno")
>
> could be written as:
>
> userdata = Word(alphas)["name"] + Word(nums+"-")["socsecno"]

It just seems odd to use getitem to "set" something, i.e. the results name. What about having "_" as an attribute function instead, so it isn't too obtrusive?

    userdata = Word(alphas)._("name") + Word(nums+"-")._("socsecno")

Or

    userdata = Word(alphas, n = "name") + Word(nums+"-", n = "socsecno")

Cheers, Ralph.
From: Paul M. <pa...@al...> - 2007-05-24 15:59:11

After the posts on c.l.py from Paul Boddie and Steven Bethard, it got me to thinking about how to supply a notational shortcut to make setResultsName'ing easier. I've added this post to the pyparsing wiki Home page Discussion (http://pyparsing.wikispaces.com/message/view/home/606302):

------------
I got to thinking after some postings on comp.lang.python about adding a notational short cut for setResultsName. I really want to encourage people to use named elements in their grammar, but that method name is just long and ugly, and grammar-cluttering.

So how about adding a shortcut for setResultsName, using getitem? With this shortcut, this code:

    userdata = Word(alphas).setResultsName("name") +
        Word(nums+"-").setResultsName("socsecno")

could be written as:

    userdata = Word(alphas)["name"] + Word(nums+"-")["socsecno"]

Any comments? Alternatives?
------------

I'm inclined to make this change, but I'd like a little feedback from those who are using pyparsing. I've not gotten any replies yet, so I'm trying the other pyparsing communication channels (short of intruding on c.l.py). If you have any comments, please reply here, or add them to the Wiki discussion.

Thanks!
-- Paul
From: Corrin L. <Cor...@da...> - 2007-03-22 19:58:01

Slightly non-scientific since I didn't adjust for varying loads on the machine or disk caches, but running with and without sort both gave 85.6 processing time. Looks like the key was choosing to use the re module, and sorting didn't help. Interesting :)

Corrin

PS: matching the building name using ~(UNIT_TYPE) didn't work, since that didn't give me the option to go .setResultsName. What I've got at the moment is:

    BUILDING = OneOrMore(Word(nameletters)).setParseAction(rejectBuildingName).setResultsName("BuildingName")

And the parse action is:

    def rejectBuildingName(string, loc, tokens):
        """ Prevent building name of LWR GROUND and similar """
        building_name = ""
        for token in tokens:
            if token == self.SEPCHAR:
                if debug:
                    print "Rejected the building name " + building_name
                raise ParseException(string, loc, "found a field separator in the building")
            if building_name != "":
                building_name += " "
            building_name += token
        if self.debug:
            print "Trying to reject the building name %s" % (building_name)
        if spare_parser != None:
            r = None
            try:
                r = spare_parser.NOT_A_BUILDING.parseString(building_name)
            except ParseException, pe:
                pass
            if r == None:
                if debug:
                    print "Looks like this building is not a floor or a unit"
            else:
                if debug:
                    print "Rejected as this building looks like a floor or a unit"
                raise ParseException(string, loc, "Rejected %s as a building name - looks like a floor or a unit" % string)
        if debug:
            print "Looks like this building is okay"

    NOT_A_BUILDING = (UNIT_TYPE | FLOOR | BOX_LINE | BAG_LINE)

It feels like a very roundabout way of doing it to me, though it seems to work well enough.

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Thursday, March 22, 2007 12:11 AM
To: Corrin Lakeland
Cc: pyp...@li...
Subject: Re: [Pyparsing] Efficency of Keyword (and a couple other bits)

Hi Corrin,

I'm glad you've got the speed-up you were after. Out of interest, how does the Regexp with the alternatives sorted by frequency compare with the Regexp with the alternatives sorted by reverse frequency? This would show if the re module is optimising without you needing to sort.

Cheers, Ralph.
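Ralph's question can be checked directly with the stdlib timeit module. The word list and sample text below are invented stand-ins for the real street-suffix data, so the timings only illustrate the method, not Corrin's 85.6 result:

```python
import re
import timeit

# Hypothetical stand-ins for the street-suffix alternatives, most
# frequent first; reversing the list simulates the worst-case order.
words = ["RD", "ST", "AVE", "CRES", "PL", "TCE", "DR", "GROVE"]
text = "123 EXAMPLE RD " * 1000

by_freq = re.compile(r"\b(?:%s)\b" % "|".join(words))
reverse = re.compile(r"\b(?:%s)\b" % "|".join(reversed(words)))

for name, pat in [("frequency order", by_freq), ("reverse order", reverse)]:
    t = timeit.timeit(lambda: pat.findall(text), number=100)
    print("%s: %.4f s" % (name, t))

# Both patterns must find the same matches; only the speed may differ.
assert by_freq.findall(text) == reverse.findall(text)
```

If the two timings come out essentially equal, the re module is not sensitive to alternative order for this data, which would match Corrin's observation that sorting didn't help.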
From: Corrin L. <Cor...@da...> - 2007-03-20 22:53:21

Just a brief *wow*... I tried with the old method but with the keywords sorted by frequency, so RD came first, then ST, etc. It made about a 10% improvement in speed. Then I tried using the Regex method and it made a 200% improvement in speed! Pretty amazing to see :)

Corrin
From: Corrin L. <Cor...@da...> - 2007-03-20 20:26:08

Here you go: www.nzpost.co.nz/NZPost/Images/addressing.nzpost/pdfs/AddressStandards.pdf

It is very long and tiresome, though you can probably get away with just Chapter 4 and Appendix A. Since I'm only doing validation I don't have to worry about any imperfect input, which helps simplify things a lot.

1) The unit line is a mess:

    UNIT = Optional((UNIT_TYPE + UNIT_IDENTIFIER)) + Optional(FLOOR) + Optional(BUILDING_NAME)

That's made trickier since if none of the elements is present then the whole line is skipped, and BUILDING_NAME is defined pretty much as .* Also, building name may go on a separate line, as in

    ADDRESS = REST | (UNIT + SEPERATOR + REST) |
        ((Optional(UNIT_TYPE + UNIT_IDENTIFIER) + Optional(FLOOR)) + SEPERATOR + BUILDING_NAME + SEPERATOR + REST)

2) The street name really complicates processing a street line:

It starts with an optional UNIT_IDENTIFIER followed by a slash (2/22 Foo St means FLAT 2, 22 Foo St).

A few streets don't have a street suffix and annoyingly often have a street suffix in the name (The Terrace is the most well known).

A few streets have a street direction at the end of the street name (e.g. North, Upper, Extension). Fortunately, street suffix and street direction are disjoint.

So, if I was using a hypothetical perfect parser generator, I could write it like (skipping setResultsName):

    UNIT_IDENTIFIER = Word(alphanums)
    STREET_NUMBER = Word(nums)
    STREET_ALPHA = alphas
    STREET_NAME = OneOrMore(Word(alphas))
    LONG_SUFFIX = "STREET" | "ROAD" | "DRIVE" | ...
    SHORT_SUFFIX = "ST" | "RD" | "DR" | ...
    STREET_SUFFIX = LONG_SUFFIX | SHORT_SUFFIX
    STREET_DIRECTION = "NORTH" | "N" | "EAST" | "E" | "EXTENSION" | "EXT" | "WEST" | "W"
    STREET_LEFTPART = Optional(UNIT_IDENTIFIER + "/") + STREET_NUMBER + Optional(STREET_ALPHA)
    STREET_NORMAL = STREET_LEFTPART + STREET_NAME + Optional(STREET_SUFFIX)
    HIGHWAY_NO = Word(alphanums)
    STREET_SH = STREET_LEFTPART + ("SH" | "STATE HIGHWAY") + HIGHWAY_NO + Optional("SH" | "STATE HIGHWAY")
    STREET = STREET_NORMAL | STREET_SH

Apart from the crazy cases of "THE TERRACE", which I handle by a whole separate rule, the interesting part here is that ambiguity is best resolved right to left. Looking leftmost, an address could start with a number, but it means either a street number or a unit number - we don't know until we look for the slash. For the street name we don't know for sure the street name has ended until we see the end of field. At that point we can consider if the thing before the end of field is a street direction, or a street suffix, or part of the street name.

The right-to-left thing also applies at a global scale. Seeing "SUITE" at the start of an address could be the beginning of the unit or of a building for a rural address, or of the unit or a building for an urban address! It isn't until we get to the suburb that we know if we're processing an urban or a rural address (as rural addresses have a suburb of RD <number>). However, looking rightmost we can only see a postcode or a country. I considered reversing the entire input and parsing the whole lot backwards, but that felt inelegant.

However, your prefix tree suggestion has given me an idea: I'm going to calculate the frequency of each different street suffix and direction, and add that information to the list of each suffix, using 'order by' in the select statement.

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Wednesday, March 21, 2007 2:23 AM
To: Corrin Lakeland
Cc: pyp...@li...
Subject: Re: [Pyparsing] Efficency of Keyword (and a couple other bits)

Have you a link to the NZ address format?

Cheers, Ralph.
From: Eike W. <eik...@gm...> - 2007-03-20 16:39:57

Hello Paul!

On Tuesday 20 March 2007 05:14, Paul McGuire wrote:
> For your second question, how to get street names to not read past
> the end of the street name and consume the street suffix too?
> Again, this is really a common issue in pyparsing grammars - there
> is a canned solution, although this may cost us some parse-time
> performance.

You should turn that part into a FAQ: 'My Parser is Too Greedy'. The first part of your very nicely worked out email could maybe get into the 'Tips' section of pyparsing's web presence: 'How to Match 5000 Keywords'.

Slightly OT: The very helpful 'operatorGrammar' is not mentioned in the 'usage notes' (HowToUsePyparsing) page. I think you should copy the informative passage from the 'news' page to the 'usage notes'.

Kind regards, Eike.
From: Ralph C. <ra...@in...> - 2007-03-20 13:23:08

Hi Corrin,

Paul wrote:
> We need the Regex to treat these as keywords, so we will surround the
> alternatives with the re "word break" indicator "\b". We don't want
> this to be used just for the first and last alternatives, so we'll
> enclose the alternatives in non-grouping parens, (?:...). This gives
> us a re string of:
>
> r"\b(?:%s)\b" % "|".join( x[0] for x in cursor.fetchall() )

I don't know if the re module optimises r'\b(?:apple|banana|cherry)\b' where each of the alternatives is a literal. If not, then there's gains to be made by concocting a better regexp, by noting that the matching tries each alternative in turn, failing on the first character that doesn't match the current alternative. If you've 500 alternatives this is on average going to test 250 alternatives, assuming they occur with equal frequency.

However, say the alternatives are

    balloon beech apply chair apple about boat beach big

We sort these to see their pattern better.

    about apple apply balloon beach beech big boat chair

We can make the re engine see if the first character is an r'a' and, if not, it doesn't need to attempt any of the three words starting with r'a'. Similarly with r'b' and r'c'.

    pat = re.compile(r'''\b(?:
        a(?:bout|pple|pply)|
        b(?:alloon|each|eech|ig|oat)|
        c(?:hair)
        )\b''', re.VERBOSE)

For simplicity in constructing the regexp, you may want to use a positive lookahead instead, to avoid chopping the first character from every word.

    pat = re.compile(r'''\b(?:
        (?=a)(?:about|apple|apply)|
        (?=b)(?:balloon|beach|beech|big|boat)|
        (?=c)(?:chair)
        )\b''', re.VERBOSE)

If all the words occur with equal frequency, then odds are that the word will start with r'b', so we should place that branch first. Then the r'a' branch. Then r'c'.

    pat = re.compile(r'''\b(?:
        b(?:alloon|each|eech|ig|oat)|
        a(?:bout|pple|pply)|
        c(?:hair)
        )\b''', re.VERBOSE)

It could be that you've real data to sample and know that r'chair' occurs 70% of the time, so you'd place it first, despite it being a one-word branch.

If you've 50,000 words instead of 500, then you can take this approach for the first two characters, e.g.

    pat = re.compile(r'''\b(?:
        a(?:b(?:out)|p(?:ple|ply))|
        b(?:a(?:lloon)|e(?:ach|ech)|i(?:g)|o(?:at))|
        c(?:h(?:air))
        )\b''', re.VERBOSE)

What we're doing is giving the re engine a prefix tree. http://en.wikipedia.org/wiki/Prefix_tree

Be careful to cope with one- and two-letter words properly, and don't optimise prematurely, but I'm assuming you're going to run this on hoards of data. Even so, it's worth using the timeit module to check your assumptions on real data; who knows, perhaps the re module does optimise this case these days!

Have you a link to the NZ address format?

Cheers, Ralph.
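Ralph's hand-written patterns can be generated mechanically. The helper below is a hypothetical sketch, not from the thread: it builds a one-level prefix tree from any word list (grouping by first letter, largest branch first, as Ralph suggests) and compiles it with a \b guard on each side:

```python
import re
from collections import defaultdict

def prefix_tree_regex(words):
    """Group alternatives by first letter, largest group first,
    so the re engine can reject a whole branch on one character."""
    groups = defaultdict(list)
    for w in sorted(set(words)):
        groups[w[0]].append(w)
    # Order branches by how many words they cover (a crude frequency proxy).
    branches = sorted(groups.values(), key=len, reverse=True)
    alts = "|".join(
        "%s(?:%s)" % (re.escape(b[0][0]), "|".join(re.escape(w[1:]) for w in b))
        for b in branches
    )
    return re.compile(r"\b(?:%s)\b" % alts)

pat = prefix_tree_regex(["balloon", "beech", "apply", "chair",
                         "apple", "about", "boat", "beach", "big"])
print(pat.findall("a big beech chair about to float on a boat"))
```

As Ralph warns, this sketch ignores one-letter words (an empty suffix group still compiles, but deserves a test) and should be benchmarked with timeit against the flat alternation before being adopted.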
From: Paul M. <pa...@al...> - 2007-03-20 04:14:20

Corrin -

Address parsing is a tricky topic, and many mailing list companies spend a lot of money developing proprietary solutions. It is helpful that New Zealand has specified a standard format; let's see if we can get pyparsing to suss it out.

For your first question, here is a slightly cleaned-up version of your street suffix generator (using the results from the db select to give us the various possible street suffixes):

    cursor.execute(r'select distinct * from (select short_suffix from suffix_to_long UNION select long_suffix from suffix_to_long) as f')
    STREET_SUFFIX = MatchFirst( [ Keyword(x[0]) for x in cursor.fetchall() ] ).setResultsName("Street Suffix")

What's happening here is that, instead of using the '|' operators, we are directly constructing a MatchFirst expression. Realize that expr1 | expr2 is just a short-cut for MatchFirst( [ expr1, expr2 ] ), so all we need to do is build a list of all the Keyword expressions, and make a MatchFirst out of them. This cleans up the eval and "|".join ugliness, but I don't think this will help your speed issue very much.

Instead, here is an approach that mimics some of the internals of oneOf, by generating a Regex for us. It's actually similar to your eval approach, but will generate a Regex string instead. In this case, we want all of your alternatives in a Regex, as A|B|C|D|..., so this will look fairly familiar to you:

    "|".join( x[0] for x in cursor.fetchall() )

We need the Regex to treat these as keywords, so we will surround the alternatives with the re "word break" indicator "\b". We don't want this to be used just for the first and last alternatives, so we'll enclose the alternatives in non-grouping parens, (?:...). This gives us a re string of:

    r"\b(?:%s)\b" % "|".join( x[0] for x in cursor.fetchall() )

Now pass this as the initializer argument to create a pyparsing Regex expression, and you should get the benefits of oneOf speed and Keyword matching. That is:

    STREET_SUFFIX = Regex( r"\b(?:%s)\b" % "|".join( x[0] for x in cursor.fetchall() ) )

For your second question, how to get street names to not read past the end of the street name and consume the street suffix too? Again, this is really a common issue in pyparsing grammars - there is a canned solution, although this may cost us some parse-time performance.

The problem is that pyparsing does not do overall pattern matching and backtracking the way a regular expression does - instead it marches through the input string left-to-right, successively matching sequential expressions, testing alternatives and repetition, throwing exceptions when mismatches occur, etc. In the following example address:

    1234 FLOWER COVERED BRIDGE LANE

you want an expression for the street name that takes "FLOWER COVERED BRIDGE", and leaves "LANE" to be the street suffix. The logic in doing this left-to-right is "take each alphabetic word, as long as it is not a valid suffix, and accumulate it into the street name". In pyparsing, this will look like:

    STREET_NAME = OneOrMore(~STREET_SUFFIX + Word(alphas)).setResultsName("Street Name")

OneOrMore takes care of the repetition, but we want it to stop when it reaches a STREET_SUFFIX. I'm not really sure how to make this any more efficient.

One other note: this construct will return the example as a list: [ 'FLOWER', 'COVERED', 'BRIDGE' ]. You can merge these for yourself by adding a parse action:

    STREET_NAME.setParseAction( lambda toks : " ".join(toks) )

or use a Combine wrapper:

    STREET_NAME = Combine( OneOrMore(~STREET_SUFFIX + Word(alphas)), joinString=' ', adjacent=False ).setResultsName("Street Name")

whichever suits your eye better - they are essentially equivalent. (I'd probably take the parse action...)

Another note: this will break down with any pathologically named streets, such as LANE LANE or STREET STREET. This sounds ridiculous, but here is a true story: my freshman year in college, I lived in a dormitory donated by an alumnus named Hall - yep, it was named "Hall Hall".

Yet another note: it appears that the NZ Post requires addresses to be all uppercase; you might change usage of alphas to your own variable uppers = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This will speed up slightly some of the internal regex's.

Lastly, your question regarding building names. I'm not exactly clear from your description how this needs to work, but since you are only testing for success/failure, and you want to accept things that are NOT matches of unit or floor, it seems that you might have some luck with something like:

    BUILDING_NAME = ~( VALID_UNIT | VALID_FLOOR )

Some time in the past, I worked on a similar address parser; I think it was in response to a c.l.py posting. I'll add it to the examples page on the pyparsing wiki so you can compare it with your own efforts. There are some odd cases, such as street numbers with 1/2 in them, that might be interesting for you to incorporate into your project.

HTH,
-- Paul
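The stop-at-suffix construct from Paul's message can be exercised end to end. This is a minimal runnable sketch; the three-entry suffix list and the surrounding STREET rule are invented stand-ins for the database-driven version:

```python
from pyparsing import Keyword, MatchFirst, OneOrMore, Word, alphas, nums

# A tiny stand-in for the suffix list that really comes from the database.
STREET_SUFFIX = MatchFirst([Keyword(s) for s in ["STREET", "ROAD", "LANE"]])

# ~STREET_SUFFIX is a negative lookahead: accumulate words only while the
# next word is not a valid suffix, so OneOrMore stops before "LANE".
STREET_NAME = OneOrMore(~STREET_SUFFIX + Word(alphas))
STREET_NAME.setParseAction(lambda toks: " ".join(toks))

STREET = Word(nums) + STREET_NAME + STREET_SUFFIX
print(STREET.parseString("1234 FLOWER COVERED BRIDGE LANE").asList())
# -> ['1234', 'FLOWER COVERED BRIDGE', 'LANE']
```

The parse action returns a single joined string, so the three name words collapse into one token exactly as Paul describes for the Combine alternative.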
From: Corrin L. <Cor...@da...> - 2007-03-19 22:55:18

Hi Eike,

Yes, I'm using Keyword("STREET")|Keyword("ROAD"). The SQL pulls a list of street types from the database, the Keyword(" + x + ") turns the list into keywords, and the eval creates a pyparsing rule for it. Not the easiest code to read by a long way :(

The problem is that five hundred strings separated by | is much less efficient than Word(alphas).
From: Eike W. <eik...@gm...> - 2007-03-19 22:44:53

Hello Corrin!

On Monday 19 March 2007 21:06, Corrin Lakeland wrote:
> So, any ideas or suggestions welcomed, especially with respect to
> the Keyword issue.

There is the 'Keyword' parser; it does probably what you want. Usage:

    mathFuncs = Keyword('sin') | Keyword('cos') | Keyword('tan')

I use code similar to this in my toy language.

Regards, Eike.
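The difference Keyword makes, which is the crux of this thread, is that it refuses to match inside a longer word, where a plain Literal happily does. A small sketch with an invented example string:

```python
from pyparsing import Keyword, Literal

# Literal("RD") matches the "RD" buried inside "GERARD", while
# Keyword("RD") requires that the match not be flanked by identifier
# characters, so only the stand-alone RD qualifies.
hits_lit = [t[0] for t, s, e in Literal("RD").scanString("GERARD ST RD")]
hits_kw = [t[0] for t, s, e in Keyword("RD").scanString("GERARD ST RD")]
print(hits_lit)  # two occurrences
print(hits_kw)   # one occurrence
```

This is exactly why Corrin's original OneOf-of-literals grammar matched ST and RD inside larger words, and why switching to Keyword fixed the accuracy at the cost of speed.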
From: Corrin L. <Cor...@da...> - 2007-03-19 20:07:10

Hi,

I have pyparsing working fairly well, but it is going extremely slowly, so I'd like to know how to make it faster. (Current performance is roughly one line per second.)

Problem: I'm testing addresses to ensure they are correctly formatted. New Zealand Post has recently moved to computerised delivery. That means all business mail must be perfectly formed according to a grammar, or else it doesn't get delivered. I implemented the grammar in pyparsing initially using rules like:

    STREET_DIRECTION = OneOf("NORTH","EAST","WEST","SOUTH","N","E","W","S").setResultsName("Street Direction")
    STREET_SUFFIX = OneOf("STREET","ROAD","LANE",...,"ST","RD","LN",...).setResultsName("Street Suffix")
    STREET_NUMBER = Word(nums).setResultsName("Street Number")
    STREET_ALPHA = Optional(Word(alphas,exact=1)).setResultsName("Street Alpha")
    STREET_NAME = OneOrMore(Word(alphas)).setResultsName("Street Name")
    STREET = STREET_NUMBER + STREET_ALPHA + STREET_NAME + STREET_SUFFIX + STREET_DIRECTION + FollowedBy(FieldBreak)

However, I ran into a few problems. Firstly, the use of OneOf(Literal) meant that ST or RD appearing anywhere in an address matched a suffix, even if it was part of a larger word. I solved that problem by replacing OneOf(Literal) with Keywords separated by bars, as in:

    cursor.execute(r'select distinct * from (select short_suffix from suffix_to_long UNION select long_suffix from suffix_to_long) as f')
    valid_suffix_str = "|".join([ "Keyword(\"" + x[0] + "\")" for x in cursor.fetchall() ])
    STREET_SUFFIX = eval(valid_suffix_str).setResultsName("Street Suffix")

Disgusting, huh? But I couldn't find anything else that worked accurately :-(. So I guess my first question is, is there any better way of doing this? Or of speeding this up (because in comparison to OneOf, it is _really_ slow, even with enablePackrat)?

The second problem I ran into was that the parser was too greedy. STREET_NAME did its best to suck up the STREET_SUFFIX without passing it over. I got around that by replacing the definition of STREET_NAME with a SkipTo(STREET_SUFFIX), but it is still more greedy than necessary. I have these nice clear FieldBreaks that split up the address, but my pyparsing grammar does not take advantage of them for increasing efficiency. E.g. there is no point looking for a street name that spans them. I just couldn't find any efficient way of forcing this STREET line to be locked to just one field.

I also ran into an intermittent problem in pyparsing's backtracking, where the city would be parsed as a suburb successfully, but the whole address would be rejected as the 'suburb' was not followed by a postcode. Pyparsing would correctly backtrack and find the correct parse, but the setResultsName("Suburb Name") resulted in both the suburb and the city being set to the same thing! (Un)fortunately I have changed the code since, and the current version does not exhibit this behaviour.

The last problem I ran into is with building names. A building name is defined as any string that is not a valid unit or a valid floor. E.g. "HARBOUR APARTMENTS" is a valid building name, as is "23 THE TERRACE", but "FLOOR 2" or "SUITE A" isn't. The only way I found to implement that was to create two instances of the parser and call the second instance inside setParseAction, but that's really slow too. I guess it's having to create a whole parse results when all I'm interested in is success or failure.

So, any ideas or suggestions welcomed, especially with respect to the Keyword issue.

Corrin Lakeland
From: Paul M. <pa...@al...> - 2007-01-27 05:05:56
|
>>>Unfortunately, the only place I've found any docs on it is using pydoc, ... I just caught this other comment of yours; do you not have the htmldoc directory as part of your pyparsing source or doc distribution? I generated this help directory using epydoc, and I'd hoped it would be formatted well enough that someone such as yourself could find methods such as matchPreviousXXX. Unfortunately, the win32 self-installer includes nothing but the basic pyparsing.py source file, so the sample code and docs get left behind. I really need to refocus on this documentation issue; it comes up more and more often. -- Paul |
From: Paul M. <pa...@al...> - 2007-01-27 04:56:52
|
""" Waylan - The problem here is not parse actions, but with the behavior of scanString. It is possible to hit some false positives with scanString, especially with a grammar such as this. scanString works by walking the input string character by character, trying to match the scan expression. In your case, scanString predictably finds the first three foo's, but also finds an unexpected match at the fourth foo. This is because scanString tries each successive character location, that is, after matching the third foo ending at locn 23, scanString does: loc 24: `````foo`` -> no match loc 25: ````foo`` -> no match loc 26: ```foo`` -> no match loc 27: ``foo`` -> match! Notice the different behavior when using parseString with a grammar of OneOrMore(Group(foo)). Now there is no successive matching of each character location in turn - parseString looks for a foo match at only one location, 24. Now it should also be obvious why `bar` was matched successfully. -- Paul """ from pyparsing import * begin = Word("`") end = matchPreviousExpr(begin) foo = begin + Literal('foo') + end a = foo.scanString("``foo`` ```foo``` `foo` `````foo``") for i in a: print i # this only recognizes the first three foo's ("All the Foos down in Fooville...") print OneOrMore(Group(foo)).parseString("``foo`` ```foo``` `foo` `````foo``") -----Original Message----- From: Waylan Limberg [mailto:wa...@gm...] Sent: Thursday, January 25, 2007 12:35 PM To: Paul McGuire Subject: Re: [Pyparsing] matching a previous match On 1/25/07, Paul McGuire <pa...@al...> wrote: > Version 1.4.5 includes a couple of new helper methods, > matchPreviousLiteral and matchPreviousExpr, for exactly this > situation. I think there is some sample code in the HTML docs, if > not, write back and I'll post some sample code using these methods. Cool. Exactly what I had in mind. Unfortunately, the only place I've found any docs on it is using pydoc, but that should give me enough to get going. Thanks for the pointer. 
> There is a current known bug if you are using parse actions with these
> two methods, I've got a fix in the works, just need to push out the
> next release if its critical for you.

Hmm, I might have run into it. Here is what I have so far:

>>> begin = Word("`")
>>> end = matchPreviousExpr(begin)
>>> foo = begin + Literal('foo') + end
>>> a = foo.scanString("``foo`` ```foo``` `foo` `````foo``")
>>> print [i for i in a]
[((['``', 'foo', '``'], {}), 0, 7), ((['```', 'foo', '```'], {}), 8, 17), ((['`', 'foo', '`'], {}), 18, 23), ((['``', 'foo', '``'], {}), 27, 34)]

That last one shouldn't match. Which would explain why the following doesn't work:

>>> match = begin + SkipTo(end) + end
>>> b = match.scanString("`foo` ``foo`bar`` `` `bar` ``")
>>> print [i for i in b]
[((['`', 'foo', '`'], {}), 0, 5), ((['`', 'foo', '`'], {}), 7, 12), ((['``', '', '``'], {}), 15, 20), ((['`', 'bar', '`'], {}), 21, 26)]

> -- Paul
>
> -----Original Message-----
> From: pyp...@li...
> [mailto:pyp...@li...] On Behalf Of
> Waylan Limberg
> Sent: Thursday, January 25, 2007 11:00 AM
> To: pyp...@li...
> Subject: [Pyparsing] matching a previous match
>
> I'm trying to match a string enclosed in backticks. Rather than using
> an escape character, the string is simply wrapped in more backticks.
> Here are some examples:
>
> `foo` => foo
> ``foo`bar`` => foo`bar
> `` `bar` `` => `bar` #note the spaces in this one
>
> This is easy in regex:
>
> (?P<backtick>`+)(?P<string>.*?)(?P=backtick)
>
> Of course, the trick is that the ending string of backticks must
> exactly match the opening string of backticks.
>
> Sure I can use Regex() for this, but I'm trying to figure out how to
> do that without regex. How can I refer back to a previous match?
>
> --
> ----
> Waylan Limberg
> wa...@gm...
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT Join
> SourceForge.net's Techsay panel and you'll get the chance to share
> your opinions on IT & business topics through brief surveys - and earn
> cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Pyparsing-users mailing list
> Pyp...@li...
> https://lists.sourceforge.net/lists/listinfo/pyparsing-users

--
----
Waylan Limberg
wa...@gm... |
From: Paul M. <pa...@al...> - 2007-01-25 17:08:10
|
Version 1.4.5 includes a couple of new helper methods, matchPreviousLiteral and matchPreviousExpr, for exactly this situation. I think there is some sample code in the HTML docs; if not, write back and I'll post some sample code using these methods. There is a current known bug if you are using parse actions with these two methods; I've got a fix in the works, just need to push out the next release if it's critical for you. -- Paul -----Original Message----- From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Waylan Limberg Sent: Thursday, January 25, 2007 11:00 AM To: pyp...@li... Subject: [Pyparsing] matching a previous match I'm trying to match a string enclosed in backticks. Rather than using an escape character, the string is simply wrapped in more backticks. Here are some examples: `foo` => foo ``foo`bar`` => foo`bar `` `bar` `` => `bar` #note the spaces in this one This is easy in regex: (?P<backtick>`+)(?P<string>.*?)(?P=backtick) Of course, the trick is that the ending string of backticks must exactly match the opening string of backticks. Sure I can use Regex() for this, but I'm trying to figure out how to do that without regex. How can I refer back to a previous match? -- ---- Waylan Limberg wa...@gm... |
From: Waylan L. <wa...@gm...> - 2007-01-25 16:59:48
|
I'm trying to match a string enclosed in backticks. Rather than using an escape character, the string is simply wrapped in more backticks. Here are some examples: `foo` => foo ``foo`bar`` => foo`bar `` `bar` `` => `bar` #note the spaces in this one This is easy in regex: (?P<backtick>`+)(?P<string>.*?)(?P=backtick) Of course, the trick is that the ending string of backticks must exactly match the opening string of backticks. Sure I can use Regex() for this, but I'm trying to figure out how to do that without regex. How can I refer back to a previous match? -- ---- Waylan Limberg wa...@gm... |
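For reference, the backreference regex above can be exercised directly with Python's stdlib re module; a quick check of the three examples:

```python
import re

# named group "backtick" captures the opening run; (?P=backtick) demands
# an identical run to close the span
BACKTICKS = re.compile(r"(?P<backtick>`+)(?P<string>.*?)(?P=backtick)")

for src in ["`foo`", "``foo`bar``", "`` `bar` ``"]:
    print(repr(BACKTICKS.match(src).group("string")))
# -> 'foo'
# -> 'foo`bar'
# -> ' `bar` '   (the surrounding spaces are kept)
```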
From: Poromenos <por...@po...> - 2006-12-10 12:06:27
|
That works, thanks a lot! :/ I was doing task = Words(alphas + " ") expression = task + SkipTo("on").setResultsName("taskDesc") + \ "on" + \ date.setResultsName("taskDate") and it wasn't working. Now it works like a charm, although I have run into some weird bugs which I have reported. Thanks again. On 12/10/06, Paul McGuire <pa...@al...> wrote: > SkipTo, perhaps? > > expression = SkipTo("on").setResultsName("taskDesc") + \ > "on" + \ > date.setResultsName("taskDate") > > > -- Paul > > > > -----Original Message----- > > From: pyp...@li... [mailto:pyparsing- > > use...@li...] On Behalf Of Poromenos > > Sent: Saturday, December 09, 2006 6:12 PM > > To: pyp...@li... > > Subject: [Pyparsing] Matching everything up to a point. > > > > Hello all, > > I have spent the better part of today writing a parser to parse > > natural language sentences, specifically dates. What I want to do is > > something like: > > > > Doctor's appointment on November 20 > > > > I have completed the parsing of the date part (everything from "on" > > until the end), and I went on to add the term to catch everything > > before the on like so: > > > > expression = task + date > > > > only to find to my chagrin that it cannot match. It is raising an > > exception that it expects an "on" at the end of the line. Can anyone > > tell me how to match everything up until a recognisable (to the > > parser) date? > > > > Thanks > |
From: Paul M. <pa...@al...> - 2006-12-10 12:03:21
|
SkipTo, perhaps? expression = SkipTo("on").setResultsName("taskDesc") + \ "on" + \ date.setResultsName("taskDate") -- Paul > -----Original Message----- > From: pyp...@li... [mailto:pyparsing- > use...@li...] On Behalf Of Poromenos > Sent: Saturday, December 09, 2006 6:12 PM > To: pyp...@li... > Subject: [Pyparsing] Matching everything up to a point. > > Hello all, > I have spent the better part of today writing a parser to parse > natural language sentences, specifically dates. What I want to do is > something like: > > Doctor's appointment on November 20 > > I have completed the parsing of the date part (everything from "on" > until the end), and I went on to add the term to catch everything > before the on like so: > > expression = task + date > > only to find to my chagrin that it cannot match. It is raising an > exception that it expects an "on" at the end of the line. Can anyone > tell me how to match everything up until a recognisable (to the > parser) date? > > Thanks |
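To make the suggestion concrete, here is a runnable sketch; the date expression below is a toy stand-in, not the poster's real date grammar:

```python
from pyparsing import SkipTo, Word, alphas, nums

date = Word(alphas) + Word(nums)   # toy stand-in for the real date grammar

expression = (SkipTo("on").setResultsName("taskDesc")
              + "on"
              + date.setResultsName("taskDate"))

r = expression.parseString("Doctor's appointment on November 20")
print(r.taskDesc.strip())      # -> Doctor's appointment
print(" ".join(r.taskDate))    # -> November 20
```

Note that SkipTo("on") will also stop at an "on" embedded inside a word (e.g. "Montreal"); wrapping the target as SkipTo(Keyword("on")) restricts it to a whole word.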
From: Poromenos <por...@po...> - 2006-12-10 00:12:11
|
Hello all, I have spent the better part of today writing a parser to parse natural language sentences, specifically dates. What I want to do is something like: Doctor's appointment on November 20 I have completed the parsing of the date part (everything from "on" until the end), and I went on to add the term to catch everything before the on like so: expression = task + date only to find to my chagrin that it cannot match. It is raising an exception that it expects an "on" at the end of the line. Can anyone tell me how to match everything up until a recognisable (to the parser) date? Thanks |
From: George P. <ge...@ga...> - 2006-10-25 01:05:46
|
> George - > > Is it really necessary to create the dict directly in the ParseResults? > This could probably be done in some tricky parse actions, but things might > be simpler if your ParseResults just reflect the expression structure, and > then walk the structure to build up your nested dict. Below is one way. > > -- Paul > > > from pyparsing import * > > s = \ > """ > a.clk = "----____----____" > a.rst = "--------________" > > b.clk = "____----____----" > b.x = "________--------" > """ > > TIMING_SPEC = Combine('"' + Word('_-') + '"' ) > ident = Word(alphas, alphanums) > > SIGNAL_DEF = ident.setResultsName("GROUP") + \ > '.' + ident.setResultsName("SIGNAME") + \ > '=' + \ > TIMING_SPEC.setResultsName("TIMING_SPEC") > > GRAMMAR = OneOrMore( Group(SIGNAL_DEF) ) > > results = GRAMMAR.parseString(s) > > d = {} > for toks in results: > if toks.GROUP not in d: > d[toks.GROUP] = {} > d[toks.GROUP][toks.SIGNAME] = toks.TIMING_SPEC > > print d > print d.keys() > for k in d.keys(): > print k, d[k].keys() > > > Prints out: > {'a': {'rst': '"--------________"', 'clk': '"----____----____"'}, 'b': {'x': > '"________--------"', 'clk': '"____----____----"'}} > ['a', 'b'] > a ['rst', 'clk'] > b ['x', 'clk'] > d['a']['rst'] -> "--------________" > d['a']['clk'] -> "----____----____" > d['b']['x'] -> "________--------" > d['b']['clk'] -> "____----____----" > > Paul, Thanks for your fantastic help! Your detailed example taught me a lot. No, I didn't need to create the dict in the parse results directly. I'm glad you questioned that because I was starting to go insane ;-) Here's an excerpt of the solution you helped guide me to. (Note: I accidentally used quotes in my timing spec string in the previous email). 
If you're curious what I'm doing this for, I'm making a tool for MyHDL to increase the fun factor in the design and testing of complex electronic systems: http://myhdl.jandecaluwe.com/doku.php/users:george_pantazopoulos Kudos, George http://www.gammaburst.net

# ---------------------
from pyparsing import *

def parse_ats_block(ats_block):
    """
    Example input:

    a.clk = ----____----____
    a.rst = --------________

    b.clk = ____----____----
    b.x = ________--------

    For the Timing Diagram portion:

    Input       Parser output
    ---------------------------------------
    "--__"   => ['-', '-', '_', '_']
    "1..72." => ['1..', '72.']
    """

    # pyparsing grammar
    # Special thanks to Paul McGuire

    # The OneOrMore() will split up the timing spec into a list of tokens
    TIMING_SPEC = OneOrMore(Regex('[-_]') | Regex('[0-9]+\.+'))

    identifier = Word(alphas, alphanums)

    SIGNAL_DEF = identifier.setResultsName("GROUP") + \
                 '.' + identifier.setResultsName("SIGNAME") + \
                 '=' + TIMING_SPEC.setResultsName("TIMING_SPEC")

    ATS_GRAMMAR = OneOrMore(Group(SIGNAL_DEF))

    # The parse results can be treated as a list of lists of the following
    # form (each sublist represents a signal definition):
    #   [groupname, '.', signame, '=', timingspec]
    # In addition, thanks to the use of setResultsName(),
    # the components can also be accessed using attribute syntax.
    # Each signal definition entry in the results will contain
    # a .GROUP, .SIGNAME, and .TIMING_SPEC attribute.

    # Parse
    results = ATS_GRAMMAR.parseString(ats_block)

    # From the parse results create a nested dict.
    # The top level dict contains a dict for each group, keyed by the
    # group name.
    #
    # Each group dict contains key/value pairs where key = signal name
    # and value = the signal's timing spec
    d = dict()
    for toks in results:
        if toks.GROUP not in d:
            d[toks.GROUP] = dict()
        d[toks.GROUP][toks.SIGNAME] = toks.TIMING_SPEC

    results_dict = d
    return results_dict |
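The tokenisation documented in parse_ats_block's docstring can be sanity-checked without pyparsing, since TIMING_SPEC is just a repeated alternation of two regexes; the stdlib re equivalent:

```python
import re

# same two alternatives as TIMING_SPEC: a single '-'/'_' tick,
# or a run of digits followed by one or more dots
TOKEN = re.compile(r"[-_]|[0-9]+\.+")

print(TOKEN.findall("--__"))    # -> ['-', '-', '_', '_']
print(TOKEN.findall("1..72."))  # -> ['1..', '72.']
```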
From: Paul M. <pa...@al...> - 2006-10-24 14:00:33
|
> -----Original Message----- > From: pyp...@li... > [mailto:pyp...@li...] On > Behalf Of George Pantazopoulos > Sent: Monday, October 23, 2006 11:31 PM > To: pyp...@li... > Subject: [Pyparsing] Parsing multiple items into the same Dict > > > Hi all, > Given the following: > > s = \ > """ > a.clk = "----____----____" > a.rst = "--------________" > > b.clk = "____----____----" > b.x = "________--------" > """ > > TIMING_SPEC = OneOrMore(Regex('[-_]')) > GROUPNAME = Word(alphanums) + Suppress('.') > SIGNAME = Word(alphanums) + Suppress('=') > > GRAMMAR = ??? > > # -------- > > What would the grammar have to look like so I can get the > following kind of parse results? > George - Is it really necessary to create the dict directly in the ParseResults? This could probably be done in some tricky parse actions, but things might be simpler if your ParseResults just reflect the expression structure, and then walk the structure to build up your nested dict. Below is one way. -- Paul from pyparsing import * s = \ """ a.clk = "----____----____" a.rst = "--------________" b.clk = "____----____----" b.x = "________--------" """ TIMING_SPEC = Combine('"' + Word('_-') + '"' ) ident = Word(alphas, alphanums) SIGNAL_DEF = ident.setResultsName("GROUP") + \ '.' + ident.setResultsName("SIGNAME") + \ '=' + \ TIMING_SPEC.setResultsName("TIMING_SPEC") GRAMMAR = OneOrMore( Group(SIGNAL_DEF) ) results = GRAMMAR.parseString(s) d = {} for toks in results: if toks.GROUP not in d: d[toks.GROUP] = {} d[toks.GROUP][toks.SIGNAME] = toks.TIMING_SPEC print d print d.keys() for k in d.keys(): print k, d[k].keys() Prints out: {'a': {'rst': '"--------________"', 'clk': '"----____----____"'}, 'b': {'x': '"________--------"', 'clk': '"____----____----"'}} ['a', 'b'] a ['rst', 'clk'] b ['x', 'clk'] d['a']['rst'] -> "--------________" d['a']['clk'] -> "----____----____" d['b']['x'] -> "________--------" d['b']['clk'] -> "____----____----" -- Paul |
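As a side note, the membership test in the dict-building loop can be folded away with collections.defaultdict. A stand-alone sketch using plain tuples in place of the parsed tokens, so it runs without pyparsing:

```python
from collections import defaultdict

# (GROUP, SIGNAME, TIMING_SPEC) triples standing in for the ParseResults
rows = [
    ("a", "clk", '"----____----____"'),
    ("a", "rst", '"--------________"'),
    ("b", "clk", '"____----____----"'),
    ("b", "x",   '"________--------"'),
]

d = defaultdict(dict)
for group, signame, spec in rows:
    d[group][signame] = spec   # missing group dicts are created automatically

print(sorted(d["a"].keys()))   # -> ['clk', 'rst']
print(d["b"]["x"])             # -> "________--------"
```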
From: George P. <ge...@ga...> - 2006-10-24 04:31:06
|
Hi all, Given the following: s = \ """ a.clk = "----____----____" a.rst = "--------________" b.clk = "____----____----" b.x = "________--------" """ TIMING_SPEC = OneOrMore(Regex('[-_]')) GROUPNAME = Word(alphanums) + Suppress('.') SIGNAME = Word(alphanums) + Suppress('=') GRAMMAR = ??? # -------- What would the grammar have to look like so I can get the following kind of parse results? In particular, I'm stuck on how to parse multiple items into the same dict. In this case, I want to have a dict containing GROUPNAMEs as keys. Each GROUPNAME dict contains one or more 'SIGNAME':TIMINGSPEC key/value pairs: >>>d = GRAMMAR.parseString(s).asDict() >>>d['a'].keys() ['clk', 'rst'] >>>d['b'].keys() ['clk', 'x'] >>>d['a']['clk'] ----____----____ >>>d['a']['rst'] --------________ >>>d['b']['clk'] ____----____---- >>>d['b']['x'] ________-------- Thanks, George |