pyparsing-users Mailing List for Python parsing module (Page 24)
Brought to you by: ptmcg
From: Andrew S. <agt...@ya...> - 2008-01-15 09:47:12
Paul,

Thanks for the quick reply. So far using pe.loc < len(input) works for me. I'll reply to the list if I can find a counter-example. Can I count on loc sticking around in the ParseException class?

Not to push my luck, but I've got a grammar question too. I'm trying to define a grammar that uses a skipTo(Optional(xxx)) and not having much success. Is there a better way to go about this? An example:

    select where foo = 1 and bar = 2 into result

I was hoping to end up with something like the following:

    into_clause = Keyword('into') + restOfLine
    Keyword('select') + skipTo(Optional(into_clause)).setResultsName('where_condition')

Is there a way to do this without forcing some delimiters onto the where clause?

Thanks again for such a useful library.

-a.

--- Paul McGuire <pt...@au...> wrote:
> [full quote of Paul's 2008-01-14 reply trimmed; the complete message appears below]
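The skipTo question gets no direct answer in the archive. One possible approach (my own sketch, not from the list -- SkipTo, StringEnd, and Keyword are real pyparsing API, but the grammar itself is an assumption) is to give SkipTo an alternation target, so the where clause ends either at the optional keyword or at end of input:

```python
from pyparsing import Keyword, SkipTo, StringEnd, Optional, Word, alphas

INTO = Keyword('into')
into_clause = INTO + Word(alphas)('target')

# SkipTo cannot take Optional directly, but it can take an alternation:
# skip until 'into' OR end of input, whichever comes first
stmt = (Keyword('select')
        + SkipTo(INTO | StringEnd())('where_condition')
        + Optional(into_clause))

with_into = stmt.parseString("select where foo = 1 and bar = 2 into result")
without = stmt.parseString("select where foo = 1 and bar = 2")
```

With the into clause present, 'foo = 1 and bar = 2' lands in where_condition and 'result' in target; without it, the skip simply runs to the end of the string.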
From: Paul M. <pt...@au...> - 2008-01-14 08:07:21
Andrew -

You may be able to infer something like this based on the location of the raised exception. For instance, consider this grammar:

    grmr = Literal("A") + ( Literal("B") + Literal("C") | Literal("D") + Literal("E") )

A full match would require one of these input strings (with whitespace allowed, of course):

    ABC
    ADE

So any of these partial strings would indicate that there could still be a match:

    A
    AB
    AD

but they would raise a ParseException since they are not complete. The thing to note is that the loc field of the raised exception is equal to the length of the input string, telling you that the missing piece would be found at the end of what was given, but that the input so far did match the grammar.

By contrast, these strings are not partially valid:

    B
    AC
    ABX

In these cases, the loc field of the raised exception is less than the length of the input string, telling you that the provided string is not a partial match.

This is a very simplistic example; you should test this idea a bit more thoroughly with your own specific grammar. I haven't thought it through myself for more than about 10 minutes, so please let me know how it works out.

-- Paul
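Paul's loc test can be packaged as a small helper. This is a sketch of my own based on his description; parseAll=True is my addition, so that a complete match followed by trailing junk is not reported as still-viable:

```python
from pyparsing import Literal, ParseException

grmr = Literal("A") + (Literal("B") + Literal("C") | Literal("D") + Literal("E"))

def could_still_match(text):
    """True if text is a complete match, or a prefix that could still become one."""
    try:
        grmr.parseString(text, parseAll=True)
        return True                  # complete match
    except ParseException as pe:
        # failure exactly at end-of-input means the input so far matched
        # and only the continuation is missing
        return pe.loc >= len(text)

# complete and partial prefixes are still viable
for s in ["ABC", "A", "AB", "AD"]:
    assert could_still_match(s)

# these can never become valid
for s in ["B", "AC", "ABX"]:
    assert not could_still_match(s)
```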
From: Andrew S. <agt...@ya...> - 2008-01-14 06:32:23
I'd like to have my application accept incremental input until a statement conforming to my grammar has been entered. My current approach is simplistic but works well enough. I create a small "Interpreter" to buffer input until it's ready. Something like this:

    class Interpreter(object):
        def __init__(self, grammar):
            self.buffer = []
            self.grammar = grammar

        def push(self, line):
            result = None
            self.buffer.append(line)
            try:
                result = self.grammar.parse(self.buffer)
            except ParseException, pe:
                pass
            if result:
                self.buffer = []
            return result

Now I'd like to add the following feature to my Interpreter. If the buffer has enough input to determine that the contents will _never_ be valid, I'd like to re-raise the ParseException so the clients of the Interpreter can stop accepting input.

Is there a way to determine if a "fragment" has the potential to be parsed by my grammar? I thought about comparing positions of the parse element in the exception with the elements in the grammar, but wanted to check with the list to find out if there is an accepted way of doing this.

-a.
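As an aside for later readers, the buffering idea runs as-is once grammar.parse is swapped for pyparsing's real entry point, parseString. A Python 3 sketch; the three-token statement grammar and the " ".join buffering are made up for illustration:

```python
from pyparsing import Keyword, Word, alphas, nums, ParseException

class Interpreter:
    """Buffer input lines until the whole buffer parses as one statement."""
    def __init__(self, grammar):
        self.buffer = []
        self.grammar = grammar

    def push(self, line):
        self.buffer.append(line)
        try:
            # parseAll=True: the whole buffer must match, not just a prefix
            result = self.grammar.parseString(" ".join(self.buffer), parseAll=True)
        except ParseException:
            return None              # incomplete (or invalid) so far - keep buffering
        self.buffer = []
        return result

# toy statement: keyword, name, number
stmt = Keyword("set") + Word(alphas) + Word(nums)

interp = Interpreter(stmt)
assert interp.push("set") is None
assert interp.push("answer") is None
assert interp.push("42").asList() == ["set", "answer", "42"]
```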
From: david <asd...@gn...> - 2007-12-13 01:10:36
On Mon, 10 Dec 2007 10:53:13 -0600, Paul McGuire wrote:
> David -
>
> Here is a modified version of your program, that runs for me in 0.05
> seconds or so.

Paul,

Thank you very much, your explanation is excellent, and the new grammar works like a charm :)

> The main problem was your universal use of Or instead of MatchFirst.
> Using Or in the definition of OPERATOR is not so big a problem, but you
> can convert to MatchFirst if you are careful about the ordering of your
> operators. And the operators may as well be Literals, and not Keywords,
> since Keyword only looks for boundary alphanumeric characters, not
> symbols or whitespace.

I understand the performance implications of Or versus MatchFirst, but I still don't understand why Literals are faster than Keywords.

> Is there any reason you choose the
>
>     OPERATOR = psg.Or([EQUAL, NOT_EQUAL, CONTAINS, NOT_CONTAINS, STARTWITH])
>
> style, instead of using the built-in operators:
>
>     OPERATOR = EQUAL ^ NOT_EQUAL ^ CONTAINS ^ NOT_CONTAINS ^ STARTWITH
>
> or even better:
>
>     OPERATOR = STARTWITH | EQUAL | NOT_EQUAL | CONTAINS | NOT_CONTAINS

Writing grammars is not my business; thanks to pyparsing's simplicity I wrote my first grammar (the one you tuned) months ago in one day. I tend to prefer a more readable syntax for non-repetitive tasks.

Thank you
david
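The Literal-vs-Keyword speed question goes unanswered in the thread. The short version, as a sketch of my own: Keyword performs an extra word-boundary check on every match attempt, which Literal skips entirely -- and for symbol operators like '~>' the boundary check buys nothing anyway:

```python
from pyparsing import Keyword, Literal, ParseException

# Literal matches anywhere the characters line up...
assert Literal("and").parseString("android")[0] == "and"

# ...while Keyword also verifies the match is not embedded in a larger
# word -- an extra check on every attempt that Literal does not pay for
try:
    Keyword("and").parseString("android")
    embedded_ok = True
except ParseException:
    embedded_ok = False
assert not embedded_ok
```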
From: Kees B. <kee...@al...> - 2007-12-12 15:25:16
On Wednesday 12 December 2007 15:55, Paul McGuire wrote:
> [...]
> The simplest fix is to expand the comment definition to consume the line
> end as well, using:
>
>     lineStart + Literal('==') + restOfLine + lineEnd
>
> If you change the ignore expression for goal2 to read like this, then goal2
> will successfully ignore the 3 comment lines, and read in the data on line
> 4.

Hmmm. OK, I understand your explanation. However, I didn't expect the parser to not skip whitespace at that point. (You explain more about it further down, thanks.)

> [...] AND, since line ends are treated like whitespace, your comment
> definition's OneOrMore('=') even reads the first two '=' signs on the next
> line. Now the parser is located at the '.' on line 2, which is nowhere near
> a lineEnd, so the parser concludes that this is not a comment, goes back to
> line 1, column 1, and tries to match a Word(alphanums), which then fails,
> and raises the exception that you see.

Ah, that's good to know. I was too focused on the reported position.

> The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
> Word does repetition of a character without skipping whitespace.

Ah, that's also good to know.

> But this only gets you past line 1. At this point, you are at the beginning
> of line 2, which still does not start with a Word(alphanums). But now you
> should get a more reasonable exception, Expected W:(abcd...) (at char 33),
> (line:2, col:1).

Yes, that's OK. It was just an example :-)

> [remainder of Paul's reply trimmed; it appears in full below]
>
> Hope this helps - Welcome to pyparsing!

Thanks a lot. (BTW, I bought a copy of the PDF booklet from O'Reilly. That is quite helpful too.)

--
Kees
From: Paul M. <pt...@au...> - 2007-12-12 14:55:38
-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Kees Bakker
Sent: Wednesday, December 12, 2007 7:46 AM
To: pyp...@li...
Subject: [Pyparsing] Newby question about ignore

In the example goal1.ignore I want to ignore lines with just equal-signs. The result is:

    Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then ignore the rest. The result is:

    Expected W:(abcd...) (at char 33), (line:2, col:1)

----------

Kees,

I'll answer the second example first, since this is pretty straightforward. LineStart is very particular about where the parser happens to be when it tries to match - it *must* be at the beginning of a line. In your example, you defined your comment expression as:

    lineStart + Literal('==') + restOfLine

In your input text, the first line of '=' signs matches the opening lineStart, the first two '==' signs of the line, and the rest of the line. But at this point, the parser is left at the end of line 1. So now it can't match another comment, and it won't match a Word(alphanums), so it raises an exception. The simplest fix is to expand the comment definition to consume the line end as well, using:

    lineStart + Literal('==') + restOfLine + lineEnd

If you change the ignore expression for goal2 to read like this, then goal2 will successfully ignore the 3 comment lines, and read in the data on line 4.

The first problem is a little stickier. In this case, you are trying to match *only* lines made up only of '=' signs. Doing so with OneOrMore('=') actually does a little more than you wanted. Remember that pyparsing's default behavior is to skip over whitespace. OneOrMore('=') matches each '=' separately, and does whitespace skipping between each one. So what you have written will match not only

    ==================

but also

    = = = = = = = = = =

AND, since line ends are treated like whitespace, your comment definition's OneOrMore('=') even reads the first two '=' signs on the next line. Now the parser is located at the '.' on line 2, which is nowhere near a lineEnd, so the parser concludes that this is not a comment, goes back to line 1, column 1, and tries to match a Word(alphanums), which then fails, and raises the exception that you see.

The quick fix for problem 1 is to replace OneOrMore('=') with Word('='). Word does repetition of a character without skipping whitespace. But this only gets you past line 1. At this point, you are at the beginning of line 2, which still does not start with a Word(alphanums). But now you should get a more reasonable exception: Expected W:(abcd...) (at char 33), (line:2, col:1).

So why does Literal('=') read past the line end, but lineStart doesn't? Because a few pyparsing classes *don't* do whitespace skipping! The ones that do not skip whitespace are:

- the positional classes (LineStart, LineEnd, StringStart, StringEnd)
- CharsNotIn
- restOfLine

My personal choice for defining an '='s-based comment in your example would probably be:

    Literal('==') + restOfLine

Leave out the lineStart and lineEnd, and just ignore '==' to the end of the current line.

But if you want to restrict '==' comments to column 1 only, then you will need to use the comment expression I gave at the beginning of this e-mail.

Hope this helps - Welcome to pyparsing!

-- Paul
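Paul's recommended comment form can be checked directly with a condensed version of Kees' example (a sketch of my own; the sample text is abbreviated from the original post):

```python
from pyparsing import OneOrMore, Word, alphanums, Literal, restOfLine

goal = OneOrMore(Word(alphanums))
# '==' to end of line, anywhere on the line (Paul's suggested comment form);
# the all-'=' banner lines also start with '==', so they are ignored too
goal.ignore(Literal('==') + restOfLine)

dump_txt = '''\
================================
== .debug_info section index 61
================================
DWCHECK REMARK bla bla
'''

result = goal.parseString(dump_txt)
assert result.asList() == ['DWCHECK', 'REMARK', 'bla', 'bla']
```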
From: Kees B. <kee...@al...> - 2007-12-12 13:45:54
Hi,

Here is a simple example. Maybe I'm doing something silly. There are two attempts to catch and ignore certain lines with '='.

In the example goal1.ignore I want to ignore lines with just equal-signs. The result is:

    Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then ignore the rest. The result is:

    Expected W:(abcd...) (at char 33), (line:2, col:1)

Can anyone tell me what's wrong with these two examples?

TIA,
Kees

    #! /usr/bin/env python
    # -*- coding: utf-8 -*-
    from pyparsing import *

    goal1 = OneOrMore( Word(alphanums) )
    goal1.ignore( OneOrMore( Literal('=') ) + lineEnd )

    goal2 = OneOrMore( Word(alphanums) )
    goal2.ignore( lineStart + Literal('==') + restOfLine )

    dump_txt = '''\
    ================================
    == .debug_info section index 61
    ================================
    DWCHECK REMARK bla bla
    '''

    def parse_it():
        # Make this goal1 or goal2
        result = goal1.parseString(dump_txt)
        if result:
            from pprint import pprint
            pprint( result.asList() )

    def main():
        parse_it()

    if __name__ == "__main__":
        try:
            main()
        except SystemExit, e:
            print e
        except Exception, e:
            print e
From: Paul M. <pt...@au...> - 2007-12-10 16:53:33
David -

Here is a modified version of your program, that runs for me in 0.05 seconds or so.

The main problem was your universal use of Or instead of MatchFirst. Using Or in the definition of OPERATOR is not so big a problem, but you can convert to MatchFirst if you are careful about the ordering of your operators. And the operators may as well be Literals, and not Keywords, since Keyword only looks for boundary alphanumeric characters, not symbols or whitespace. Is there any reason you choose the

    OPERATOR = psg.Or([EQUAL, NOT_EQUAL, CONTAINS, NOT_CONTAINS, STARTWITH])

style, instead of using the built-in operators:

    OPERATOR = EQUAL ^ NOT_EQUAL ^ CONTAINS ^ NOT_CONTAINS ^ STARTWITH

or even better:

    OPERATOR = STARTWITH | EQUAL | NOT_EQUAL | CONTAINS | NOT_CONTAINS

??

But the real performance culprit is this:

    UNABIN_EXPR = psg.Or([ATOM, BIN_EXPR])

Since BIN_EXPR has a recursive component, and since Or has to evaluate both expressions to test for a longest match, things end up taking a Really. Long. Time. Changing this item to a MatchFirst makes things move right along. And there is no chance of misparsing, since an ATOM definition cannot be a leading part of a BIN_EXPR (which is when MatchFirst has issues - note how I reordered the items in the OPERATOR list, so that '~>' would be tested ahead of '~').

Lastly, I added a Group definition to LOGIC_EXPR. This will structure the results in a hierarchy that matches any nesting of and's and or's. pprint will print out the data in this hierarchy. Try it with and without the Group and you can see the difference.

Cheers,
-- Paul

    import pyparsing as psg

    EQUAL = psg.Literal('=')
    NOT_EQUAL = psg.Literal('!=')
    CONTAINS = psg.Literal('~')
    NOT_CONTAINS = psg.Literal('!~')
    STARTWITH = psg.Literal('~>')

    #~ OPERATOR = psg.MatchFirst([STARTWITH, EQUAL, NOT_EQUAL, CONTAINS, NOT_CONTAINS])
    OPERATOR = STARTWITH | EQUAL | NOT_EQUAL | CONTAINS | NOT_CONTAINS

    #~ FIELD = psg.Word(psg.alphanums + '/').setParseAction(lambda s, loc, toks: Field(toks[0]))
    FIELD = psg.Word(psg.alphanums + '/')  #.setParseAction(lambda s, loc, toks: Field(toks[0]))
    VALUE = psg.quotedString.setParseAction(psg.removeQuotes)
    ATOM = FIELD + OPERATOR + VALUE

    #~ LOGIC_OP = psg.Or([
    #~     psg.Keyword('and').setParseAction(lambda *args: And()),
    #~     psg.Keyword('or').setParseAction(lambda *args: Or())
    #~ ])
    #~ LOGIC_OP = psg.Keyword('and').setParseAction(lambda *args: And()) | \
    #~     psg.Keyword('or').setParseAction(lambda *args: Or())
    LOGIC_OP = psg.Keyword('and') | \
               psg.Keyword('or')

    BIN_EXPR = psg.Forward()
    #~ UNABIN_EXPR = psg.Or([ATOM, BIN_EXPR])
    #~ UNABIN_EXPR = psg.MatchFirst([BIN_EXPR, ATOM])
    UNABIN_EXPR = BIN_EXPR | ATOM
    # this is the culprit - change operator to '^' and this takes forever!
    BIN_EXPR << psg.Group(LOGIC_OP + '(' + UNABIN_EXPR + ',' + UNABIN_EXPR + ')')

    #~ GRAMMAR = psg.Or([BIN_EXPR, ATOM])
    #~ GRAMMAR = psg.MatchFirst([BIN_EXPR, ATOM])
    GRAMMAR = UNABIN_EXPR

    # I used the Python textwrap module to get this under control!
:) data = "or(/BI/BIB/BIBH = '00000509', or(/BI/BIB/BIBH = '00000769'," \ "or(/BI/BIB/BIBH = '00001673', or(/BI/BIB/BIBH = '00000058'," \ "or(/BI/BIB/BIBH = '00000764', or(/BI/BIB/BIBH = '00001238'," \ "or(/BI/BIB/BIBH = '00001592', or (/BI/BIB/BIBH = '00001017'," \ "or(/BI/BIB/BIBH = '00002676', or(/BI/BIB/BIBH = '00001554'," \ "or(/BI/BIB/BIBH = '00002193', or(/BI/BIB/BIBH = '00001907'," \ "or(/BI/BIB/BIBH = '00000366', or(/BI/BIB/BIBH = '00001161'," \ "or(/BI/BIB/BIBH = '00002991', or(/BI/BIB/BIBH = '00002820'," \ "or(/BI/BIB/BIBH = '00000610', or(/BI/BIB/BIBH = '00000521'," \ "or(/BI/BIB/BIBH = '00003143', or (/BI/BIB/BIBH = '00002869'," \ "or(/BI/BIB/BIBH = '00000410', or(/BI/BIB/BIBH = '00001926'," \ "or(/BI/BIB/BIBH = '00000061', or(/BI/BIB/BIBH = '00000165'," \ "or(/BI/BIB/BIBH = '00000669', or(/BI/BIB/BIBH = '00002675'," \ "or(/BI/BIB/BIBH = '00000770', or(/BI/BIB/BIBH = '00000981'," \ "or(/BI/BIB/BIBH = '00001841', or(/BI/BIB/BIBH = '00002668'," \ "or(/BI/BIB/BIBH = '00001949', or (/BI/BIB/BIBH = '00002819'," \ "or(/BI/BIB/BIBH = '00000623', or(/BI/BIB/BIBH = '00002365'," \ "or(/BI/BIB/BIBH = '00000865', or(/BI/BIB/BIBH = '00000176'," \ "or(/BI/BIB/BIBH = '00002036', or(/BI/BIB/BIBH = '00002640'," \ "or(/BI/BIB/BIBH = '00001297', or(/BI/BIB/BIBH = '00000811'," \ "or(/BI/BIB/BIBH = '00001594', or(/BI/BIB/BIBH = '00001160'," \ "or(/BI/BIB/BIBH = '00000709', or (/BI/BIB/BIBH = '00001908'," \ "or(/BI/BIB/BIBH = '00002836', or(/BI/BIB/BIBH = '00001283'," \ "or(/BI/BIB/BIBH = '00002946', or(/BI/BIB/BIBH = '00001877'," \ "or(/BI/BIB/BIBH = '00002378', or(/BI/BIB/BIBH = '00000474'," \ "or(/BI/BIB/BIBH = '00002710', or(/BI/BIB/BIBH = '00001884'," \ "or(/BI/BIB/BIBH = '00000968', or(/BI/BIB/BIBH = '00000926'," \ "or(/BI/BIB/BIBH = '00001902', or (/BI/BIB/BIBH = '00001018'," \ "or(/BI/BIB/BIBH = '00002989', or(/BI/BIB/BIBH = '00001590'," \ "or(/BI/BIB/BIBH = '00001300', or(/BI/BIB/BIBH = '00002754'," \ "or(/BI/BIB/BIBH = '00001923', or(/BI/BIB/BIBH = '00002771'," \ 
"or(/BI/BIB/BIBH = '00000768', or(/BI/BIB/BIBH = '00002034'," \ "or(/BI/BIB/BIBH = '00001851', or(/BI/BIB/BIBH = '00001303'," \ "or(/BI/BIB/BIBH = '00002642', or (/BI/BIB/BIBH = '00002949'," \ "or(/BI/BIB/BIBH = '00000821', or(/BI/BIB/BIBH = '00000989'," \ "or(/BI/BIB/BIBH = '00001377', or(/BI/BIB/BIBH = '00001514'," \ "or(/BI/BIB/BIBH = '00000510', or(/BI/BIB/BIBH = '00000054'," \ "or(/BI/BIB/BIBH = '00001280', or(/BI/BIB/BIBH = '00002370'," \ "or(/BI/BIB/BIBH = '00001924', or(/BI/BIB/BIBH = '00000063'," \ "or(/BI/BIB/BIBH = '00001632', or (/BI/BIB/BIBH = '00000434'," \ "or(/BI/BIB/BIBH = '00003035', or(/BI/BIB/BIBH = '00001657'," \ "or(/BI/BIB/BIBH = '00002053', or(/BI/BIB/BIBH = '00000966'," \ "or(/BI/BIB/BIBH = '00002699', or(/BI/BIB/BIBH = '00003178'," \ "or(/BI/BIB/BIBH = '00002602', or(/BI/BIB/BIBH = '00002194'," \ "or(/BI/BIB/BIBH = '00001591', or(/BI/BIB/BIBH = '00001910'," \ "or(/BI/BIB/BIBH = '00000174', or (/BI/BIB/BIBH = '00002132'," \ "/BI/BIB/BIBH = '00000665')))))))))))))))))))))))))))))))))))))))))))))" \ ")))))))))))))))))))))))))))))))))))))))))))))))" import time start = time.time() tokens = GRAMMAR.parseString(data) print time.time() - start import pprint pprint.pprint(tokens.asList()) |
From: david <asd...@gn...> - 2007-12-10 14:50:18
Hi all,

I use pyparsing to parse a simple, self-made, query language. Here is the grammar:

    ----8<-------------------------------------------------------------------
    import pyparsing as psg

    EQUAL = psg.Keyword('=')
    NOT_EQUAL = psg.Keyword('!=')
    CONTAINS = psg.Keyword('~')
    NOT_CONTAINS = psg.Keyword('!~')
    STARTWITH = psg.Keyword('~>')
    OPERATOR = psg.Or([EQUAL, NOT_EQUAL, CONTAINS, NOT_CONTAINS, STARTWITH])

    FIELD = psg.Word(psg.alphanums + '/').setParseAction(lambda s, loc, toks: Field(toks[0]))
    VALUE = psg.quotedString.setParseAction(psg.removeQuotes)
    ATOM = FIELD + OPERATOR + VALUE

    LOGIC_OP = psg.Or([
        psg.Keyword('and').setParseAction(lambda *args: And()),
        psg.Keyword('or').setParseAction(lambda *args: Or())
    ])

    BIN_EXPR = psg.Forward()
    UNABIN_EXPR = psg.Or([ATOM, BIN_EXPR])
    BIN_EXPR << (LOGIC_OP + '(' + UNABIN_EXPR + ',' + UNABIN_EXPR + ')')
    GRAMMAR = psg.Or([BIN_EXPR, ATOM])
    ---->8-------------------------------------------------------------------

This grammar works like a charm for small input strings, but hangs with CPU at 100% for minutes with a large input string.
For large input string I mean something like (sorry this will break your newsreader): ----8<------------------------------------------------------------------- data = """or(/BI/BIB/BIBH = '00000509', or(/BI/BIB/BIBH = '00000769', or(/ BI/BIB/BIBH = '00001673', or(/BI/BIB/BIBH = '00000058', or(/BI/BIB/BIBH = '00000764', or(/BI/BIB/BIBH = '00001238', or(/BI/BIB/BIBH = '00001592', or (/BI/BIB/BIBH = '00001017', or(/BI/BIB/BIBH = '00002676', or(/BI/BIB/BIBH = '00001554', or(/BI/BIB/BIBH = '00002193', or(/BI/BIB/BIBH = '00001907', or(/BI/BIB/BIBH = '00000366', or(/BI/BIB/BIBH = '00001161', or(/BI/BIB/ BIBH = '00002991', or(/BI/BIB/BIBH = '00002820', or(/BI/BIB/BIBH = '00000610', or(/BI/BIB/BIBH = '00000521', or(/BI/BIB/BIBH = '00003143', or (/BI/BIB/BIBH = '00002869', or(/BI/BIB/BIBH = '00000410', or(/BI/BIB/BIBH = '00001926', or(/BI/BIB/BIBH = '00000061', or(/BI/BIB/BIBH = '00000165', or(/BI/BIB/BIBH = '00000669', or(/BI/BIB/BIBH = '00002675', or(/BI/BIB/ BIBH = '00000770', or(/BI/BIB/BIBH = '00000981', or(/BI/BIB/BIBH = '00001841', or(/BI/BIB/BIBH = '00002668', or(/BI/BIB/BIBH = '00001949', or (/BI/BIB/BIBH = '00002819', or(/BI/BIB/BIBH = '00000623', or(/BI/BIB/BIBH = '00002365', or(/BI/BIB/BIBH = '00000865', or(/BI/BIB/BIBH = '00000176', or(/BI/BIB/BIBH = '00002036', or(/BI/BIB/BIBH = '00002640', or(/BI/BIB/ BIBH = '00001297', or(/BI/BIB/BIBH = '00000811', or(/BI/BIB/BIBH = '00001594', or(/BI/BIB/BIBH = '00001160', or(/BI/BIB/BIBH = '00000709', or (/BI/BIB/BIBH = '00001908', or(/BI/BIB/BIBH = '00002836', or(/BI/BIB/BIBH = '00001283', or(/BI/BIB/BIBH = '00002946', or(/BI/BIB/BIBH = '00001877', or(/BI/BIB/BIBH = '00002378', or(/BI/BIB/BIBH = '00000474', or(/BI/BIB/ BIBH = '00002710', or(/BI/BIB/BIBH = '00001884', or(/BI/BIB/BIBH = '00000968', or(/BI/BIB/BIBH = '00000926', or(/BI/BIB/BIBH = '00001902', or (/BI/BIB/BIBH = '00001018', or(/BI/BIB/BIBH = '00002989', or(/BI/BIB/BIBH = '00001590', or(/BI/BIB/BIBH = '00001300', or(/BI/BIB/BIBH = '00002754', or(/BI/BIB/BIBH = 
'00001923', or(/BI/BIB/BIBH = '00002771', or(/BI/BIB/ BIBH = '00000768', or(/BI/BIB/BIBH = '00002034', or(/BI/BIB/BIBH = '00001851', or(/BI/BIB/BIBH = '00001303', or(/BI/BIB/BIBH = '00002642', or (/BI/BIB/BIBH = '00002949', or(/BI/BIB/BIBH = '00000821', or(/BI/BIB/BIBH = '00000989', or(/BI/BIB/BIBH = '00001377', or(/BI/BIB/BIBH = '00001514', or(/BI/BIB/BIBH = '00000510', or(/BI/BIB/BIBH = '00000054', or(/BI/BIB/ BIBH = '00001280', or(/BI/BIB/BIBH = '00002370', or(/BI/BIB/BIBH = '00001924', or(/BI/BIB/BIBH = '00000063', or(/BI/BIB/BIBH = '00001632', or (/BI/BIB/BIBH = '00000434', or(/BI/BIB/BIBH = '00003035', or(/BI/BIB/BIBH = '00001657', or(/BI/BIB/BIBH = '00002053', or(/BI/BIB/BIBH = '00000966', or(/BI/BIB/BIBH = '00002699', or(/BI/BIB/BIBH = '00003178', or(/BI/BIB/ BIBH = '00002602', or(/BI/BIB/BIBH = '00002194', or(/BI/BIB/BIBH = '00001591', or(/BI/BIB/BIBH = '00001910', or(/BI/BIB/BIBH = '00000174', or (/BI/BIB/BIBH = '00002132', /BI/BIB/BIBH = '00000665'))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))""" import time start = time.time() tokens = GRAMMAR.parseString(data) print time.time() - start ---->8------------------------------------------------------------------- I tried with last pyparsing version (1.4.8) with no luck, have you any ideas? The grammar is self-made, I can change it if you think the bad performance is fault of it. |
From: Paul M. <pt...@au...> - 2007-12-09 17:54:45
Donn -

This really is pretty well-suited to pyparsing, but you still have some basics to learn.

- Why did you wrap "fill:#" in an Optional? If anything, it is a Literal, but in your grammar, you can just use the string itself.

- parseString is suitable ONLY if you fully specify the grammar for the input string. Since you are trying to pick matches out from amongst other noise, searchString or scanString are better choices. scanString returns a generator, which means you have to iterate over it with a for loop, or use something like the list constructor to convert to a list. scanString also returns the start and end locations for each match. In your case, you don't need this extra info, so just use the simpler searchString (searchString is just a wrapper around scanString - it discards the extra data, and just returns a list of the matches).

- Your grammar was wrong in a few places. The # sign is a marker for the hex values in fill and stroke only, and is not used in the fill-opacity or stroke-width commands. Since the # sign goes with the hex values, I included it as a suppressed prefix on hexNums, and removed it from the various command definitions.

- Likewise, I defined a COLON as a Suppress(":"), so that the returned values have just the interesting names, with no trailing colons.

- With these changes, searchString will return a list of key-value pairs. Note the easy way to change this to a dict, given at the end of the example.

In short:

- use the correct method for parsing or sifting or transforming data
- look more closely at your input string to define your expressions properly

Don't give up, this parsing stuff takes some getting used to, AND practice!
-- Paul

(my modified version, with comments)

from pyparsing import *

# Cover [1] or [0.587]
floatOrInt = Combine(Word(nums) + Optional(Literal(".") + Word(nums)))

# Cover any amount of hex, [ab][abcf]
hexNums = Word(hexnums)

# the #-sign is a prefix for hex numbers in SVG, so make it part of hexNums
# instead of repeating it in each label that takes a hex value
hexNums = Suppress("#") + Word(hexnums)

# A semi-colon seps commands
semi = Literal(";").suppress()
COLON = Literal(":").suppress()

# hacking the commands
FILL_command = Optional("fill:#")  # + Group(hexNums + semi)  # why Optional here?
FILL_command = "fill" + COLON + hexNums + semi
FILLOPACITY_command = "fill-opacity:#" + floatOrInt + semi
FILLOPACITY_command = "fill-opacity" + COLON + floatOrInt + semi
STROKECOLOR_command = "stroke:#" + hexNums + semi
STROKECOLOR_command = "stroke" + COLON + hexNums + semi
STROKEWIDTH_command = "stroke-width:#" + floatOrInt + semi
STROKEWIDTH_command = "stroke-width" + COLON + floatOrInt + semi

# Trying to sum them up.  Remarked down to one for testing
stylecommand = FILL_command | FILLOPACITY_command | STROKECOLOR_command | STROKEWIDTH_command

# Hacked during tests, tried to simplify.
phrase2 = stylecommand  # OneOrMore(Group(stylecommand))

# The test string
style = "opacity:1;color:#000000;fill:#6bdc23;fill-opacity:0.4611111;fill-rule:nonzero;stroke:#ff0000;stroke-width:6;stroke-linecap:butt;stroke-linejoin:miter;marker:none;marker-start:none;marker-mid:none;marker-end:none;stroke-miterlimit:2;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;visibility:visible;display:inline;overflow:visible;enable-background:accumulate"

#~ tokensStyle = phrase2.parseString(style)  # I also tried scanString()
# scanString returns a generator, do you know how to extract values
# from a generator?  This is a Python thing, not a pyparsing thing.
# If you don't need extra info about each match (like the start and end
# locations), just use searchString.
# parseString is clearly the wrong choice here, since you are picking
# out selected matches from among other junk; searchString is the simplest.
tokensStyle = phrase2.searchString(style)
print tokensStyle

# Trying to get a result.
for a in tokensStyle:
    print a
##    for command in a:
##        print ":", command

# An easy way to convert searchString results to a dict, for this
# example (since grammar returns each element as a key-value pair)
print dict(tokensStyle.asList())

-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Donn Ingle
Sent: Sunday, December 09, 2007 7:03 AM
To: pyp...@li...
Subject: [Pyparsing] parsing SVG styles

Hello again, I have now spent almost 3 hours on this and I've also looked at the examples and read the pdfs, but I just can't get this going. I actually find the docs and the adventure example too complicated -- I'm a simple sort. I'll post my code and hope for mercy -- as I've been told to rtfm before :)

I'm trying to "pick out" certain keywords (and args) from a string (style node in an SVG file) from amidst a babble of noise and just record those for later use:

    fill:#[6 hex nums];
    fill-opacity:#[float or int];
    stroke:#[6 hex nums];
    stroke-width:#[float or int];

This is my latest test:

    # Cover [1] or [0.587]
    floatOrInt = Combine(Word(nums) + Optional(Literal(".") + Word(nums)))
    # Cover any amount of hex, [ab][abcf]
    hexNums = Word(hexnums)
    # A semi-colon seps commands
    semi = Literal(";").suppress()
    # hacking the commands
    FILL_command = Optional("fill:#")  # + Group(hexNums + semi)
    FILLOPACITY_command = "fill-opacity:#" + floatOrInt + semi
    STROKECOLOR_command = "stroke:#" + hexNums + semi
    STROKEWIDTH_command = "stroke-width:#" + floatOrInt + semi
    # Trying to sum them up.  Remarked down to one for testing
    stylecommand = FILL_command  # | FILLOPACITY_command | STROKECOLOR_command | STROKEWIDTH_command
    # Hacked during tests, tried to simplify.
    phrase2 = stylecommand  # OneOrMore(Group(stylecommand))

    # The test string
    style = "opacity:1;color:#000000;fill:#6bdc23;fill-opacity:0.4611111;fill-rule:nonzero;stroke:#ff0000;stroke-width:6;stroke-linecap:butt;stroke-linejoin:miter;marker:none;marker-start:none;marker-mid:none;marker-end:none;stroke-miterlimit:2;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;visibility:visible;display:inline;overflow:visible;enable-background:accumulate"

    tokensStyle = phrase2.parseString(style)  # I also tried scanString()
    print tokensStyle
    # Trying to get a result.
    for a in tokensStyle:
        print a
    ## for command in a:
    ##     print ":", command

\d

-------------------------------------------------------------------------
SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Pyparsing-users mailing list
Pyp...@li...
https://lists.sourceforge.net/lists/listinfo/pyparsing-users |
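For comparison, the key/value extraction that the grammar's searchString() call performs can be sketched with stdlib string operations alone. This is an illustrative sketch, not code from the thread; the function and set names are invented, and it simply splits on ";" and ":" rather than parsing.

```python
# Minimal stdlib sketch: pick selected properties out of an SVG style
# attribute by splitting on ";" and ":", then keeping only the keys of
# interest -- roughly the dict that searchString() + asList() yields.
WANTED = {"fill", "fill-opacity", "stroke", "stroke-width"}  # illustrative

def parse_style(style):
    result = {}
    for decl in style.split(";"):
        if ":" not in decl:
            continue
        key, _, value = decl.partition(":")
        key = key.strip()
        if key in WANTED:
            # strip the leading "#" that SVG uses on hex colors
            result[key] = value.strip().lstrip("#")
    return result

style = ("opacity:1;color:#000000;fill:#6bdc23;fill-opacity:0.4611111;"
         "stroke:#ff0000;stroke-width:6;stroke-linecap:butt")
print(parse_style(style))
```

A split-based sketch like this breaks down as soon as values may themselves contain ";" or ":", which is where a real grammar pays off.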
From: Donn I. <don...@gm...> - 2007-12-09 13:00:51
|
Hello again, I have now spent almost 3 hours on this and I've also looked at the examples and read the pdfs, but I just can't get this going. I actually find the docs and the adventure example too complicated -- I'm a simple sort. I'll post my code and hope for mercy -- as I've been told to rtfm before :)

I'm trying to "pick out" certain keywords (and args) from a string (style node in an SVG file) from amidst a babble of noise and just record those for later use:

    fill:#[6 hex nums];
    fill-opacity:#[float or int];
    stroke:#[6 hex nums];
    stroke-width:#[float or int];

This is my latest test:

    # Cover [1] or [0.587]
    floatOrInt = Combine(Word(nums) + Optional(Literal(".") + Word(nums)))
    # Cover any amount of hex, [ab][abcf]
    hexNums = Word(hexnums)
    # A semi-colon seps commands
    semi = Literal(";").suppress()
    # hacking the commands
    FILL_command = Optional("fill:#")  # + Group(hexNums + semi)
    FILLOPACITY_command = "fill-opacity:#" + floatOrInt + semi
    STROKECOLOR_command = "stroke:#" + hexNums + semi
    STROKEWIDTH_command = "stroke-width:#" + floatOrInt + semi
    # Trying to sum them up.  Remarked down to one for testing
    stylecommand = FILL_command  # | FILLOPACITY_command | STROKECOLOR_command | STROKEWIDTH_command
    # Hacked during tests, tried to simplify.
    phrase2 = stylecommand  # OneOrMore(Group(stylecommand))

    # The test string
    style = "opacity:1;color:#000000;fill:#6bdc23;fill-opacity:0.4611111;fill-rule:nonzero;stroke:#ff0000;stroke-width:6;stroke-linecap:butt;stroke-linejoin:miter;marker:none;marker-start:none;marker-mid:none;marker-end:none;stroke-miterlimit:2;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;visibility:visible;display:inline;overflow:visible;enable-background:accumulate"

    tokensStyle = phrase2.parseString(style)  # I also tried scanString()
    print tokensStyle
    # Trying to get a result.
    for a in tokensStyle:
        print a
    ## for command in a:
    ##     print ":", command

\d |
From: Paul M. <pt...@au...> - 2007-11-24 16:04:44
|
Since these are supposed to represent the 4 fingers on your hand, can you really have the out-of-order permutations? Or are you really just looking at the 16 values corresponding to the binary values 0000 to 1111? What is the significance of a symbol meaning "pinky finger on opposite side of index finger"?

-- Paul

-----Original Message-----
From: Ralph Corderoy [mailto:ra...@in...]
Sent: Tuesday, November 20, 2007 11:10 AM
To: Tim Grove
Cc: pyp...@li...; Paul McGuire
Subject: Re: [Pyparsing] multiple tokens but no repeats

Hi Tim,

> I'm trying to represent the logic that I wish to match 0 or more
> tokens, up to a maximum of 4 tokens, but I don't want any repeats of
> tokens already matched within the group of 4. How could I do that?

I'm not sure if you can. Perhaps there's a way to run your own validation code as each attempt to grow the number of matches occurs?

Otherwise, since there's only four I'd enumerate the alternatives, e.g.

    a
    ab ac ad
    abc abd acb acd adb adc
    abcd abdc acbd acdb adbc adcb

If I've got that right, there's 16 alternatives starting with `a'. The other three follow a similar pattern making 64 in total. You could factor out the common prefixes to speed up a Regex() a bit if you find that just a dumb list seems slow.

Be sure to check whether order determines whether it matches the longest possible, e.g. `abcz' should match `abc' and not `a' or `ab'.

Cheers,

Ralph. |
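Ralph's counts (16 alternatives starting with `a', 64 in total) can be checked mechanically with itertools; the repeat-free ordered groups of 1 to 4 tokens are exactly the partial permutations of the four tokens. A quick verification sketch:

```python
from itertools import permutations

tokens = "abcd"

# All ordered, repeat-free sequences of 1..4 distinct tokens --
# the full list of alternatives Ralph enumerates by hand.
alts = [''.join(p) for n in range(1, 5) for p in permutations(tokens, n)]

starting_with_a = [s for s in alts if s.startswith('a')]
print(len(starting_with_a), len(alts))  # 16 64
```

Sorting the alternatives longest-first before building the Regex() handles the `abcz'-should-match-`abc' concern, since regex alternation tries branches left to right.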
From: W. M. B. <de...@de...> - 2007-11-20 17:23:38
|
On Tue, Nov 20, 2007 at 03:24:34PM +0000, Tim Grove wrote: > I'm trying to represent the logic that I wish to match 0 or more tokens, > up to a maximum of 4 tokens, but I don't want any repeats of tokens > already matched within the group of 4. How could I do that? I would try setParseAction(), i.e. match 0 to 4 tokens and check myself for duplication. |
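The check behind this suggestion, match up to 4 tokens and validate for duplicates yourself, boils down to a small predicate. A stdlib sketch (names invented; in pyparsing the parse action would raise a ParseException when this returns False):

```python
# Predicate for the suggested parse action: a matched group of finger
# tokens is valid when it has 1-4 tokens, all drawn from the finger
# symbols, and no token appears twice.
FINGERS = {",", ":", ";", "!"}

def valid_finger_group(tokens):
    return (0 < len(tokens) <= 4
            and set(tokens) <= FINGERS
            and len(set(tokens)) == len(tokens))

print(valid_finger_group([",", ":", "!"]))
```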
From: Ralph C. <ra...@in...> - 2007-11-20 17:10:24
|
Hi Tim, > I'm trying to represent the logic that I wish to match 0 or more > tokens, up to a maximum of 4 tokens, but I don't want any repeats of > tokens already matched within the group of 4. How could I do that? I'm not sure if you can. Perhaps there's a way to run your own validation code as each attempt to grow the number of matches occurs? Otherwise, since there's only four I'd enumerate the alternatives, e.g. a ab ac ad abc abd acb acd adb adc abcd abdc acbd acdb adbc adcb If I've got that right, there's 16 alternatives starting with `a'. The other three follow a similar pattern making 64 in total. You could factor out the common prefixes to speed up a Regex() a bit if you find that just a dumb list seems slow. Be sure to check whether order determines whether it matches the longest possible, e.g. `abcz' should match `abc' and not `a' or `ab'. Cheers, Ralph. |
From: Tim G. <tim...@si...> - 2007-11-20 15:24:40
|
I'm trying to represent the logic that I wish to match 0 or more tokens, up to a maximum of 4 tokens, but I don't want any repeats of tokens already matched within the group of 4. How could I do that? Thanks. Tim |
From: Tim G. <tim...@si...> - 2007-11-15 12:10:36
|
Hello PyParsers,

I'm new to pyparsing and to this list; thanks for this module and for a way to at last get parsing logic out of my head and written down in the form of BNF. It's been driving me crazy!!!

The project that I'm working on involves representing signed languages with character strings. Some aspect of a 'sign' is represented by a 1, 2, or maybe 3 character long string, which we refer to as a 'marker'. A complete 'sign' would be represented by a character string in excess of 20 or 30 characters, composed of these individual 'markers'. We have around 300 markers defined. I'm trying to write a parser to tokenize a complete string into these individual markers.

I was wondering, if I have a partial sign string, is there a straightforward way to get a listing back of the next possible markers (tokens)? I'm thinking that I could just pass in a list of all markers and see which ones matched, but there must be a better way!

Also, is there a way to represent the logic that I want to match up to 4 characters from a possible 4, but I don't want any repeats? In context, I have:

    finger ::= , | : | ; | !

I want to be able to match up to a maximum of 4 fingers (not all need to be included), but I don't want any repeats, either adjacent to each other or within the matched group. One solution is as follows, but I was wondering if anyone could suggest a better way:

    index  ::= Literal(",")
    first  ::= Literal(":")
    middle ::= Literal(";")
    little ::= Literal("!")
    hand   ::= ([index][first][middle][little])+max=4

(although each individual finger is optional, I need to match at least one finger, up to a maximum of 4)

Thanks for any help or suggestions.

Best regards,
Tim Grove |
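Tim's tokenization question, splitting a sign string into 1-3 character markers, is a longest-match-first scan. A stdlib sketch with an invented toy marker set (the real project has ~300 markers; these are placeholders for illustration):

```python
# Toy marker inventory -- stand-ins for the project's ~300 markers.
MARKERS = {",", ":", ";", "!", "ab", "abc", "x"}
MAXLEN = max(len(m) for m in MARKERS)

def tokenize(sign):
    """Greedy longest-match split of a sign string into markers."""
    out, i = [], 0
    while i < len(sign):
        # try the longest candidate first so "abc" beats "ab"
        for n in range(min(MAXLEN, len(sign) - i), 0, -1):
            if sign[i:i + n] in MARKERS:
                out.append(sign[i:i + n])
                i += n
                break
        else:
            raise ValueError("no marker matches at position %d" % i)
    return out

print(tokenize("abc,x!"))
```

For the "next possible markers" question, the same inventory answers it directly: after tokenizing the partial string, every marker is a candidate continuation unless grammar rules (like the no-repeat finger rule) exclude it.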
From: Ralph C. <ra...@in...> - 2007-11-11 10:52:57
|
Hi Vineet,

> That's a good idea. My current plan is to use part of the following
> recipe:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746

It might be better from a security point of view to list what's allowed, having carefully considered the implications of each opcode, rather than list what's disallowed. If you miss an opcode by accident, the former will cramp what the user can express, the latter will let them escape your constraints.

Cheers,

Ralph. |
From: Ralph C. <ra...@in...> - 2007-11-11 10:51:26
|
Hi Vineet,

> http://www.fauskes.net/nb/parsing-simulink/
> The above grammar spec requires that all config parameters are
> enclosed in one master System {}; anything outside this is ignored.

I think it doesn't mandate "System" but any mdlName.

> Config1 {
>   Name test
> }
>
> Config2 {
>   Name test
> }
>
> Only captures Config1 and ignores Config2. Any suggestions on how to
> make it work without everything having to be enclosed in {}?

Try changing

    mdlparser = mdlObject

to

    mdlparser = OneOrMore(mdlObject)

Cheers,

Ralph. |
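The reason the OneOrMore fix works can be seen with a stdlib toy model: an expression for a single block stops after its first match, while repeating it collects every top-level block. This is an illustrative sketch only, the regex and block names are invented and this is not real Simulink/mdl syntax:

```python
import re

# A non-nested "Name { body }" block, as a stand-in for mdlObject.
block = re.compile(r'(\w+)\s*\{([^{}]*)\}')

text = """
Config1 {
  Name test1
}
Config2 {
  Name test2
}
"""

one = block.search(text)          # analogous to: mdlparser = mdlObject
all_blocks = block.findall(text)  # analogous to: OneOrMore(mdlObject)
print(one.group(1), [name for name, body in all_blocks])
```

The single search sees only Config1; the repeated form captures both, which is exactly the symptom and fix from the thread.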
From: thomas_h <th...@go...> - 2007-11-10 23:49:59
|
Great, thanks for the help! =Thomas > > Hi Thomas, > > > The reason I tried to build a minimal example in the first place was > > that I had double recursion in my initial version, something like > > > > aexp << Or([..., > > aexp + '+' + aexp, > > ...]) > > > > which results in infinite recursion. I felt this might be the right > > way to do it, but maybe I'm wrong. Is this generally possible in > > pyparsing? > > I doubt it is. If you look at a grammar for a language you tend to find > arithmetical expressions defined: > > expr := term (addop term)* > term := factor (mulop factor)* > factor := floatnum | '(' expr ')' > addop := '+' | '-' > mulop := '*' | '/' > > This makes the grammar, and the parse tree, represent the precedence of > operators. Note the parenthesis are handled with the recursive > definition where `expr' was a Forward() initially. Using ZeroOrMore() > it's quite easy to turn the above into Python. Maybe fourFn.py will > make a little more sense now > > > > And I would also suggest that you try to use the MatchFirst > > > construct instead of Or, especially with recursive expressions: > > > > > > aexp << ( number + '+' + aexp | number ) > > > > I presume this will gain me some run-time efficiency. > > Yes. Or() tries all of them and then returns the longest. MatchFirst() > can give up trying after the first one that matches. > > Cheers, > > > Ralph. > > |
From: Vineet J. \(gmail\) <vin...@gm...> - 2007-11-10 13:51:42
|
>> Have you considered using codeop.py to attempt to compile their
>> Python list and dict code?

That's a good idea. My current plan is to use part of the following recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496746

with the following restrictions:

    unallowed_ast_nodes = [
        'Backquote', 'Exec', 'From', 'Global',
        'GenExpr', 'GenExprFor', 'GenExprIf', 'GenExprInner', 'Getattr',
        'Import', 'Power', 'TryExcept', 'TryFinally', 'Yield'
    ]

    # Deny evaluation of code if it tries to access any of the following builtins:
    unallowed_builtins = [
        '__import__', 'apply', 'basestring', 'buffer', 'callable', 'chr',
        'classmethod', 'coerce', 'compile', 'complex', 'delattr', 'dir',
        'divmod', 'eval', 'execfile', 'file', 'filter', 'frozenset',
        'getattr', 'globals', 'hasattr', 'hex', 'id', 'input', 'intern',
        'isinstance', 'issubclass', 'locals', 'map', 'object', 'oct',
        'open', 'ord', 'pow', 'property', 'range', 'raw_input', 'reduce',
        'reload', 'repr', 'reversed', 'round', 'set', 'setattr',
        'staticmethod', 'super', 'type', 'unichr', 'unicode', 'vars', 'zip'
    ]

I will also check for use of * and ** with pyparsing. I will replace both of these with my wrappers around them to make sure that there are no cases like:

    20000**11111111111111111111111111111111111
    [1]*11111111111111111111111111111111111

etc. I think given this, I should be able to run untrusted code.

Thanks,
Vineet |
From: Vineet J. \(gmail\) <vin...@gm...> - 2007-11-10 13:32:50
|
Quick update. I ended up using:

http://www.fauskes.net/nb/parsing-simulink/

to set up the configuration for my user models. The only change I had to make was that names were not allowed to have underscores in them. I tried to fix that by changing:

    mdlName = Word('$'+'.'+alphas+nums)

to

    mdlName = Word('$'+'.'+alphas+nums+'_')

and it worked. Cool.

The above grammar spec requires that all config parameters are enclosed in one master System {}; anything outside this is ignored. For example:

    System {
        Config1 {
            Name test
        }
        Config2 {
            Name test
        }
    }

works. However,

    Config1 {
        Name test
    }

    Config2 {
        Name test
    }

only captures Config1 and ignores Config2. Any suggestions on how to make it work without everything having to be enclosed in {}? |
From: Ralph C. <ra...@in...> - 2007-11-10 10:58:01
|
Hi Vineet, > I allow the users of my application to write small rules as python > code. I use pylint to find errors in code they enter. As part of the > user code they are required to enter a list (lista) and a couple of > dicts (dict1, dict2) at the module level. I guess dict1's values can also be dicts, lists, etc. > I use lista, dict1, dict2 to add variables to the module dynamically > at run time. The problem I'm having is that pylint complains of the > dynamic variables that are set in lista and dict1 and dict2. So I was > thinking of having a multiple stage effort to find syntax errors with > the user python code. Have you considered using codeop.py to attempt to compile their Python list and dict code? If you're wary of them doing unwanted stuff in the code they provide then maybe opcode.py can then be used to scan through the compiled bytecode to check they're just doing simple dict and list construction? Cheers, Ralph. |
From: Vineet J. \(gmail\) <vin...@gm...> - 2007-11-09 21:52:19
|
Problem:

I allow the users of my application to write small rules as Python code. I use pylint to find errors in the code they enter. As part of the user code they are required to enter a list (lista) and a couple of dicts (dict1, dict2) at the module level. I use lista, dict1, dict2 to add variables to the module dynamically at run time. The problem I'm having is that pylint complains about the dynamic variables that are set in lista, dict1, and dict2. So I was thinking of having a multiple-stage effort to find syntax errors in the user Python code.

Step 1: Extract lista, dict1, and dict2 with some pyparsing code.
Step 2: Then convert lista, dict1, dict2 to valid Python objects using the pyparsing json conversion function.
Step 3: Run pylint on the user's Python code. Ignore errors for variables and functions defined in lista, dict1, dict2.

Example user Python code:

    Lista = ["variable1", "variable2"]

    RuleDict = {
        'rule1': {'name1': 'function1name', },
        'rule2': {'name2': 'function2name', },
    }

    def user_logic():
        print variable1        # Not error
        print variable2        # Not error
        print function1name    # Not error
        print function2name    # Not error
        asdfasjkdfsdkajflasj;  # ERROR
        asfasdfasdf            # ERROR

How would I do step 1 and step 2 with pyparsing? I'm going to sign up for the O'Reilly book and do some more reading on pyparsing this weekend, but any help that I can get from anyone on this list would be very helpful.

Thanks,
Vineet |
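Steps 1 and 2, pulling the module-level literals out of the user code and turning them into Python objects, can also be done without pyparsing, using the stdlib ast module. A sketch under that assumption; the user-code snippet and function name below are invented to match the thread's example:

```python
import ast

# Hypothetical user module, shaped like the thread's example.
user_code = '''
Lista = ["variable1", "variable2"]
RuleDict = {'rule1': {'name1': 'function1name'}}

def user_logic():
    print(Lista)
'''

def extract_literals(source, names):
    """Safely pull selected module-level literal assignments from source."""
    found = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in names:
                    # literal_eval only accepts literals -- no calls, no
                    # attribute access -- so the user code never executes
                    found[target.id] = ast.literal_eval(node.value)
    return found

print(extract_literals(user_code, {"Lista", "RuleDict"}))
```

Because ast.literal_eval rejects anything but literal syntax, this also sidesteps the untrusted-code concerns discussed elsewhere in the thread for this extraction step.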
From: Ralph C. <ra...@in...> - 2007-11-09 18:08:00
|
Hi Thomas,

> The reason I tried to build a minimal example in the first place was
> that I had double recursion in my initial version, something like
>
>     aexp << Or([...,
>                 aexp + '+' + aexp,
>                 ...])
>
> which results in infinite recursion. I felt this might be the right
> way to do it, but maybe I'm wrong. Is this generally possible in
> pyparsing?

I doubt it is. If you look at a grammar for a language you tend to find arithmetical expressions defined:

    expr   := term (addop term)*
    term   := factor (mulop factor)*
    factor := floatnum | '(' expr ')'
    addop  := '+' | '-'
    mulop  := '*' | '/'

This makes the grammar, and the parse tree, represent the precedence of operators. Note the parentheses are handled with the recursive definition where `expr' was a Forward() initially. Using ZeroOrMore() it's quite easy to turn the above into Python. Maybe fourFn.py will make a little more sense now.

> > And I would also suggest that you try to use the MatchFirst
> > construct instead of Or, especially with recursive expressions:
> >
> >     aexp << ( number + '+' + aexp | number )
>
> I presume this will gain me some run-time efficiency.

Yes. Or() tries all of them and then returns the longest. MatchFirst() can give up trying after the first one that matches.

Cheers,

Ralph. |
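The layered expr/term/factor grammar Ralph quotes translates almost line for line into a hand-rolled recursive-descent evaluator. A stdlib sketch (floatnum simplified to integers for brevity; this shows the grammar's structure, not pyparsing's API):

```python
import re

def evaluate(s):
    """Evaluate an arithmetic expression via the expr/term/factor grammar."""
    tokens = re.findall(r'\d+|[-+*/()]', s)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                        # expr := term (addop term)*
        val = term()
        while peek() in ('+', '-'):
            val = val + term() if eat() == '+' else val - term()
        return val

    def term():                        # term := factor (mulop factor)*
        val = factor()
        while peek() in ('*', '/'):
            val = val * factor() if eat() == '*' else val / factor()
        return val

    def factor():                      # factor := num | '(' expr ')'
        if peek() == '(':
            eat()
            val = expr()
            eat()                      # consume the closing ')'
            return val
        return int(eat())

    return expr()

print(evaluate("2+3*4"), evaluate("(2+3)*4"))
```

Because term() sits one layer below expr(), multiplication binds tighter than addition with no extra machinery, which is exactly the precedence point Ralph is making.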
From: thomas_h <th...@go...> - 2007-11-09 13:56:24
|
Thanks for the fast answers!

> and your recursion won't recurse properly. Instead, you must use the
> '<<' operator. In your example, you just need to replace:

Ah, silly me. I had the '<<' operator in my original code, but forgot it when I tried to strip it down to a minimal example ... Yes, now it works :-).

The reason I tried to build a minimal example in the first place was that I had double recursion in my initial version, something like

    aexp << Or([...,
                aexp + '+' + aexp,
                ...])

which results in infinite recursion. I felt this might be the right way to do it, but maybe I'm wrong. Is this generally possible in pyparsing?

> While this all works, I'm curious why you are not using the operators
> defined for creating compound constructs such as Or. The form you have
> feels very tedious to me, compared to:
>
>     aexp << ( number ^ number + '+' + aexp )

I tried the operators superficially and got error messages from Python, but that was probably my fault (e.g. not importing enough). I will improve on that :-).

> And I would also suggest that you try to use the MatchFirst construct
> instead of Or, especially with recursive expressions:
>
>     aexp << ( number + '+' + aexp | number )

I presume this will gain me some run-time efficiency.

> Note that I had to reorder the terms in order to try the more
> restrictive test first. But MatchFirst will stop at the first matching
> alternative, while Or will evaluate all alternatives and select the
> longest. In recursive expressions, Or can descend down a neverending
> sequence of self-referencing alternatives.

I will keep this in mind, although I might stick with Or for the beginning and use MatchFirst as an optimization later on (my target is parsing JavaScript).

> Welcome to pyparsing!

Thanks :-). Apart from my beginner's problems I've got the impression that it is really good stuff. Thanks for making the effort!

=Thomas |