pyparsing-users Mailing List for Python parsing module (Page 13)
Brought to you by: ptmcg
From: spir ☣ <den...@gm...> - 2010-03-25 10:21:45
On Wed, 24 Mar 2010 15:57:34 -0600 "ThanhVu (Vu) Nguyen" <ngu...@gm...> wrote:

> Hi, I tried to generate this simple recursive rule that involves both
> Forward and operatorPrecedence() and get errors about maximum
> recursion depth exceeded.
>
> Thanks,
>
> def getRule_test():
>     #rule exp = name | num | name[exp] | exp + exp | exp * exp |
>     name = Word(alphas)
>     num = Word(nums)
>
>     exp = Forward()
>     idx = name + '[' + exp + ']'
>
>     arith = operatorPrecedence(
>         exp, [('*', 2, opAssoc.LEFT),
>               ('+', 2, opAssoc.RIGHT)],)
>
>     exp << (arith|idx|name|num)  #works ok if take out arith
>     return exp
>
> VN -

Yes, the recursive term "exp" in your format appears on the left side of arith, and thus finally on the left side of itself. This cannot be matched, since it launches an infinite recursive loop of calls to exp.match(). More generally, you cannot write a pattern such as:

    p1 : p1 whatever

But it can always be reformulated into something like:

    p2 : whatever p2

Here, you need to distinguish between the levels of a non-recursive operand (inside arith) and of a whole exp. Operator precedence already involves the inherent recursivity of operations. Exp only needs to be recursive because it appears inside idx. (I guess.)

Something like, maybe (untested!):

    #rule exp = name | num | name[exp] | exp + exp | exp * exp |
    name = Word(alphas)
    num = Word(nums)
    exp = Forward()
    idx = name + '[' + exp + ']'
    operand = (idx|name|num)
    arith = operatorPrecedence(
        operand, [('*', 2, opAssoc.LEFT),
                  ('+', 2, opAssoc.RIGHT)],)
    exp << (arith|operand)
    return exp

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com
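The restructured rule can be tried as a self-contained sketch. It assumes a recent pyparsing, where operatorPrecedence is available under its later name infixNotation (newer releases may not provide the old name at all). Wrapping idx in a Group and suppressing the brackets are additions for readable output, not part of Denis's version.

```python
import pyparsing as pp


def get_rule():
    # rule: exp = name | num | name[exp] | exp + exp | exp * exp
    name = pp.Word(pp.alphas)
    num = pp.Word(pp.nums)

    exp = pp.Forward()
    # exp only recurses inside the brackets, after a name and '[' have been
    # consumed, so there is no left recursion here
    idx = pp.Group(name + pp.Suppress("[") + exp + pp.Suppress("]"))

    # non-recursive operand; idx is tried before the bare name
    operand = idx | name | num

    # infixNotation builds the recursive operator levels itself
    arith = pp.infixNotation(operand, [
        ("*", 2, pp.opAssoc.LEFT),
        ("+", 2, pp.opAssoc.RIGHT),
    ])
    exp <<= arith
    return exp


if __name__ == "__main__":
    rule = get_rule()
    print(rule.parseString("a[1+b]*c+2", parseAll=True).asList())
```

Because operand never starts with exp, every recursive call to exp happens only after some input has been consumed, so the parser always makes progress. The RIGHT associativity of '+' shows up as nesting on the right: "1+2+3" parses as ['1', '+', ['2', '+', '3']].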
From: ThanhVu (V. N. <ngu...@gm...> - 2010-03-24 21:58:01
Hi, I tried to generate this simple recursive rule that involves both Forward and operatorPrecedence() and get errors about maximum recursion depth exceeded.

Thanks,

def getRule_test():
    #rule exp = name | num | name[exp] | exp + exp | exp * exp |
    name = Word(alphas)
    num = Word(nums)

    exp = Forward()
    idx = name + '[' + exp + ']'

    arith = operatorPrecedence(
        exp, [('*', 2, opAssoc.LEFT),
              ('+', 2, opAssoc.RIGHT)],)

    exp << (arith|idx|name|num)  #works ok if take out arith
    return exp

VN -
From: Shankar H. <sha...@ya...> - 2010-03-19 19:51:32
Hi All,

I am Shankar. I have just started using the pyparsing module to parse C++ header files, and I am currently facing a problem with Optional(), as in this sample code:

    Optional(functionSpecifierName) + Optional(typeDef) + identifier

Here I specified "functionSpecifierName" and "typeDef" as optional, but when I give the above code input containing only the identifier, it does not parse the data, even though "functionSpecifierName" and "typeDef" are optional.

Thanks
-Shankar
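The post does not show how functionSpecifierName, typeDef and identifier are defined, so the following is only a guess at one common cause: if the optional items can match an ordinary word, the first Optional() greedily consumes the lone identifier, and pyparsing does not backtrack into an Optional that has already matched. The keyword lists and the parses() helper below are hypothetical.

```python
import pyparsing as pp

# C++-style identifier (assumed definition)
identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")

# Pitfall: optional parts that can match any word. The first Optional
# consumes "foo", leaving nothing for the required identifier.
loose_decl = pp.Optional(identifier) + pp.Optional(identifier) + identifier

# Fix: restrict the optional parts to specific keywords (placeholder lists).
function_specifier = pp.MatchFirst(
    [pp.Keyword(k) for k in ("inline", "virtual", "explicit")])
type_def = pp.Keyword("typedef")
decl = pp.Optional(function_specifier) + pp.Optional(type_def) + identifier


def parses(expr, text):
    """Return the token list, or None if the text does not parse."""
    try:
        return expr.parseString(text, parseAll=True).asList()
    except pp.ParseException:
        return None


print(parses(loose_decl, "foo"))
print(parses(decl, "foo"))
print(parses(decl, "inline typedef foo"))
```

Keyword (rather than Literal) also ensures that an identifier such as "inlineFoo" is not split into "inline" plus "Foo".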
From: Paul M. <pt...@au...> - 2010-03-15 12:45:10
Thanks for posting this, Donn - I'll look at it further this evening when I get home from work.

-- Paul

> -----Original Message-----
> From: donn [mailto:don...@gm...]
> Sent: Monday, March 15, 2010 4:46 AM
> To: pyp...@li...
> Subject: Re: [Pyparsing] Adding to the wiki
>
> On 14/03/2010 18:24, Paul McGuire wrote:
> > I just added the page "UserContributions", try putting it there. Follow the
> > example formats used in "UnderDevelopment", or if the code is very long, the
> > format used on the Examples page.
> >
> Thanks. I have added my code:
> http://pyparsing.wikispaces.com/UserContributions
>
> All crits/improvements more than welcome :)
>
> \d
From: donn <don...@gm...> - 2010-03-15 09:45:25
On 14/03/2010 18:24, Paul McGuire wrote:
> I just added the page "UserContributions", try putting it there. Follow the
> example formats used in "UnderDevelopment", or if the code is very long, the
> format used on the Examples page.
>
Thanks. I have added my code:
http://pyparsing.wikispaces.com/UserContributions

All crits/improvements more than welcome :)

\d
From: Paul M. <pt...@au...> - 2010-03-14 16:24:34
I just added the page "UserContributions", try putting it there. Follow the example formats used in "UnderDevelopment", or if the code is very long, the format used on the Examples page.

-- Paul

> -----Original Message-----
> From: donn [mailto:don...@gm...]
> Sent: Saturday, March 13, 2010 10:32 AM
> To: pyp...@li...
> Subject: [Pyparsing] Adding to the wiki
>
> Hi,
> Before I attempt it, I'm asking where in the wiki would be the right
> place to add the code I am developing to parse SVG? I will post what I
> have and then refine it as I go. It's all going to be GPL 3.
>
> \d
From: donn <don...@gm...> - 2010-03-13 16:32:20
Hi,

Before I attempt it, I'm asking where in the wiki would be the right place to add the code I am developing to parse SVG? I will post what I have and then refine it as I go. It's all going to be GPL 3.

\d
From: donn <don...@gm...> - 2010-03-07 17:22:55
On 07/03/2010 19:13, Paul McGuire wrote:
> See the embedded comments in this code:

Paul,
You save me annually! And so many new tricks. PP.delimitedList! Whodathunkit? :D

Thanks once again. I am working on extending my previous SVG parser to be far more robust. I shall certainly post the source on the Pyparsing wiki when it's done.

Now, lemme go hack the code you sent.

Best,
\d
--
Fonty Python and Things! -- http://otherwise.relics.co.za/wiki/Software
From: Paul M. <pt...@au...> - 2010-03-07 17:13:43
Donn -

See the embedded comments in this code:

import pyparsing as PP

# just get all the punctuation out of the way
dot, comma, open_bracket, close_bracket = map(PP.Suppress, ".,()")

# I'm finding that complex items like real numbers just work better
# using a Regex than Combine'ing Words, Optionals, etc.
floater = PP.Regex(r"-?\d+(\.\d*)?([Ee][+-]?\d+)?")
floater.setParseAction(lambda toks: float(toks[0]))

# define a permissive expression for rotate_command, then do additional
# validation in the parse action
rotate_command = ("rotate" +
                  open_bracket + PP.Optional(PP.delimitedList(floater))("args") +
                  close_bracket)

def validate_rotate_args(s, l, tokens):
    numargs = len(tokens.args)
    if not (numargs == 1 or numargs == 3):
        raise PP.ParseException(s, l, "wrong number of args")
    # convert generic 'args' name to specific field names
    tokens["angle"] = tokens.args[0]
    tokens["pivot"] = tokens.args[1:3]
    del tokens["args"]

rotate_command.setParseAction(validate_rotate_args)

From here, you could use scanString or searchString to return just the valid commands:

s = "rotate(5,5,6) rotate(99,7) rotate(7)"
for rotation in rotate_command.searchString(s):
    print(rotation.dump())

Gives:

['rotate', 5.0, 5.0, 6.0]
- angle: 5.0
- pivot: [5.0, 6.0]
['rotate', 7.0]
- angle: 7.0
- pivot: []

The parseAction has detected that "rotate(99,7)" has the wrong number of arguments.

There's not really any way to emit a warning while parsing - either the parse matches, or it raises a ParseException.

-- Paul
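Building on Paul's point that a parse either matches or raises, one way to get the "warning" donn asked for is to keep the grammar permissive and sort the scanString() matches afterwards. This sketch groups the arguments with Group instead of putting a results name on a bare delimitedList; scan_rotations is a hypothetical helper name, not part of pyparsing.

```python
import pyparsing as pp

# Regex-based real number, as suggested above
floater = pp.Regex(r"-?\d+(\.\d*)?([Ee][+-]?\d+)?")
floater.setParseAction(lambda toks: float(toks[0]))
LPAR, RPAR = map(pp.Suppress, "()")

# Permissive: any number of comma-separated args; Group keeps them together
rotate_any = (pp.Keyword("rotate") + LPAR
              + pp.Group(pp.Optional(pp.delimitedList(floater)))("args")
              + RPAR)


def scan_rotations(s):
    """Split rotate(...) commands into valid arg lists and (offset, text) of bad ones."""
    good, bad = [], []
    for toks, start, end in rotate_any.scanString(s):
        args = list(toks.get("args", []))
        if len(args) in (1, 3):
            good.append(args)
        else:
            # record where the malformed command starts, instead of dropping it
            bad.append((start, s[start:end]))
    return good, bad


good, bad = scan_rotations("rotate(5,5,6) rotate(99,7) rotate(7) rotate()")
print(good)
print(bad)
```

The caller can then log or warn about each entry in bad, which keeps the grammar itself free of error-reporting logic.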
From: donn <don...@gm...> - 2010-03-07 15:51:19
Hi,

How would I get PP to *continue* parsing a string, skipping over any mismatched (badly formed) patterns? Raising an exception that I can then continue from would be a bonus.

Here's my best effort so far to demo the thing:
-----
import pyparsing as PP

dot = PP.Literal(".")
comma = PP.Literal(",").suppress()

## A single digit like 0 really means 0.0 so the .0 is optional:
numberMaybeDotnumber = PP.Combine( PP.Word(PP.nums) + PP.Optional( dot + PP.Word(PP.nums) ) )

## I have seen Inkscape include E- in the string... e.g 0.8683224E-07, hence all the gumph in floater.
floater = PP.Combine(PP.Optional("-") + numberMaybeDotnumber + PP.Optional( PP.Literal("E") + PP.Literal("-") + PP.Word(PP.nums) ) )
floater.setParseAction(lambda toks: float(toks[0]))

open_bracket = PP.Literal("(").suppress()
close_bracket = PP.Literal(")").suppress()

opt_couple = PP.Optional( comma + floater + comma + floater )

## rotate( angle (,x,y) ) where ,x,y are optional.
rotate_command = "rotate" + open_bracket + PP.Group( floater + opt_couple ) + close_bracket

matrix_commands = rotate_command

phrase_matrix = PP.OneOrMore(PP.Group(matrix_commands))

##      right          wrong          right
##                     PP stops here
##                     with no error.
##                                    I want this too
s = "rotate(5,5,6) rotate(99,7) rotate(7)"
print(phrase_matrix.parseString(s))

## swapped wrong/right at end to test.
s = "rotate(5,5,6) rotate(99) rotate(7,7)"
print(phrase_matrix.parseString(s))
-----

Output:
[['rotate', [5.0, 5.0, 6.0]]]
[['rotate', [5.0, 5.0, 6.0]], ['rotate', [99.0]]]

rotate( arg1 ) is okay.
rotate( arg1, arg2, arg3 ) is okay.
rotate( arg1, arg2 ) is wrong. I want to skip over that as if it was never there (or raise a warning). (Same for rotate( ..empty.. ) and other mad patterns.)

Holding thumbs,
\d
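The "PP stops here with no error" behaviour is how parseString() works by default: it returns the tokens for the prefix the grammar matched and ignores the rest of the input. Passing parseAll=True turns unconsumed text into a ParseException. A cut-down sketch, using integer arguments instead of floater to keep it short:

```python
import pyparsing as pp

num = pp.Word(pp.nums)
comma = pp.Suppress(",")
rotate_command = (pp.Keyword("rotate") + pp.Suppress("(") + num
                  + pp.Optional(comma + num + comma + num) + pp.Suppress(")"))
phrase = pp.OneOrMore(pp.Group(rotate_command))

s = "rotate(5,5,6) rotate(99,7) rotate(7)"

# Without parseAll, OneOrMore simply stops at rotate(99,7) and returns
print(phrase.parseString(s).asList())

# With parseAll=True, the leftover text is reported as an error instead
try:
    phrase.parseString(s, parseAll=True)
except pp.ParseException as err:
    print("parse stopped early, column", err.col)
```

parseAll only makes the failure visible; skipping the bad command and carrying on still needs a scanning approach such as scanString().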
From: Mark L. <bre...@ya...> - 2010-03-05 12:55:30
Thanks for the response Paul. I thought that this would be the case, having done a fair bit of research before posting, but nothing ventured...

The data at the link looks very useful; I'll have a play later on today. I'll also keep in mind your advice on the low-level tokens, which might well save me some grief.

If I do manage to get anywhere I'll let you know, maybe another flag to fly for pyparsing.

Kindest regards.

Mark Lawrence.

Paul McGuire wrote:
> Sorry Mark, I know of no one who has tackled a full Java grammar in
> pyparsing. Here is one link to a Java BNF:
> http://www.daimi.au.dk/dRegAut/JavaBNF.html - having done the Verilog
> grammar, this looks like a comparable task. It took me about 8 weeks to do
> Verilog.
>
> If you do take this on, or a subset (or start a little Google Code project
> for it), one recommendation I would make would be to use a Regex for some of
> the complex low-level tokens, like all the numeric literals. Building them
> up with Word, Optional, etc. will run waaay too slow, and using a Regex is a
> reasonable compromise.
>
> -- Paul
>
>> -----Original Message-----
>> From: Mark Lawrence [mailto:bre...@ya...]
>> Sent: Thursday, March 04, 2010 9:55 AM
>> To: pyp...@li...
>> Subject: [Pyparsing] Java grammar
>>
>> Hi all,
>>
>> Just assume that there is some purely hypothetical raving lunatic who
>> thinks it is a good idea to use pyparsing to convert Java source code into
>> the far purer Python. Rather than reinvent wheels this PHRL hopes to
>> reuse an existing pyparsing Java grammar for the job. Does anybody know
>> of such a beast?
>>
>> Should said beast not exist, would it be possible to grab an existing
>> grammar from the web and convert it using ebnf.py? If no it would not
>> surprise this PHRL as surely the entire concept is simply far too good to
>> be true. If yes, yee hah!!!, but how the hell does this PHRL go about
>> this task?
>>
>> Any and all responses gratefully received.
>>
>> Mark Lawrence.
From: Paul M. <pt...@au...> - 2010-03-05 03:26:26
Sorry Mark, I know of no one who has tackled a full Java grammar in pyparsing. Here is one link to a Java BNF: http://www.daimi.au.dk/dRegAut/JavaBNF.html - having done the Verilog grammar, this looks like a comparable task. It took me about 8 weeks to do Verilog.

If you do take this on, or a subset (or start a little Google Code project for it), one recommendation I would make would be to use a Regex for some of the complex low-level tokens, like all the numeric literals. Building them up with Word, Optional, etc. will run waaay too slow, and using a Regex is a reasonable compromise.

-- Paul

> -----Original Message-----
> From: Mark Lawrence [mailto:bre...@ya...]
> Sent: Thursday, March 04, 2010 9:55 AM
> To: pyp...@li...
> Subject: [Pyparsing] Java grammar
>
> Hi all,
>
> Just assume that there is some purely hypothetical raving lunatic who
> thinks it is a good idea to use pyparsing to convert Java source code into
> the far purer Python. Rather than reinvent wheels this PHRL hopes to
> reuse an existing pyparsing Java grammar for the job. Does anybody know
> of such a beast?
>
> Should said beast not exist, would it be possible to grab an existing
> grammar from the web and convert it using ebnf.py? If no it would not
> surprise this PHRL as surely the entire concept is simply far too good to
> be true. If yes, yee hah!!!, but how the hell does this PHRL go about
> this task?
>
> Any and all responses gratefully received.
>
> Mark Lawrence.
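As a concrete reading of the Regex advice, here is a sketch of Java numeric literals. It is deliberately incomplete: it assumes only the common forms (no underscores, octal, binary, or hexadecimal floating-point literals), and the expression names are illustrative.

```python
import pyparsing as pp

# One Regex per token class instead of Combine(Word + Optional + ...)
hex_int = pp.Regex(r"0[xX][0-9a-fA-F]+[lL]?")
float_lit = pp.Regex(r"(\d+\.\d*|\.\d+)([eE][+-]?\d+)?[fFdD]?"
                     r"|\d+[eE][+-]?\d+[fFdD]?"
                     r"|\d+[fFdD]")
dec_int = pp.Regex(r"\d+[lL]?")

# MatchFirst order matters: hex before decimal (else "0" of "0x1F" matches
# first), float before int (else "3" of "3.14" matches and leaves ".14").
numeric_literal = hex_int | float_lit | dec_int

for text in ["0x1Fl", "3.14e-2f", ".5", "1e10", "7f", "42L"]:
    print(text, "->", numeric_literal.parseString(text, parseAll=True)[0])
```

Each alternative is a single compiled regular expression, so the per-token cost stays low even though the literal syntax has many optional parts.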
From: Mark L. <bre...@ya...> - 2010-03-04 15:55:28
Hi all,

Just assume that there is some purely hypothetical raving lunatic who thinks it is a good idea to use pyparsing to convert Java source code into the far purer Python. Rather than reinvent wheels this PHRL hopes to reuse an existing pyparsing Java grammar for the job. Does anybody know of such a beast?

Should said beast not exist, would it be possible to grab an existing grammar from the web and convert it using ebnf.py? If no it would not surprise this PHRL as surely the entire concept is simply far too good to be true. If yes, yee hah!!!, but how the hell does this PHRL go about this task?

Any and all responses gratefully received.

Mark Lawrence.
From: Alexander B. <ale...@gm...> - 2010-01-08 09:59:23
Hi again

I found a solution: when we pop an index (a number in {}) from the stack, we then need to pop the next operand from the stack as well, so we have both the variable name and the index and can analyse these values.

Thanks
--
Alexander Bruy
mailto: ale...@gm...
From: Alexander B. <ale...@gm...> - 2010-01-08 09:37:06
Hi list

I need to parse and evaluate algebraic expressions with variables. Some variables are simple variable names, which are replaced with their values when evaluating, but some are complex variables with an index part (like array indexing), for example variable{1}. For this kind of variable I need to do additional actions (based on the variable name and index) before evaluating.

For example, I have a variable "layer". When this variable is found without an index part I need to return one value; when the index is 2 (in the expression this looks like layer{2}) I need to return the second value; when the index is 1 (layer{1}) I need to return the third value.

Is this possible with pyparsing? I looked at the manuals and at the example SimpleCalc.py, and this looks very similar to what I want. I think the index part must be described with the Optional class, but I can't understand what I must add to SimpleCalc.py to parse and evaluate variables with indexes. Can anyone help me with this?

Thanks, and sorry for my bad English, I'm Ukrainian
--
Alexander Bruy
mailto: ale...@gm...
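One way to handle the optional {index} is to resolve each variable in a parse action and let infixNotation do the arithmetic, instead of extending SimpleCalc.py's explicit stack. This is a sketch under stated assumptions: the VALUES table and its "layer" entries are hypothetical stand-ins for the real lookups, and only + - * / are supported.

```python
import operator

import pyparsing as pp

# Hypothetical variable table: plain names map to one value, and
# (name, index) pairs map to the value selected by name{index}.
VALUES = {
    "layer": 10.0,
    ("layer", 1): 30.0,
    ("layer", 2): 20.0,
    "x": 2.0,
}

ident = pp.Word(pp.alphas, pp.alphanums + "_")
index = pp.Suppress("{") + pp.Word(pp.nums)("idx") + pp.Suppress("}")
variable = ident("var") + pp.Optional(index)


def lookup(tokens):
    # the optional {n} suffix decides which table entry is used
    if "idx" in tokens:
        return VALUES[(tokens["var"], int(tokens["idx"]))]
    return VALUES[tokens["var"]]


variable.setParseAction(lookup)
number = pp.Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def fold(tokens):
    # tokens[0] is the group [operand, op, operand, op, operand, ...]
    seq = tokens[0]
    result = seq[0]
    for i in range(1, len(seq), 2):
        result = OPS[seq[i]](result, seq[i + 1])
    return result


expr = pp.infixNotation(number | variable, [
    (pp.oneOf("* /"), 2, pp.opAssoc.LEFT, fold),
    (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT, fold),
])

print(expr.parseString("layer{1} + x * 2", parseAll=True)[0])
```

Because lookup() sees both the name and the index at once, there is no need to pop a second operand off a stack afterwards.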
From: Philipp R. <phi...@gm...> - 2009-11-20 20:46:51
On Mon, 16 Nov 2009 18:06:31 +0100, spir wrote:
> Here is the tool. Try it first on various typical substrings of your
> source. If it works as expected, it should be a major boost (and a
> simplification of your grammar as well).

Thanks. I rewrote it to support variable-length indentation using an indent stack approach, as I apparently have quite a bit of data in formats such as this: 1 2 3 3 3 2

That seems to be just as stable, and it catches the indentation-at-level-0 issue as well. The grammar is more or less the same, replacing indentedBlock() instances with OPEN + OneOrMore + CLOSE instances, but parsing times are orders of magnitude better.

Thanks,
Philipp
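Philipp's variable-width rewrite is not in the thread, but the usual indent-stack technique looks like this sketch: keep a stack of open indentation widths, emit an open marker when a line is indented further, and pop (emitting close markers) until a dedented line matches a width already on the stack. The function name and the "{"/"}" markers are assumptions, chosen to match Denis's convention of delimiters on their own lines.

```python
def wrap_indented(source, open_tok="{", close_tok="}"):
    """Replace indentation changes with explicit open/close lines.

    Uses a stack of indentation widths, so each block may use any amount
    of indentation, as long as dedents return to a width seen before.
    Only leading spaces are counted (tabs are not expanded).
    """
    stack = [0]          # indentation widths of the currently open blocks
    out = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if not line.strip():
            continue     # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append(open_tok)
        else:
            while width < stack[-1]:
                stack.pop()
                out.append(close_tok)
            if width != stack[-1]:
                raise ValueError(f"line {lineno}: dedent to unknown level {width}")
        out.append(line.strip())
    out.extend(close_tok for _ in stack[1:])   # close blocks still open at EOF
    return "\n".join(out) + "\n"


print(wrap_indented("1\n   2\n         3\n         3\n   2\n"))
```

Unlike a fixed INDENT unit, the stack only compares widths, so a block indented by 3 spaces and a nested one indented by 6 more are both accepted.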
From: spir <den...@fr...> - 2009-11-17 08:38:32
On Mon, 16 Nov 2009 15:58:35 +0100, Philipp Reichmuth <phi...@gm...> wrote:

> On Sun, 15 Nov 2009 20:37:59 +0100, spir wrote:
> > May be useful: I do not parse anymore indented structure, instead
> > systematically preprocess to transform it into delimited structure (say,
> > C style). The reason is complication of the grammar and
> > state-dependence.
>
> I see the point. I'll think if I can preprocess the source text to avoid
> using indentedBlock().
>
> > I have a pair of tool funcs that "transcode" in both directions (can send
> > if you like). It's easy as long as you can rely on indentation to be
> > consistent (which is not necessarily true in eg python code).
>
> If you could send me those, I'd be grateful. From what I've seen so far,
> indentation seems to be fairly consistent.
> I have some cases that look like
> this:
>
> entity 1...
> @relation 1...
> entity 2...
> @relation 3...
> @relation 4...
> @relation 5...
>
> But those should be easy to catch.

How should it be? (What should be indented with respect to what?)

> The problem seems indeed to be the combination of indentedBlock() and
> recursion - indentedBlock() currently uses a lookahead mechanism that seems
> to lead to exponential branching in the parse tree under some conditions.
>
> Philipp

Here is the tool. Try it first on various typical substrings of your source. If it works as expected, it should be a major boost (and a simplification of your grammar as well). (Note: the funcs expect indent level 0 at the start of the source -- I only just realized this.)

Denis

===================================================
### indented <--> wrapped structure

# constants used below (not defined in the original post; values assumed)
TAB = '\t'
SPC = ' '
EOL = '\n'
EOF = '<<EOF>>'     # artificial end-of-file marker line

# tool funcs
def howManyAtStart(text, string):
    ''' how many times a (sub)string appears at start of text '''
    pos = 0
    n = 0
    length = len(string)
    while text[pos:].startswith(string):
        pos += length
        n += 1
    return n

def indentMark(lines):
    ''' find & return indentation mark
        ~ either TAB or n spaces
        ~ must be consistent
    '''
    for line in lines:
        if line.strip() == '':
            continue
        if line[0] == TAB:
            return TAB
        n = howManyAtStart(line, SPC)
        if n > 0:
            return n * SPC
    return None

def WrapIndentedStructure(source, INDENT=None, OPEN="{\n", CLOSE="}\n",
                          keepIndent=False):
    ''' Transform indented to wrapped structure.
        ~ Indentation must be consistent!
        ~ If INDENT not given, set to the first start-of-line whitespace.
        ~ Indentation can be kept: nicer & more legible result
          but needs to be coped with during parsing.
        ~ Blank lines are ignored & left as is (else problematic).
    '''
    level = 0   # current indent level
    lines = source.splitlines()
    # find 'INDENT' indentation mark if not given
    if INDENT is None:
        INDENT = indentMark(lines)
    # case no indent at all in source
    if INDENT is None:
        return source
    # add artificial EOFile marker line: it closes all still-open levels
    lines.append(EOF)
    # find indent level *changes* & replace them with tokens
    result = ""
    length = len(INDENT)
    for (i, line) in enumerate(lines):
        # skip blank line
        if line.strip() == '':
            result += (level*INDENT + EOL) if keepIndent else EOL
            continue
        if line == EOF:
            line = ''
        # get offset: difference of indentation
        offset = howManyAtStart(line, INDENT) - level
        if offset > 1:
            # case indent level inconsistency (increment > 1)
            message = ("Inconsistent indentation at line #%s"
                       " (increment > 1):\n%s" % (i, line))
            raise ValueError(message)
        marks = ""
        if offset == 1:
            # case indent level increment (+1)
            level += 1
            marks = (INDENT*level + OPEN) if keepIndent else OPEN
        elif offset < 0:
            # case indent level decrement (<= current level)
            for n in range(level, level + offset, -1):
                marks += (n*INDENT + CLOSE) if keepIndent else CLOSE
            level += offset
        if not keepIndent:
            # drop the whole (now redundant) indentation of the line
            line = line[level*length:]
        result += marks + line + EOL
    return result

def IndentWrappedStructure(source, INDENT=' ', open="{", close="}"):
    ''' Transform wrapped to indented structure.
        ~ Wrapping must be consistent!
        ~ open/close tokens must be on their own line!
    '''
    EOL = '\n'
    result = ""
    (pos, level) = (0, 0)   # current pos in text & indentation level
    lines = source.splitlines()
    for (i, line) in enumerate(lines):
        # case open
        if line.strip() == open:
            level += 1
        # case close
        elif line.strip() == close:
            if level == 0:
                message = ("Inconsistent indentation at line #%s"
                           " (decrement under zero):\n%s" % (i, line))
                raise ValueError(message)
            level -= 1
        # else record line with proper indentation
        else:
            result += level*INDENT + line.lstrip() + EOL
    return result

####### test #######
def testWrapIndent():
    # erroneous example
    source = """\
0
0
    1
            3
        2
    1
0
"""
    print("\n=== wrap indented blocks (erroneous case) in source:\n%s\n"
          % source)
    try:
        print(WrapIndentedStructure(source, INDENT=None, keepIndent=True))
    except ValueError as e:
        print(e)
    # correct example
    source = """\
0
0
    1
    1
        2
        2
            3
            3
                4
                    5
                        6
            3
            3
    1
    1
0
0
"""
    print("\n=== wrap indented blocks (keeping indent) in source:\n%s\n"
          % source)
    result = WrapIndentedStructure(source, keepIndent=True)
    print(result)
    print("\n=== reindent same source")
    print(IndentWrappedStructure(result))

def test():
    #~ testNormalize()
    #~ print(RULER)
    testWrapIndent()
===================================================

--------------------------------
* la vita e estrany *
http://spir.wikidot.com/
From: Philipp R. <phi...@gm...> - 2009-11-16 14:59:14
On Sun, 15 Nov 2009 20:37:59 +0100, spir wrote:
> May be useful: I do not parse anymore indented structure, instead
> systematically preprocess to transform it into delimited structure (say,
> C style). The reason is complication of the grammar and
> state-dependence.

I see the point. I'll think if I can preprocess the source text to avoid using indentedBlock().

> I have a pair of tool funcs that "transcode" in both directions (can send
> if you like). It's easy as long as you can rely on indentation to be
> consistent (which is not necessarily true in eg python code).

If you could send me those, I'd be grateful. From what I've seen so far, indentation seems to be fairly consistent. I have some cases that look like this:

entity 1...
@relation 1...
entity 2...
@relation 3...
@relation 4...
@relation 5...

But those should be easy to catch.

The problem seems indeed to be the combination of indentedBlock() and recursion - indentedBlock() currently uses a lookahead mechanism that seems to lead to exponential branching in the parse tree under some conditions.

Philipp
From: Philipp R. <phi...@gm...> - 2009-11-16 14:53:46
On Mon, 16 Nov 2009 02:02:01 -0600, Paul McGuire wrote:
> You have so much going on in this grammar, have you thought of using
> something a little simpler, but just as self-descriptive, like JSON perhaps?

It's a legacy format, unfortunately. I actually worked through most of the indented examples already. I'm also somewhat new to pyparsing; I've been doing some tricky things with it, but not to the point where I feel really confident. I was kinda hoping for some pointers to tweak the existing grammar, but that seems to be difficult.

Either way, you now have a test case for indentedBlock() breaking Packrat, and for those indentedBlock() recursion problems that people were reporting some time ago ;)

Philipp
From: Paul M. <pt...@au...> - 2009-11-16 08:02:13
> I have a problem with indentedBlock() and a relatively simple grammar that
> relies on indentation and recursion. I'm seeing extremely long execution
> times. On one example that I've attached it doesn't terminate within an
> hour.
>

Yikes! If this is a simple grammar, you do some crazy parsing!

You have so much going on in this grammar, have you thought of using something a little simpler, but just as self-descriptive, like JSON perhaps? I know it uses delimiters instead of indentation, but pyparsing seems to be really struggling with this input text.

You might also look at some of the indented examples on the pyparsing wiki, and roll your own version of indentedBlock, to see if you can make some better headway.

-- Paul
From: spir <den...@fr...> - 2009-11-15 19:38:13
On Sun, 15 Nov 2009 17:51:00 +0100, Philipp Reichmuth <phi...@gm...> wrote:

> I have a problem with indentedBlock() and a relatively simple grammar that
> relies on indentation and recursion. I'm seeing extremely long execution
> times. On one example that I've attached it doesn't terminate within an
> hour.

May be useful: I do not parse anymore indented structure, instead systematically preprocess to transform it into delimited structure (say, C style). The reason is complication of the grammar and state-dependence.

I have a pair of tool funcs that "transcode" in both directions (can send if you like). It's easy as long as you can rely on indentation to be consistent (which is not necessarily true in eg python code). So,

1
1
    2
    2
        3
    2

==>

1
1
{
2
2
{
3
}
2
}

or

1
1
    {
    2
    2
        {
        3
        }
    2
    }

You just need to find delimiters that cannot clash with possible content. (I chose to have delimiters on their own line for this reason, and also because it's easier to check visually, maybe also to parse, I guess.)

But I think that your performance issues may come in great part from recursive patterns, not only from indented structure.

Denis
--------------------------------
* la vita e estrany *
http://spir.wikidot.com/
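Once the source has been wrapped, the grammar no longer needs indentedBlock(): nesting is explicit, so a plain Forward suffices and no indentation state has to be tracked. A minimal sketch, assuming one word per line; the item expression is a placeholder for the real entity and relation rules.

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "{}")
item = pp.Word(pp.alphanums)   # placeholder leaf: one word per line

# Nesting is explicit now, so a plain Forward replaces indentedBlock()
block = pp.Forward()
block <<= pp.Group(pp.ZeroOrMore(item + pp.Optional(LBRACE + block + RBRACE)))

# the wrapped form of the "1 1 2 2 3 2" example above
wrapped = "1\n1\n{\n2\n2\n{\n3\n}\n2\n}\n"
print(block.parseString(wrapped, parseAll=True).asList())
```

Newlines are ordinary whitespace to this grammar, which is one reason the delimited form is cheaper to parse than the indentation-sensitive one.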
From: Philipp R. <phi...@gm...> - 2009-11-15 16:55:22
Hi guys,

I have a problem with indentedBlock() and a relatively simple grammar that relies on indentation and recursion. I'm seeing extremely long execution times. On one example that I've attached it doesn't terminate within an hour.

The trick suggested elsewhere, commenting out a FollowedBy statement in Pyparsing (line 3633), doesn't work, as it breaks the grammar. Packrat parsing doesn't work either; it appears to be incompatible with indentedBlock(). I've attached a test case that gets broken by enabling Packrat. I'm using Pyparsing 1.5.2 on Python 2.6.4.

I'd be extremely grateful for hints on how to redesign my grammar so that I get more controlled execution times.

The grammar itself isn't really intricate. It's for an obscure data representation format for entity-relationship data. It works around nested entity and relationship declarations. Entity-relationship dependencies are marked by indentation. Entities are declared by entity names followed by a number. In addition, URI references can appear as entities, either inline in angle brackets, or using a one-time external URI declaration.

The grammar is fairly self-explanatory; here are the basics in BNF:

entity-reference:          (entity-name number) | (<inlineURI>)
external-URI-decl:         entity-reference = <inlineURI>
entity-declaration:        entity-reference ["parameter"] [par_n:value_n]+
inline-relation:           @relation-name [par_n:value_n] entity-reference
multiline-relation:        @relation_name [par_n:value_n] NEWLINE INDENT entity-declaration-block UNDENT
entity-declaration-block:  entity-declaration NEWLINE INDENT [inline-relation|multiline-relation]* UNDENT
relationClause:            entity-reference [inline-relation|multiline-relation]+
suite:                     [entity-declaration-block | relationClause | external-URI-decl]+

I've attached a (simple but rather verbose) Pyparsing version of the grammar, as well as a couple of test cases. Execution time seems to go up extremely fast as things become more indented.
Thanks,
Philipp

# ================= GRAMMAR DEFINITION ====================

import pyparsing as pp
#pp.ParserElement.enablePackrat()

indentStack = [1]

colon = pp.Literal(u":")
equal = pp.Literal(u"=")
relMarker = pp.Literal(u"@")
srelMarker = pp.Literal(u"$")
subrelMarker = pp.Literal(u"_")

entityName = pp.Word(pp.alphas)
entityNumber = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
entityRef = pp.Group(pp.Combine(entityName + entityNumber)
    ).setResultsName("entityRef")
inlineURI = pp.Group(pp.QuotedString(u'<', escQuote=None, multiline=False,
    endQuoteChar=u">")).setResultsName("inlineURI")
externalURI = pp.Group( entityRef + equal.suppress() + inlineURI
    ).setResultsName("externalURI")

relationName = pp.Word(pp.alphanums+"-")
relationRef = pp.Combine(relMarker + relationName)
superrelationRef = pp.Combine(srelMarker + relationName)
subrelationRef = pp.Combine(subrelMarker + relationName)

parameterName = pp.Word(pp.alphanums+"-").setResultsName("parameterName")
parValueQuote = pp.QuotedString( u'"',u"\\",escQuote=None, multiline=True
    ).setResultsName("parameterValue")
parValueDirect = pp.Word( pp.alphanums+"-_" ).setResultsName("parameterValue")
parameterStmt = pp.Group( parameterName + colon.suppress() +
    (parValueQuote | parValueDirect) ).setResultsName("parameterStmt")
parameterBlock = pp.Group( pp.OneOrMore( parameterStmt )
    ).setResultsName("parameterBlock")
superparameter = pp.QuotedString( u'"',u"\\",escQuote=None, multiline=True
    ).setResultsName("superparameter")

entityDecl = pp.Group( entityRef +
    (parameterBlock | superparameter + pp.Optional(parameterBlock))
    ).setResultsName("entityDecl")

entityDeclBlock = pp.Forward()

relationInline = pp.Group( relationRef + pp.Optional(parameterBlock) +
    (inlineURI | entityRef) ).setResultsName("relationInline")
relationMultiline = pp.Group( relationRef + pp.Optional(parameterBlock) +
    pp.indentedBlock( entityDeclBlock, indentStack).setResultsName("indentedBlock")
    ).setResultsName("relationMultiline")

entityDeclBlock << pp.Group( entityDecl + pp.Optional(
    pp.indentedBlock( relationMultiline | relationInline,
        indentStack).setResultsName("indentedBlock") )
    ).setResultsName("entityDeclBlock")

relationClause = pp.Group( (inlineURI | entityRef) +
    (relationInline | relationMultiline)
    ).setResultsName("relationClause")

suite = pp.OneOrMore( relationClause | externalURI | entityDeclBlock
    ).setResultsName("suite").setDebug()

# ================= TEST CASES ====================

TESTCASES = {
"basic": u"""\
A1 par:value @rel relpar:value A2 @rel A3 entpar:value @rel relpar:value A4 entpar:value @rel <some:URI/reference> A2=<some:URI/reference>
""",
"packrat": u"""\
C1 id:"mutawalli" type:"appointment" @refersToAppointmentTarget <waqf:ent/office/d408:d408-mutawalli> @regulatesSuccession AO1 id:"muezzinAppointment" @hasAppointee <waqf:ent/office/world:muezzin-ayyub> C2 id:"revenue" type:"revenue" @refersToIncome <waqf:ent/payable/d408:waqfIncome> @regulatesDistribution D1 id:"distStart" @hasChildNode DF1 id:"haqq at-tawliya" denom:"10" num:"1" @hasRecipient <waqf:ent/office/d408:d408-mutawalli> DR1 id:"remaining90percent" @hasRecipient <waqf:ent/office/d408:d408-mutawalli> @inExchangeFor <waqf:ent/duty/d408:d408-light> @inExchangeFor <waqf:ent/duty/d408:d408-furniture>""",
"slow": u"""\
DT1 id:"dist-taxation" tax:"xaraj" @hasChildNode D1 id:"dist-repair" @hasRecipient <waqf:ent/office/d405:mutawalli-405> @inExchangeFor X1 id:"graverepair" type:"repair" @dutyTarget <waqf:ent/structure/d405:grave-waqif> @hasChildNode DR1 id:"dist-remainder" @hasChildNode DF1 id:"dist-haqqattawliya" denom:"10" num:"1" @hasRecipient <waqf:ent/office/d405:mutawalli-405> @hasChildNode DR2 id:"dist-remainder2" @hasChildNode DF2 id:"dist-imam" denom:"3" num:"2" @hasRecipient <waqf:ent/office/world:imam-ayyub> @inExchangeFor X2 id:"prayer-imam" type:"prayer" @dutyTarget Y1 "text" id:"pid1" schedule:"sat" type:"m" @hasChildNode DF3 id:"dist-muezzin" denom:"3" num:"1" @hasRecipient <waqf:ent/office/world:muezzin-ayyub> @inExchangeFor X2 id:"prayer-muezzin" type:"prayer" @dutyTarget Y2 "text" id:"pid2" schedule:"fri" type:"m" @hasChildNode DR3 id:"dist-fuqara-symbolic" @hasRecipient <waqf:ent/group/world:fuqara>
"""}

suite.parseString(TESTCASES["basic"], parseAll = True)

# Packrat breaks the following
suite.parseString(TESTCASES["packrat"], parseAll = True)

# The following takes extremely long
suite.parseString(TESTCASES["slow"], parseAll = True)
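One general technique for indentation-sensitive input, borrowed from Python's own tokenizer rather than taken from this thread, is to convert leading whitespace into explicit INDENT/DEDENT markers in a separate preprocessing pass. A grammar can then treat those markers like ordinary open/close brackets instead of tracking columns in a shared indentStack while it backtracks, which may also sidestep the conflict with packrat caching. This is a minimal sketch assuming space-only indentation and no quoted value spanning several lines (the grammar's multiline=True strings would allow that); the function name mark_indentation and the marker strings are hypothetical.

```python
def mark_indentation(text, indent_token="INDENT", dedent_token="DEDENT"):
    """Return each non-blank line's stripped content, preceded by
    INDENT/DEDENT markers whenever its indentation changes.

    Assumes indentation uses spaces only. Raises ValueError when a line
    dedents to a column that matches no enclosing block.
    """
    stack = [0]  # columns of the currently open indentation levels
    out = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # blank lines never open or close a block
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            # deeper than the current block: open a new one
            stack.append(col)
            out.append(indent_token)
        else:
            # close every block that is deeper than this line
            while col < stack[-1]:
                stack.pop()
                out.append(dedent_token)
            if col != stack[-1]:
                raise ValueError(f"line {lineno}: inconsistent dedent")
        out.append(line.strip())
    # close any blocks still open at end of input
    out.extend([dedent_token] * (len(stack) - 1))
    return out
```

The returned list could then be joined with newlines and parsed by a grammar in which INDENT and DEDENT are plain keywords, mirroring the INDENT/UNDENT symbols already used in the BNF; adapting this particular grammar that way has not been tried here.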
From: Ralph C. <ra...@in...> - 2009-10-30 11:59:53
Hi Paul,

> Integer form is pretty clear cut, could be added without much pain.
> But should the floats include scientific notation too? Leading zero
> required for floats < 1? At least one zero required after the '.'?

I'd suggest adding some that match what's allowed by specific languages, e.g. C, Python. So you'd have {c,python}_{dec_,oct_,hex_,}integer, etc. Most times, a user either wants to match text from an existing language or would be happy to conform with what an existing language allows for their own language.

Cheers,
Ralph.
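Ralph's naming scheme can be sketched with plain regular expressions. The names below follow his {c,python}_{dec_,oct_,hex_,}integer pattern but are hypothetical, not existing pyparsing built-ins, and the patterns are simplified: C integer suffixes such as u or L and Python 3's _ digit separators are ignored. Each pattern string could presumably be wrapped in pyparsing's Regex class.

```python
import re

# Hypothetical names following the {c,python}_{dec_,oct_,hex_,}integer scheme.
# Simplified: C suffixes (u, l, ll) and Python's "_" digit separators are ignored.
INTEGER_PATTERNS = {
    "c_dec_integer": r"[1-9][0-9]*",
    "c_oct_integer": r"0[0-7]*",               # in C, a lone "0" is an octal constant
    "c_hex_integer": r"0[xX][0-9a-fA-F]+",
    "python_dec_integer": r"[1-9][0-9]*|0+",   # Python 3 rejects "017" but allows "00"
    "python_oct_integer": r"0[oO][0-7]+",
    "python_hex_integer": r"0[xX][0-9a-fA-F]+",
}

# Combined forms: hex and octal are tried before decimal so that a prefix match
# (re.match, or a parser matching at a position) doesn't stop at the leading "0".
INTEGER_PATTERNS["c_integer"] = "|".join(
    INTEGER_PATTERNS[k] for k in ("c_hex_integer", "c_oct_integer", "c_dec_integer"))
INTEGER_PATTERNS["python_integer"] = "|".join(
    INTEGER_PATTERNS[k]
    for k in ("python_hex_integer", "python_oct_integer", "python_dec_integer"))

COMPILED = {name: re.compile(pat) for name, pat in INTEGER_PATTERNS.items()}


def is_integer(kind, text):
    """True if the whole of `text` is an integer literal of the given kind."""
    return COMPILED[kind].fullmatch(text) is not None
```

For example, is_integer("c_integer", "017") is true while is_integer("python_integer", "017") is false, which is exactly the kind of language-specific difference Ralph's naming makes explicit.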
From: <pt...@au...> - 2009-10-30 02:18:11
---- Daniel Erenrich <ere...@ca...> wrote:
> When debugging pyparsing I love to use the setDebug function. Sometimes
> though this becomes tedious since I do not really know what path the
> parser took before blowing up. Is there any easy way to setDebug() on
> all parsing elements that I defined? Or just all named parsing elements?
>

Here's a shortcut I use sometimes. Say I have a parser with expressions in it that I want to name and set debug on, and the expressions are in variables ident, integer, float_, and phonenum:

for ename in "ident integer float_ phonenum".split():
    expr = locals()[ename]
    expr.setName(ename)
    expr.setDebug()

I first got this idea from Seo Sanghyeon, who used something like this in the ebnf.py EBNF parser, which you can find on the pyparsing wiki's Examples page.

> On an unrelated note, why are there no built-ins for matching
> integers/floats. This seems like a very common task that I keep reinventing.
>

Hmm, no special reason. I think I have some already in the Helpful Expressions page of the wiki. Integer form is pretty clear cut, could be added without much pain. But should the floats include scientific notation too? Leading zero required for floats < 1? At least one zero required after the '.'?

If these are for your own personal consumption only, you might just drop them into a pyparsing_snippets.py module in your own site-packages directory. But if there is more feedback on the list asking for these, I don't mind adding them.

-- Paul
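Paul's loop still requires listing the variable names by hand. A minimal sketch that generalizes it, assuming you want every expression bound in a namespace: scan the dictionary for instances of a given type and name each one after its variable. The helper name name_and_debug_all is hypothetical. With pyparsing you would pass pyparsing.ParserElement (the common base class of Word, Literal, Forward and the rest) as element_type; the type is a parameter here only so the sketch runs without pyparsing installed.

```python
def name_and_debug_all(namespace, element_type, skip_private=True):
    """Call setName(var_name) and setDebug() on every value in `namespace`
    that is an instance of `element_type`; return the sorted names touched."""
    touched = []
    # copy the items first, since `namespace` may be a live globals() dict
    for var_name, obj in list(namespace.items()):
        if skip_private and var_name.startswith("_"):
            continue  # leave _helpers and dunder names alone
        if isinstance(obj, element_type):
            obj.setName(var_name)  # use the variable name in messages and debug output
            obj.setDebug()         # turn on match tracing for this element
            touched.append(var_name)
    return sorted(touched)
```

Called at module level after the grammar is built, for example name_and_debug_all(globals(), pp.ParserElement), it also picks up any ParserElement instances imported into that namespace, and an expression bound to two names ends up named after whichever name is visited last.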
From: Daniel E. <ere...@ca...> - 2009-10-28 21:08:11
When debugging pyparsing I love to use the setDebug function. Sometimes, though, this becomes tedious since I do not really know what path the parser took before blowing up. Is there any easy way to setDebug() on all parsing elements that I defined? Or just all named parsing elements?

On an unrelated note, why are there no built-ins for matching integers/floats? This seems like a very common task that I keep reinventing.

Daniel
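The questions Paul raised in reply about a float built-in (scientific notation? a leading zero for values below 1? a digit required after the '.'?) are really configuration choices. A minimal sketch using Python's re module makes each choice an explicit flag; the helper names float_pattern and is_float are hypothetical, not part of pyparsing, and the resulting string could presumably be passed to pyparsing's Regex class. It assumes the mantissa always contains a '.', so 1e5 is not accepted.

```python
import re


def float_pattern(require_leading_digit=True, require_fraction_digit=True,
                  allow_exponent=True, allow_sign=True):
    """Return a regex string for float literals.

    Flags: a digit before '.' (0.5 vs .5), a digit after '.' (1.0 vs 1.),
    scientific notation (1.5e-3), and an optional leading sign.
    """
    sign = r"[+-]?" if allow_sign else ""
    if require_leading_digit:
        frac = r"[0-9]+" if require_fraction_digit else r"[0-9]*"
        mantissa = r"[0-9]+\." + frac
    elif require_fraction_digit:
        mantissa = r"[0-9]*\.[0-9]+"
    else:
        # neither side is required, but a bare "." must still be rejected
        mantissa = r"(?:[0-9]+\.[0-9]*|\.[0-9]+)"
    exponent = r"(?:[eE][+-]?[0-9]+)?" if allow_exponent else ""
    return sign + mantissa + exponent


def is_float(text, **options):
    """True if the whole of `text` matches float_pattern(**options)."""
    return re.fullmatch(float_pattern(**options), text) is not None
```

With the defaults, float_pattern() returns the strict pattern [+-]?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?, while passing require_leading_digit=False and require_fraction_digit=False accepts .5 and 1. but still rejects a lone '.'.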