pyparsing-users Mailing List for Python parsing module (Page 14)
Brought to you by: ptmcg
From: Paul M. <pt...@au...> - 2009-09-27 12:42:56
As I've read more about Visitors, I think this pattern may actually be more applicable to the results that might be returned from the parsing process. In the case of a parser that returns some structured tokens, a Visitor might be created that walks this list, pretty-printing the output, or evaluating some generated expression execution objects.

Unfortunately, pyparsing has little control over what kinds of objects will be found. The overall object returned is always a ParseResults object, but the items within the ParseResults' list are likely to be just Python strings. They could also be ints, floats, or even user-defined objects, if the user created parse actions that did some kind of type conversion. So to visit the results, each matched item or group would have to be wrapped in some kind of "visitable" object, most likely created in a parse action.

I have implemented something *like* this, for example in the SimpleBool.py example - here every matching expression returns an instance of a subclass of BoolOperand, which implements its own version of __nonzero__. After parsing, the parse results are evaluated by calling bool(item) for each item in the results - bool(x) calls x.__nonzero__ in Python 2.x.

So for visiting parsed results, I may have to just add another example to the examples directory, showing how to use the Visitor pattern on the matched tokens.

-- Paul

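As a rough sketch of the wrapping idea described above - not pyparsing API, just the plain Visitor pattern applied to parse results - a parse action can return a "visitable" wrapper object, and a visitor can then walk the ParseResults afterward. The Visitable and PrintVisitor names here are purely illustrative:

    from pyparsing import Word, alphas, nums

    class Visitable(object):
        # hypothetical wrapper created by a parse action
        def __init__(self, tokens):
            self.tokens = tokens
        def accept(self, visitor):
            return visitor.visit(self)

    class PrintVisitor(object):
        def visit(self, node):
            print("visiting %r" % (node.tokens.asList(),))

    wrap = lambda s, l, t: Visitable(t)
    word = Word(alphas).setParseAction(wrap)
    integer = Word(nums).setParseAction(wrap)
    expr = word + integer

    visitor = PrintVisitor()
    for item in expr.parseString("ABC 123"):
        item.accept(visitor)

Any real support in pyparsing would presumably look different; this only shows that the pattern is already expressible with parse actions.
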
From: Paul M. <pt...@au...> - 2009-09-27 12:42:32
> would it be possible to let visitExpr and visitIntervening
> return a value (None or a bool)
> to indicate if they like to be called again or not for
> the remaining tokens?

Helmut,

Thanks for your suggestion. I've thought about it a bit; see what you think.

I think the default behavior would be that these methods, if defined, would continue to be called as long as there are parsed matches in the input text. Since Python methods that don't explicitly return anything actually return None, I would interpret a None return the same as the default case, that is, keep on matching. To add an overt return value indicating that no more calls should be made, we could return True to keep on calling, or False to stop calling. Oddly, this would have pyparsing treating returned values of True and None equally, which is a bit of a code smell. If I invert the meaning of the returned flag, False meaning keep calling and True meaning stop calling, then my flag asserts a negative, which is a different kind of smell to me.

Instead of returning a flag, the methods could raise an exception, and StopIteration seems like a logical choice. My first thought is to have either one of these visit methods raise StopIteration, and have that stop the parsing process altogether - this seems to me to be in line with the spirit of the original Visitor pattern, in which all visit() methods were roughly the same, differing only in argument signatures. Or I could track visitExpr and visitIntervening separately, and if one raises StopIteration, I could have pyparsing continue to call the other. But this feels weird; my instinct would be to have StopIteration just stop parsing altogether, whichever visit method raised it.

Here is an alternative: instead of adding this flag or exception as part of the interaction between pyparsing and your Visitor code, you could have your Visitor class handle the alternative logic. Here is a class that, after having had a method called once, changes the instance's method to a do-nothing method.

class CallOnceVisitor(object):
    def method(self):
        print "method"
        # redefine method, since we just wanted the first
        self.method = self.do_nothing

    def do_nothing(self):
        pass

co = CallOnceVisitor()
co.method()
co.method()

You could do the same in visitExpr, by changing self.visitExpr to self.do_nothing. Pyparsing will still make the function calls, but they will just return immediately. But now you have more control over what happens when, and pyparsing's logic stays fairly simple-minded.

Overall, I think adding support for StopIteration (or a similar exception) is good, in which any visitXXX method can raise it and stop the parsing process. If finer control is needed, then I would put the burden back on the visitXXX method implementations to keep track, perhaps using techniques like in CallOnceVisitor.

-- Paul

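If the StopIteration idea above were adopted, a visitor could cut the run short along these lines. Since accept() is only a proposal at this point, the small drive() function below is a hypothetical stand-in, built on the existing scanString, for whatever pyparsing would do internally:

    from pyparsing import Word, alphas

    class FirstMatchOnly(object):
        def visitExpr(self, tokens):
            print("first match: %s" % tokens.asList())
            raise StopIteration()   # ask the driver to stop visiting

        def visitIntervening(self, strng):
            pass

    def drive(expr, visitor, text):
        # hypothetical driver, not pyparsing API
        last_end = 0
        try:
            for tokens, start, end in expr.scanString(text):
                visitor.visitIntervening(text[last_end:start])
                visitor.visitExpr(tokens)
                last_end = end
        except StopIteration:
            pass

    drive(Word(alphas), FirstMatchOnly(), "ABC 123 DEF 456")
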
From: Helmut J. <jar...@ig...> - 2009-09-24 08:58:08
On 24 Sep, Paul McGuire wrote:
> I am considering adding an interface to pyparsing along the lines of the Visitor pattern. My intent is to make it easier to work with the scanString method. Currently, when using scanString, one gets the tokens, start, and end for each matching text in the input string. This forces the caller to keep track of some low-level parsing state/locations if they need to do some processing of the intervening text, or some other stateful work. By writing a Visitor, this can be tracked in a more object-friendly way.
>
> Here's how the Visitor would work. The concept is that, after creating a pyparsing grammar, one could define a class that implements a method visitExpr, which receives a ParseResults containing the matching tokens, and optionally a method visitIntervening, which receives a string containing the portion of the input string between matches - call this class ParseVisitor. The pyparsing grammar expression - let's call it expr - then accepts this visitor, and gives us a callable object. This new object can now be called with an input string, and the visitExpr and visitIntervening methods will get called as the input string is parsed. Here is a sample:
>
> from pyparsing import *
>
> expr = Word(alphas)
>
> tests = """\
> ABC 123 DEF 456
> ABC 123 DEF 456 XYZ
>  ABC 123 DEF 456 XYZ
> 0 ABC 123 DEF 456 XYZ
> """.splitlines()
>
> class ParseVisitor(object):
>     def visitExpr(self, tokens):
>         print ">%s<" % tokens.asList(),
>     def visitIntervening(self, strng):
>         print "^%s^" % strng,
>
> visitor = ParseVisitor()
> processor = expr.accept(visitor)
> for t in tests:
>     print t
>     processor(t)
>     print
> print
>
> Prints out:
>
> ABC 123 DEF 456
> ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^
>
> ABC 123 DEF 456 XYZ
> ^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<
>
>  ABC 123 DEF 456 XYZ
> ^ ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<
>
> 0 ABC 123 DEF 456 XYZ
> ^0 ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<
>
> (In the pure Visitor pattern, ParseVisitor would implement two different methods, both named visit, with one taking a ParseResults and the other taking a string. But since Python doesn't do function overloading, I've had to give these different names. But now, how nice and explicit the resulting class is!)
>
> What do people think of this idea?

Yes, that looks great! One question though: would it be possible to let visitExpr and visitIntervening return a value (None or a bool) to indicate if they like to be called again or not for the remaining tokens?

Helmut.

--
Helmut Jarausch
Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

From: Paul M. <pt...@au...> - 2009-09-24 08:15:28
I am considering adding an interface to pyparsing along the lines of the Visitor pattern. My intent is to make it easier to work with the scanString method. Currently, when using scanString, one gets the tokens, start, and end for each matching text in the input string. This forces the caller to keep track of some low-level parsing state/locations if they need to do some processing of the intervening text, or some other stateful work. By writing a Visitor, this can be tracked in a more object-friendly way.

Here's how the Visitor would work. The concept is that, after creating a pyparsing grammar, one could define a class that implements a method visitExpr, which receives a ParseResults containing the matching tokens, and optionally a method visitIntervening, which receives a string containing the portion of the input string between matches - call this class ParseVisitor. The pyparsing grammar expression - let's call it expr - then accepts this visitor, and gives us a callable object. This new object can now be called with an input string, and the visitExpr and visitIntervening methods will get called as the input string is parsed. Here is a sample:

from pyparsing import *

expr = Word(alphas)

tests = """\
ABC 123 DEF 456
ABC 123 DEF 456 XYZ
 ABC 123 DEF 456 XYZ
0 ABC 123 DEF 456 XYZ
""".splitlines()


class ParseVisitor(object):
    def visitExpr(self, tokens):
        print ">%s<" % tokens.asList(),
    def visitIntervening(self, strng):
        print "^%s^" % strng,


visitor = ParseVisitor()
processor = expr.accept(visitor)
for t in tests:
    print t
    processor(t)
    print
print

Prints out:

ABC 123 DEF 456
^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456^

ABC 123 DEF 456 XYZ
^^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

 ABC 123 DEF 456 XYZ
^ ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

0 ABC 123 DEF 456 XYZ
^0 ^ >['ABC']< ^ 123 ^ >['DEF']< ^ 456 ^ >['XYZ']<

(In the pure Visitor pattern, ParseVisitor would implement two different methods, both named visit, with one taking a ParseResults and the other taking a string. But since Python doesn't do function overloading, I've had to give these different names. But now, how nice and explicit the resulting class is!)

What do people think of this idea?

-- Paul

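For readers following the thread: the accept() method used in the sample above is a proposal, not an existing pyparsing method. A rough approximation of the same behavior, built only on the existing scanString API (the make_processor name is illustrative), might look like this:

    from pyparsing import Word, alphas

    class ParseVisitor(object):
        def visitExpr(self, tokens):
            print ">%s<" % tokens.asList(),
        def visitIntervening(self, strng):
            print "^%s^" % strng,

    def make_processor(expr, visitor):
        # stand-in for the proposed expr.accept(visitor)
        def process(text):
            last_end = 0
            for tokens, start, end in expr.scanString(text):
                visitor.visitIntervening(text[last_end:start])
                visitor.visitExpr(tokens)
                last_end = end
            if last_end < len(text):
                visitor.visitIntervening(text[last_end:])
        return process

    processor = make_processor(Word(alphas), ParseVisitor())
    processor("ABC 123 DEF 456")
    print

This reproduces the kind of output shown for the first test line; whether the real accept() would also report a trailing empty intervening string is one of the open design details.
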
From: Paul M. <pt...@au...> - 2009-09-24 07:42:09
Francis -

Thanks for your message, you have uncovered a subtle bug in originalTextFor. See the forum post/thread for details.

Cheers!
-- Paul

> -----Original Message-----
> From: Francis Vidal [mailto:fra...@gm...]
> Sent: Thursday, September 24, 2009 1:11 AM
> To: pyp...@li...
> Subject: [Pyparsing] This is a follow-up of my post in the forum
>
> <snip>

From: Francis V. <fra...@gm...> - 2009-09-24 06:10:55
I have the following data set I want to process:

data = """
. BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 06/15/1925 SAMSON, JAMES HUBILLA 1111-0001A-F1567GHA2 1 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 05/14/1925 CRUZ, JOSE ENDAYA2 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 11/26/1925 PEREZ, JAMES ENDAYA 1111-0001A-K2661CEA1 3 . BRGY. 1 RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 03/31/1925 CRUZ, RAMON CANTRE4 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 01/20/1925 RAMONCITO, CARLOS ENDAYA 1111-0001A-A2055LEA1 5 . #234, BARANGAY I (POB.), RABAGO, REVENA M 01/20/1925 CRUZ, SUSAN CANTRE 1111-0001A-A2079NCA1-6 6 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 06/03/1925 CRUZ, RAUL ENDAYA 1111-0001A-F0330OEA2 7 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 02/17/1925 JOSE, TEOFISTO ENDAYA8 . BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 11/08/1925 RAMONCITO, JOSEPH MASONGSONG 1111-0001A-K0869RMA1 9 . BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 12/10/1925 ARAGON, VINCENT GERANCE 1111-0001A-L1071VGA2 10 . BAGONG SILANG BRGY. I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 10/20/1925 PASTORA, JOBI SEPTIMO 1111-0001A-J2062DSA1 11 . BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 09/09/1925 CRUZ, CARLOS JR. AVENDAÑO 1111-0001A-I0981AAA1 12 . BARANGAY I RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 10/16/1925 CRUZ, NANCY CASTOR 1111-0001A-J1680NCA2 13 . F. FULE ST. RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA F 01/03/1925 CRUZ, CORY ABARCAR 1111-0001A-A0364CAA2 14 . 118 F. FULE ST., BARANGAY I (POB.), RABAGO, REVENA F 11/07/1925 JOSE, FREDA DIONGLAY 1111-0001A-K0723GDA2 15 . F. FULE ST. RABAGO REVENA, BARANGAY I (POB.), RABAGO, REVENA M 03/26/1925 ZAMORA, DANDING DIONGLAY 1111-0001A-C2663MDA1 16
"""

NL = LineEnd().suppress()
gender = oneOf("M F")
integer = Word(nums)
date = Combine(integer + '/' + integer + '/' + integer)

# define the simple line definitions
gender_line = gender("sex") + NL
dob_line = date("DOB") + NL
name_line = Word(alphas8bit + "," + alphas8bit) + NL
id_line = Combine(Word(alphanums) + "-" + Word(alphanums) + "-" + Word(alphanums))("ID") + NL
recnum_line = integer("recnum") + NL

# define forms of address lines
first_addr_line = LineStart() + Suppress('.') + empty + restOfLine + NL
# a subsequent address line is any line that is not a gender definition
subsq_addr_line = ~(gender_line) + restOfLine + NL

# a line with a name and a recnum combined, if there is no ID
name_recnum_line = originalTextFor(OneOrMore(Word(alphas+',')))("name") + \
    integer("recnum") + NL

record = (first_addr_line + ZeroOrMore(subsq_addr_line))("address") + \
    gender_line + dob_line + ((name_line + id_line + recnum_line) | name_recnum_line)

records = record.searchString(data)

But it's not matching the "id_line". What's wrong with the id_line definition?

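One way to narrow down where a record stops matching (Paul's reply above points at a bug in originalTextFor rather than at id_line itself) is to exercise the individual line expressions in isolation; a minimal check of id_line, for example:

    from pyparsing import Word, alphanums, Combine, LineEnd

    NL = LineEnd().suppress()
    id_line = Combine(Word(alphanums) + "-" + Word(alphanums) + "-" + Word(alphanums))("ID") + NL

    print(id_line.parseString("1111-0001A-F1567GHA2")["ID"])

If a sub-expression parses its own sample text cleanly, the problem usually lies in how the surrounding record expression reaches (or fails to reach) it.
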
From: Eike W. <eik...@gm...> - 2009-09-03 15:09:05
On Monday 31 August 2009, John Krukoff wrote:
> I do say was, though, as I stumbled across this comment on the
> discussion board:
>
> http://pyparsing.wikispaces.com/message/view/home/13557997
>
> Which was an instance of someone else having a performance problem
> with pyparsing.indentedBlock. His solution worked for me, and
> didn't break any of my test suite. My parser still isn't speedy,
> but at least it's not ridiculous anymore (from 13 minutes to parse
> that test case down to about a tenth of a second). I patched
> indentedBlock as so:

That's an interesting find. I put the modification into my patched version of Pyparsing and it didn't break any tests in my project either. The modification also solves one problem where Pyparsing couldn't parse a complex but syntactically correct file: without the modification, the parser crashes with "RuntimeError: maximum recursion depth exceeded". So this modification seems to be quite useful.

John, you should put a patch into Pyparsing's tracker, so that it doesn't get lost:
http://sourceforge.net/tracker/?atid=617313&group_id=97203&func=browse

Kind regards,
Eike.

From: Diez B. R. <de...@we...> - 2009-09-02 12:37:27
Hi Paul,

thank you very much for your detailed and helpful answer - I updated my parser, and gained some 20-30% speedup. But I wish it was faster :)

Here is the new version: http://pyparsing.pastebin.com/m2736e089

There are some issues I still have, see below:

> 1. I see you solved your NUMBER issues, but I think you still have some
> misconceptions about repetition, especially about Word. Here are your
> NUMBER elements:
>
>     numlit = Word(srange("[0-9]"))
>     DOT = Literal(".")
>     NUMBER = Combine(OneOrMore(numlit)) ^ Combine(ZeroOrMore(numlit) + DOT + OneOrMore(numlit))
>
> Here is the reference from the BNF:
>
>     num    [0-9]+|[0-9]*"."[0-9]+
>
> Word is there to define "word groups" or contiguous characters in a
> particular set. A better translation of num to pyparsing would be:
>
>     numlit = Word(srange("[0-9]"))
>     DOT = Literal(".")
>     NUMBER = numlit | Combine(Optional(numlit) + "." + numlit)
>
> Word already takes care of the character repetition, there is no need for
> the OneOrMore or ZeroOrMore.
>
> But in practice, I've found that numeric literal parsing is usually a
> frequent step in overall parsing, and that a Regex term is worth the
> trouble for measurably better parser performance:
>
>     NUMBER = Regex(r"[0-9]*\.[0-9]+|[0-9]+")

Done.

> 2. Why this definition of FUNCTION and function? (Nevermind, I looked at
> your BNF reference and found that this is mapping directly from the YACC
> definitions.)
>
>     FUNCTION = Combine(IDENT + LPAREN)
>     ...
>     function = FUNCTION + ZeroOrMore(Optional(IDENT + EQUAL) + expr) + RPAREN
>
> This makes it hard to see the matching of parens. I would suggest:
>
>     function = IDENT + LPAREN + ZeroOrMore(Optional(IDENT + EQUAL) + expr) + RPAREN
>
> Lastly, to give structure to your results:
>
>     funcarg = Optional(IDENT + EQUAL) + expr
>     function = IDENT + LPAREN + Group(Optional(delimitedList(funcarg))) + RPAREN

Done that, too.

> Now that the arguments are grouped, the parens are unnecessary in the
> parsed output, you can suppress them.
>
> 3. expr follows a very common pattern, that of the delimited list.
>
>     expr << (term + ZeroOrMore( Optional(operator) + term))
>
> Here you could instead use:
>
>     expr << delimitedList(term, delim=Optional(operator))

Here I have the problem that the delimiters are operators - and these are swallowed. I could give the combine parameter, but then I'm getting the whole string and have to parse that again - which makes no sense, especially as here we can have nested expressions, so "simple" approaches don't work. Any other suggestion, or shall I just keep my original rule?

> 4. You may have gone a bit overboard in using '^' vs. '|'. For instance:
>
>     LENGTH = Combine(NUMBER + (Literal("px") ^ Literal("cm") ^ Literal("mm") ^
>                                Literal("in") ^ Literal("pt") ^ Literal("pc")))
>
> When you use '^', all matches are evaluated, even if there is a match early
> in the list. Now in this case, if you parse the 'px' in '100px', there is
> no point in checking for a match with 'cm', 'mm', 'in', etc. In this case
> a MatchFirst is perfectly okay. Plus you can order the units in some
> expected frequency of occurrence.

For these cases, your suggestion to use | worked, but not for this one:

    term = Optional(unary_operator) + ((PERCENTAGE | LENGTH | EMS | EXS | ANGLE | TIME | FREQ | NUMBER) \
                                       | STRING | URI | hexcolor | (function ^ IDENT))

I think the problem here is the IDENT vs. function - both start with an IDENT. But I'd say the problem should be solvable with a lookahead of 1 (in normal LL(k) terms), so maybe you have an idea here, too.

Thanks again for your support!

Diez

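For the function-vs-IDENT ambiguity raised above, one option worth trying (an untested suggestion for this grammar, not a confirmed fix) is a plain MatchFirst with function listed first: because FUNCTION is a Combine of IDENT and LPAREN, it can only match when the '(' immediately follows the identifier, so it fails cleanly on a bare identifier and the alternation falls back to IDENT. A stripped-down illustration:

    from pyparsing import Word, alphas, Literal, Combine, Group, Optional, delimitedList, Forward

    IDENT = Word(alphas, alphas + "-")
    LPAREN, RPAREN = Literal("("), Literal(")")

    expr = Forward()
    FUNCTION = Combine(IDENT + LPAREN)   # '(' must be adjacent to the identifier
    function = FUNCTION + Group(Optional(delimitedList(expr))) + RPAREN
    term = function | IDENT              # try function first, fall back to IDENT
    expr << term

    print(term.parseString("rgb(a, b, c)"))
    print(term.parseString("bold"))
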
From: John K. <jkr...@lt...> - 2009-08-31 21:09:08
On Sat, 2009-08-29 at 00:00 -0500, Paul McGuire wrote:
> > -----Original Message-----
> > From: John Krukoff [mailto:jkr...@lt...]
> > Sent: Friday, August 28, 2009 6:09 PM
> > To: pyp...@li...
> > Subject: [Pyparsing] Painfully slow parsing.
> >
> > Hello,
> >
> > I have a serious speed problem with a parser written using pyparsing,
> > where it's taking ~13 minutes to parse a 30 line file. I'm totally lost
> > on what might be causing it, as small variations seem to be causing
> > large differences in parsing time. I was hoping I could get some tips on
> > general optimization strategies to follow. For instance, I'm suspicious
> > that I should be trying harder to use the '-' operator, and wonder if
> > that would help...
>
> John -
>
> I've not seen people use '-' as a way to speed up parsing, but I imagine it
> could help. '?load', '?attribute', and '?element' look like good places
> where '-' would be a fit (right after the keyword literal).
>
> But I am struggling as to where to even begin. You have posted 500+ lines
> of parser code, without much guidance as to what BNF you are working from,
> or what you are trying to get from the parser. But you didn't post the 30
> line test file, so I have nothing to run your parser with.

Hello,

Thank you for giving my poorly asked question more time than it deserves. As an excuse, I truly had no idea where in that block of parser code I was having a speed problem, and was at a complete loss as to how to narrow it down.

There's really no formal description of the grammar; the best I've got for documentation is here:
http://code.google.com/p/compactxml/source/browse/compact.rst
which is really more of a set of examples than anything. The grammar developed fairly organically to scratch a very specific itch.

The specific test case that was slow is here, as part of the (nose based) module test suite:
http://code.google.com/p/compactxml/source/browse/compactxml/tests/speed_test.py

I do say was, though, as I stumbled across this comment on the discussion board:
http://pyparsing.wikispaces.com/message/view/home/13557997
which was an instance of someone else having a performance problem with pyparsing.indentedBlock. His solution worked for me, and didn't break any of my test suite. My parser still isn't speedy, but at least it's not ridiculous anymore (from 13 minutes to parse that test case down to about a tenth of a second). I patched indentedBlock as so:
http://code.google.com/p/compactxml/source/browse/compactxml/pyparsingaddons.py

I really appreciate the tips you've given me, and they've given me a good starting place to try and shave down the parsing time further.

> You already mention that packratting isn't an option, how about psyco?

The memory footprint hasn't worked out well for me in the past, but I went and took a closer look at it. Unfortunately it looks like even if I get over that, it's incompatible with the 64-bit production environment I have to run on.

> Why do you write this:
>
> restartIndentation = pyparsing.Literal( '<' ).setParseAction( lambda s, l, t: push_indent( ) ).suppress( )
> resumeIndentation = pyparsing.Literal( '>' ).setParseAction( lambda s, l, t: pop_indent( ) ).suppress( )
>
> Instead of:
>
> restartIndentation = pyparsing.Literal( '<' ).setParseAction( push_indent ).suppress( )
> resumeIndentation = pyparsing.Literal( '>' ).setParseAction( pop_indent ).suppress( )

Wow, that's a handy shortcut to know. I had no idea from the pyparsing documentation that parse actions could take a variable number of arguments like that. Now that you've pointed it out, I see the rules stated in the setParseAction docstring, which I'll use to shorten some of the grammar up a bit.

> Here's an idea: follow your definition of endStatement with this:
>
> endStatement.setName("endStatement").setDebug()
>
> Re-run your test, and see how much retracing of your steps is going on. You
> might find that you parse to the end, and then spend most of the time
> figuring out that you're actually AT the end.

Perfect! I didn't understand what these methods were for, and this really gives me the tools to decipher what the parser is doing.

> This code also looks like a likely performance problem:
>
> def create_block( simple, compound ):
>     block = pyparsing.Forward( )
>     simpleStatement = simple + endStatement
>     compoundStatement = compound + endStatement + pyparsing.Optional( block )
>     statement = compoundStatement | simpleStatement
>     block << addons.indentedBlock( statement, aIndentations )
>     block.setParseAction( lambda s, l, t: t[ 0 ] )
>     return compoundStatement
>
> You can try adding some more setName/setDebug calls, to get more insight into
> how pyparsing is working its way through your grammar.
>
> As for getting a response to your questions, the mailing list and wiki
> Discussion tab are about the same, although I think other people besides me
> are more likely to chime in on the list. This actually bodes well for
> getting a faster response, as I just started a new job, and am pretty busy
> trying to get off on a good start. You could also try posting on
> stackoverflow.com - you might get a response from Alex Martelli himself!
>
> Good luck!
> -- Paul

--
John Krukoff <jkr...@lt...>
Land Title Guarantee Company

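The "variable number of arguments" behavior mentioned above is worth spelling out: pyparsing will call a parse action defined with (s, loc, toks), (loc, toks), just (toks), or no arguments at all. A small illustration:

    from pyparsing import Word, nums

    def no_args():
        print("matched an integer")

    def toks_only(toks):
        print("tokens: %s" % toks.asList())

    def full_signature(s, loc, toks):
        print("at offset %d of %r: %s" % (loc, s, toks.asList()))

    integer = Word(nums)
    integer.setParseAction(no_args, toks_only, full_signature)
    integer.parseString("42")
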
From: Paul M. <pt...@au...> - 2009-08-29 05:00:59
> -----Original Message-----
> From: John Krukoff [mailto:jkr...@lt...]
> Sent: Friday, August 28, 2009 6:09 PM
> To: pyp...@li...
> Subject: [Pyparsing] Painfully slow parsing.
>
> Hello,
>
> I have a serious speed problem with a parser written using pyparsing,
> where it's taking ~13 minutes to parse a 30 line file. I'm totally lost
> on what might be causing it, as small variations seem to be causing
> large differences in parsing time. I was hoping I could get some tips on
> general optimization strategies to follow. For instance, I'm suspicious
> that I should be trying harder to use the '-' operator, and wonder if
> that would help...

John -

I've not seen people use '-' as a way to speed up parsing, but I imagine it could help. '?load', '?attribute', and '?element' look like good places where '-' would be a fit (right after the keyword literal).

But I am struggling as to where to even begin. You have posted 500+ lines of parser code, without much guidance as to what BNF you are working from, or what you are trying to get from the parser. But you didn't post the 30 line test file, so I have nothing to run your parser with.

You already mention that packratting isn't an option; how about psyco?

Why do you write this:

    restartIndentation = pyparsing.Literal( '<' ).setParseAction( lambda s, l, t: push_indent( ) ).suppress( )
    resumeIndentation = pyparsing.Literal( '>' ).setParseAction( lambda s, l, t: pop_indent( ) ).suppress( )

Instead of:

    restartIndentation = pyparsing.Literal( '<' ).setParseAction( push_indent ).suppress( )
    resumeIndentation = pyparsing.Literal( '>' ).setParseAction( pop_indent ).suppress( )

Here's an idea: follow your definition of endStatement with this:

    endStatement.setName("endStatement").setDebug()

Re-run your test, and see how much retracing of your steps is going on. You might find that you parse to the end, and then spend most of the time figuring out that you're actually AT the end.

This code also looks like a likely performance problem:

    def create_block( simple, compound ):
        block = pyparsing.Forward( )
        simpleStatement = simple + endStatement
        compoundStatement = compound + endStatement + pyparsing.Optional( block )
        statement = compoundStatement | simpleStatement
        block << addons.indentedBlock( statement, aIndentations )
        block.setParseAction( lambda s, l, t: t[ 0 ] )
        return compoundStatement

You can try adding some more setName/setDebug calls, to get more insight into how pyparsing is working its way through your grammar.

As for getting a response to your questions, the mailing list and wiki Discussion tab are about the same, although I think other people besides me are more likely to chime in on the list. This actually bodes well for getting a faster response, as I just started a new job, and am pretty busy trying to get off on a good start. You could also try posting on stackoverflow.com - you might get a response from Alex Martelli himself!

Good luck!
-- Paul

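A short sketch of the '-' idea mentioned above; the ?element rule here is a simplified stand-in for the real grammar. With '+', a mismatch after the keyword just makes the whole alternative fail quietly so the parser backtracks and retries elsewhere; with '-', once the keyword has matched, any later mismatch in the rule is reported immediately as an error:

    from pyparsing import Keyword, Word, alphanums, restOfLine, ParseBaseException

    name = Word(alphanums + "_")
    element_stmt = Keyword("?element") - name("tag") + restOfLine

    print(element_stmt.parseString("?element body").asList())

    try:
        element_stmt.parseString("?element")   # keyword matched, name missing
    except ParseBaseException as err:
        print("syntax error: %s" % err)
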
From: Paul M. <pt...@au...> - 2009-08-29 03:58:00
> However, I solved the issue - see the NUMBER-nonterminal. But it might
> help if you guys take a look if that's really the way to go.
>
> Diez

Diez -

1. I see you solved your NUMBER issues, but I think you still have some misconceptions about repetition, especially about Word. Here are your NUMBER elements:

    numlit = Word(srange("[0-9]"))
    DOT = Literal(".")
    NUMBER = Combine(OneOrMore(numlit)) ^ Combine(ZeroOrMore(numlit) + DOT + OneOrMore(numlit))

Here is the reference from the BNF:

    num    [0-9]+|[0-9]*"."[0-9]+

Word is there to define "word groups" or contiguous characters in a particular set. A better translation of num to pyparsing would be:

    numlit = Word(srange("[0-9]"))
    DOT = Literal(".")
    NUMBER = numlit | Combine(Optional(numlit) + "." + numlit)

Word already takes care of the character repetition, there is no need for the OneOrMore or ZeroOrMore.

But in practice, I've found that numeric literal parsing is usually a frequent step in overall parsing, and that a Regex term is worth the trouble for measurably better parser performance:

    NUMBER = Regex(r"[0-9]*\.[0-9]+|[0-9]+")

2. Why this definition of FUNCTION and function? (Nevermind, I looked at your BNF reference and found that this is mapping directly from the YACC definitions.)

    FUNCTION = Combine(IDENT + LPAREN)
    ...
    function = FUNCTION + ZeroOrMore(Optional(IDENT + EQUAL) + expr) + RPAREN

This makes it hard to see the matching of parens. I would suggest:

    function = IDENT + LPAREN + ZeroOrMore(Optional(IDENT + EQUAL) + expr) + RPAREN

Lastly, to give structure to your results:

    funcarg = Optional(IDENT + EQUAL) + expr
    function = IDENT + LPAREN + Group(Optional(delimitedList(funcarg))) + RPAREN

Now that the arguments are grouped, the parens are unnecessary in the parsed output, you can suppress them.

3. expr follows a very common pattern, that of the delimited list.

    expr << (term + ZeroOrMore( Optional(operator) + term))

Here you could instead use:

    expr << delimitedList(term, delim=Optional(operator))

4. You may have gone a bit overboard in using '^' vs. '|'. For instance:

    LENGTH = Combine(NUMBER + (Literal("px") ^ Literal("cm") ^ Literal("mm") ^
                               Literal("in") ^ Literal("pt") ^ Literal("pc")))

When you use '^', all matches are evaluated, even if there is a match early in the list. Now in this case, if you parse the 'px' in '100px', there is no point in checking for a match with 'cm', 'mm', 'in', etc. In this case a MatchFirst is perfectly okay. Plus you can order the units in some expected frequency of occurrence.

    LENGTH = Combine(NUMBER + (Literal("px") | Literal("cm") | Literal("mm") |
                               Literal("in") | Literal("pt") | Literal("pc")))

Now this could get you in trouble, if one of these terms was actually a leading subset of another, like "pts" and "pt". You would have to take care to test for the longer choice first. Pyparsing's helper method oneOf handles this (and internally generates a Regex for performance):

    LENGTH = Combine(NUMBER + oneOf("px cm mm in pt pc"))

Thanks for giving pyparsing a shot!
-- Paul

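A quick check of the oneOf point above. The "pts" unit below is invented purely to show the ordering behavior: oneOf takes care of testing the longer alternative before its prefix, while a hand-built MatchFirst in the wrong order stops at the shorter one:

    from pyparsing import oneOf, Literal

    units_oneof = oneOf("pt pts px")
    units_matchfirst = Literal("pt") | Literal("pts") | Literal("px")

    print(units_oneof.parseString("pts"))        # ['pts'] - longest alternative wins
    print(units_matchfirst.parseString("pts"))   # ['pt']  - first listed match wins, 's' is left behind
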
From: Gre7g L. <haf...@ya...> - 2009-08-29 00:44:15
Try:

    import pyparsing as PP
    PP.ParserElement.enablePackrat()

It's on the website somewhere, but not nearly prominent enough IMHO.

Gre7g

________________________________
From: John Krukoff <jkr...@lt...>
To: pyp...@li...
Sent: Friday, August 28, 2009 5:08:39 PM
Subject: [Pyparsing] Painfully slow parsing.

Hello,

I have a serious speed problem with a parser written using pyparsing, where it's taking ~13 minutes to parse a 30 line file. I'm totally lost on what might be causing it, as small variations seem to be causing large differences in parsing time. I was hoping I could get some tips on general optimization strategies to follow. For instance, I'm suspicious that I should be trying harder to use the '-' operator, and wonder if that would help...

I've posted my project on google code for easy access, the relevant bit that defines the parsing grammar, if I can interest someone in taking a look, is at:
http://code.google.com/p/compactxml/source/browse/compactxml/expand.py

I'm making heavy use of significant whitespace and pyparsing.indentedBlock, so it does feel like I'm fighting against pyparsing a bit. Unfortunately, it looks like indentedBlock is incompatible with packrat parsing, so the most obvious performance improving tip looks to be unusable.

I'm also unsure if the mailing list is the best place to ask for help, as it looks like there's more traffic on the pyparsing home page discussion tab?

--
John Krukoff <jkr...@lt...>
Land Title Guarantee Company

From: John K. <jkr...@lt...> - 2009-08-28 23:43:57
Hello,

I have a serious speed problem with a parser written using pyparsing, where it's taking ~13 minutes to parse a 30 line file. I'm totally lost on what might be causing it, as small variations seem to be causing large differences in parsing time. I was hoping I could get some tips on general optimization strategies to follow. For instance, I'm suspicious that I should be trying harder to use the '-' operator, and wonder if that would help...

I've posted my project on google code for easy access, the relevant bit that defines the parsing grammar, if I can interest someone in taking a look, is at:
http://code.google.com/p/compactxml/source/browse/compactxml/expand.py

I'm making heavy use of significant whitespace and pyparsing.indentedBlock, so it does feel like I'm fighting against pyparsing a bit. Unfortunately, it looks like indentedBlock is incompatible with packrat parsing, so the most obvious performance improving tip looks to be unusable.

I'm also unsure if the mailing list is the best place to ask for help, as it looks like there's more traffic on the pyparsing home page discussion tab?

--
John Krukoff <jkr...@lt...>
Land Title Guarantee Company

From: Diez B. R. <de...@we...> - 2009-08-26 23:01:31
Alexey Borzenkov wrote:
> On Wed, Aug 26, 2009 at 8:28 PM, Diez B. Roggisch <de...@we...> wrote:
>> I need to have the *full* location of a matched rule - not only the start.
>>
>> Because I skip whitespace, summing the length of tokens + the start to
>> determine the end doesn't work.
>>
>> Any suggestions on how to do that?
>
> Look at how originalTextFor is implemented.

Thank you very much, it works using originalTextFor - I just wonder why this behavior isn't the default. The passed loc is otherwise rather useless.

Diez

From: Alexey B. <sn...@gm...> - 2009-08-26 18:20:27
On Wed, Aug 26, 2009 at 8:28 PM, Diez B. Roggisch <de...@we...> wrote:
> I need to have the *full* location of a matched rule - not only the start.
>
> Because I skip whitespace, summing the length of tokens + the start to
> determine the end doesn't work.
>
> Any suggestions on how to do that?

Look at how originalTextFor is implemented.

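A minimal sketch of the marker technique this suggestion points at: zero-width Empty() expressions whose parse actions record the current location are placed around the rule, so both the start and the end offset of the full match land in the results. The "start"/"end" names are just illustrative, and exact behavior may vary slightly between pyparsing versions:

    from pyparsing import Word, alphas, nums, Empty

    startMarker = Empty().setParseAction(lambda s, loc, toks: loc)
    endMarker = startMarker.copy()
    endMarker.callPreparse = False   # record the end before trailing whitespace is skipped

    rule = Word(alphas) + Word(nums)
    located = startMarker("start") + rule + endMarker("end")

    source = "   ABC   123   tail"
    result = located.parseString(source)
    print("%d %d" % (result["start"], result["end"]))   # e.g. 3 12
    print(repr(source[result["start"]:result["end"]]))  # 'ABC   123'
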
From: Diez B. R. <de...@we...> - 2009-08-26 16:28:35
Hi,

I need to have the *full* location of a matched rule - not only the start.

Because I skip whitespace, summing the length of tokens + the start to determine the end doesn't work.

Any suggestions on how to do that?

Diez

From: Diez B. R. <de...@we...> - 2009-08-26 15:11:23
On Wednesday 26 August 2009 16:58:20 Diez B. Roggisch wrote:
> Hi,
>
> I've got this in my grammar:
>
> URL = OneOrMore(Word(srange("[!#$%&*-~]")))
> URI = Literal("url(") + (STRING | URL) + Literal(")")
> URI.addParseAction(URINode.parse_action)
>
> The URI parse action is never called. I actually can attach a parse action
> to the URL - that works, but is too early for me.

"Trick 17 mit Selbstverarschung" (roughly: I outsmarted myself), as we Germans say - it was some surrounding code that caused the action to be ignored. Sorry for the noise.

Diez

From: Diez B. R. <de...@we...> - 2009-08-26 14:59:57
Hi,

I've got this in my grammar:

    URL = OneOrMore(Word(srange("[!#$%&*-~]")))
    URI = Literal("url(") + (STRING | URL) + Literal(")")
    URI.addParseAction(URINode.parse_action)

The URI parse action is never called. I actually can attach a parse action to the URL - that works, but is too early for me.

Any suggestions?

Diez

From: Paul M. <pt...@au...> - 2009-08-26 10:26:29
Alexey -

I'm so sorry not to have responded earlier, and apologize in advance that I'm still not sending a substantive reply. I just started a new job, and my pyparsing time is very limited. I'll try to look at your suggestion this weekend.

-- Paul

(Unfortunately, some of my unit test cases use proprietary data formats that I can't give out publicly.)

> -----Original Message-----
> From: Alexey Borzenkov [mailto:sn...@gm...]
> Sent: Wednesday, August 26, 2009 3:58 AM
> To: Pyp...@li...
> Subject: [Pyparsing] pyparsing, AST and named results
>
> Hi all,
>
> I've sent the following letters to Paul 20 days ago, but unfortunately
> didn't receive any reply. :( So, I'm forwarding it to this list; maybe
> someone else would be interested in the following patch...
>
> ---------- Forwarded message ----------
> From: Alexey Borzenkov <sn...@gm...>
> Date: Tue, Aug 4, 2009 at 8:43 PM
> Subject: pyparsing, AST and named results
> To: Paul McGuire <pt...@us...>
>
> Hi Paul,
>
> I've been using pyparsing to generate a simple AST, and among others I
> had an AST.Body class for a sequence of statements. The problem was that as
> soon as I implemented __len__, __iter__ and __getitem__ on AST.Body I
> started having weird problems (all statements but the first one
> disappearing from compiled code), which I traced to this patch:
>
> [...older patch snipped...]
>
> Basically, because I implemented __getitem__, and because I was using
> named parameters, only the first element (the first statement) was
> getting assigned under that name. This is a very quick (and possibly
> incomplete) fix, but it worked in my case.
>
> [...other irrelevant info snipped...]
>
> Thanks,
> Alexey.
>
> ---------- Forwarded message ----------
> From: Alexey Borzenkov <sn...@gm...>
> Date: Wed, Aug 5, 2009 at 2:31 PM
> Subject: Re: pyparsing, AST and named results
> To: Paul McGuire <pt...@us...>
>
> Hi Paul,
>
> It's me again. I've been looking at ParseResults even more, and wonder
> if under "if name:" the intention is not to assign empty results under
> a name? Because as I see it, the code doesn't check for empty
> ParseResults, and I'm wondering if that's intentional or not. If empty
> ParseResults don't have any more special meaning than empty lists,
> then perhaps it could be patched this way:
>
> diff --git a/src/pyparsing.py b/src/pyparsing.py
> index 57e938a..8dbdec1 100644
> --- a/src/pyparsing.py
> +++ b/src/pyparsing.py
> @@ -277,14 +277,15 @@ class ParseResults(object):
>      # constructor as small and fast as possible
>      def __init__( self, toklist, name=None, asList=True, modal=True ):
>          if self.__doinit:
> +            if isinstance(toklist, list):
> +                toklist = toklist[:]
> +            else:
> +                toklist = [toklist]
>              self.__doinit = False
>              self.__name = None
>              self.__parent = None
>              self.__accumNames = {}
> -            if isinstance(toklist, list):
> -                self.__toklist = toklist[:]
> -            else:
> -                self.__toklist = [toklist]
> +            self.__toklist = toklist
>              self.__tokdict = dict()
>
>          if name:
> @@ -293,9 +294,7 @@
>              if isinstance(name,int):
>                  name = _ustr(name) # will always return a str, but use _ustr for consistency
>              self.__name = name
> -            if not toklist in (None,'',[]):
> -                if isinstance(toklist,basestring):
> -                    toklist = [ toklist ]
> +            if toklist and toklist[0] != '':
>                  if asList:
>                      if isinstance(toklist,ParseResults):
>                          self[name] = _ParseResultsWithOffset(toklist.copy(),0)
>                      else:
>                          self[name] = _ParseResultsWithOffset(ParseResults(toklist[0]),0)
>                      self[name].__name = name
>                  else:
> -                    try:
> -                        self[name] = toklist[0]
> -                    except (KeyError,TypeError,IndexError):
> -                        self[name] = toklist
> +                    self[name] = toklist[0]
>
>      def __getitem__( self, i ):
>          if isinstance( i, (int,slice) ):
>
> The way I see it, it might even have some performance improvement,
> because toklist now is always a ParseResults or a list (just like in
> __toklist), and I checked that toklist should never be None; this
> leaves only one comparison (with '') instead of three, and no
> unnecessary indexing or try/except. The question is: will it break
> anything?
>
> I can't complete all unitTests (examples are missing in svn, and not
> everything is in the 1.5.2 release), but my changes don't make non-failing
> ones fail.
>
> Also, about empty ParseResults as names, the only case I could come up
> with is something like this:
>
> from pyparsing import *
>
> s = "()"
> g = (Suppress('(') + ZeroOrMore('.') + Suppress(')'))('dots') + StringEnd()
> print repr(g.parseString(s).dots)
>
> With my changes dots will now disappear, but it seems more consistent
> to me, because ZeroOrMore('.')('dots') would not appear under the
> name, so why should a bunch of suppressed tokens make a difference?
>
> Thanks,
> Alexey.

From: Alexey B. <sn...@gm...> - 2009-08-26 08:58:39
Hi all,

I've sent the following letters to Paul 20 days ago, but unfortunately didn't receive any reply. :( So, I'm forwarding it to this list; maybe someone else would be interested in the following patch...

---------- Forwarded message ----------
From: Alexey Borzenkov <sn...@gm...>
Date: Tue, Aug 4, 2009 at 8:43 PM
Subject: pyparsing, AST and named results
To: Paul McGuire <pt...@us...>

Hi Paul,

I've been using pyparsing to generate a simple AST, and among others I had an AST.Body class for a sequence of statements. The problem was that as soon as I implemented __len__, __iter__ and __getitem__ on AST.Body I started having weird problems (all statements but the first one disappearing from compiled code), which I traced to this patch:

[...older patch snipped...]

Basically, because I implemented __getitem__, and because I was using named parameters, only the first element (the first statement) was getting assigned under that name. This is a very quick (and possibly incomplete) fix, but it worked in my case.

[...other irrelevant info snipped...]

Thanks,
Alexey.

---------- Forwarded message ----------
From: Alexey Borzenkov <sn...@gm...>
Date: Wed, Aug 5, 2009 at 2:31 PM
Subject: Re: pyparsing, AST and named results
To: Paul McGuire <pt...@us...>

Hi Paul,

It's me again. I've been looking at ParseResults even more, and wonder if under "if name:" the intention is not to assign empty results under a name? Because as I see it, the code doesn't check for empty ParseResults, and I'm wondering if that's intentional or not. If empty ParseResults don't have any more special meaning than empty lists, then perhaps it could be patched this way:

diff --git a/src/pyparsing.py b/src/pyparsing.py
index 57e938a..8dbdec1 100644
--- a/src/pyparsing.py
+++ b/src/pyparsing.py
@@ -277,14 +277,15 @@ class ParseResults(object):
     # constructor as small and fast as possible
     def __init__( self, toklist, name=None, asList=True, modal=True ):
         if self.__doinit:
+            if isinstance(toklist, list):
+                toklist = toklist[:]
+            else:
+                toklist = [toklist]
             self.__doinit = False
             self.__name = None
             self.__parent = None
             self.__accumNames = {}
-            if isinstance(toklist, list):
-                self.__toklist = toklist[:]
-            else:
-                self.__toklist = [toklist]
+            self.__toklist = toklist
             self.__tokdict = dict()

         if name:
@@ -293,9 +294,7 @@
             if isinstance(name,int):
                 name = _ustr(name) # will always return a str, but use _ustr for consistency
             self.__name = name
-            if not toklist in (None,'',[]):
-                if isinstance(toklist,basestring):
-                    toklist = [ toklist ]
+            if toklist and toklist[0] != '':
                 if asList:
                     if isinstance(toklist,ParseResults):
                         self[name] = _ParseResultsWithOffset(toklist.copy(),0)
                     else:
                         self[name] = _ParseResultsWithOffset(ParseResults(toklist[0]),0)
                     self[name].__name = name
                 else:
-                    try:
-                        self[name] = toklist[0]
-                    except (KeyError,TypeError,IndexError):
-                        self[name] = toklist
+                    self[name] = toklist[0]

     def __getitem__( self, i ):
         if isinstance( i, (int,slice) ):

The way I see it, it might even have some performance improvement, because toklist now is always a ParseResults or a list (just like in __toklist), and I checked that toklist should never be None; this leaves only one comparison (with '') instead of three, and no unnecessary indexing or try/except. The question is: will it break anything?

I can't complete all unitTests (examples are missing in svn, and not everything is in the 1.5.2 release), but my changes don't make non-failing ones fail.

Also, about empty ParseResults as names, the only case I could come up with is something like this:

from pyparsing import *

s = "()"
g = (Suppress('(') + ZeroOrMore('.') + Suppress(')'))('dots') + StringEnd()
print repr(g.parseString(s).dots)

With my changes dots will now disappear, but it seems more consistent to me, because ZeroOrMore('.')('dots') would not appear under the name, so why should a bunch of suppressed tokens make a difference?

Thanks,
Alexey.

From: Diez B. R. <de...@we...> - 2009-08-26 08:28:18
On Wednesday 26 August 2009 06:17:04 Paul McGuire wrote:
> Still no joy in the attachments. Try just including the text in the body
> of your e-mail. Or use pyparsing.pastebin.com.

Thanks, here it is: http://pyparsing.pastebin.com/m5df6ab17

However, I solved the issue - see the NUMBER nonterminal. But it might help if you guys take a look to see if that's really the way to go.

Diez

From: Paul M. <pt...@au...> - 2009-08-26 04:34:54
Still no joy in the attachments. Try just including the text in the body of your e-mail. Or use pyparsing.pastebin.com.

-- Paul

> -----Original Message-----
> From: Diez B. Roggisch [mailto:de...@we...]
> Sent: Tuesday, August 25, 2009 12:32 PM
> To: Pyp...@li...
> Subject: [Pyparsing] expression not greedy enough
>
> Hi,
>
> I'm in the process of writing a CSS parser - and encountered a behavior
> that I don't understand.
>
> The attached file shows it - I expect the expression
>
>     0 100px
>
> to be parsed as two tokens, a NUMBER and a LENGTH.
>
> But instead, it seems to be parsed as NUMBER, NUMBER, IDENT.
>
> The second test case shows that if there is only one sub-expression,
> things work as expected.
>
> Any suggestions?
>
> Diez

From: Diez B. R. <de...@we...> - 2009-08-25 22:16:52
This is the test script - I couldn't read my own attachment as mail here; my apologies if this is an actual duplicate.

Diez

From: Diez B. R. <de...@we...> - 2009-08-25 17:32:21
Hi,

I'm in the process of writing a CSS parser - and encountered a behavior that I don't understand.

The attached file shows it - I expect the expression

    0 100px

to be parsed as two tokens, a NUMBER and a LENGTH.

But instead, it seems to be parsed as NUMBER, NUMBER, IDENT.

The second test case shows that if there is only one sub-expression, things work as expected.

Any suggestions?

Diez

From: Celvin <rea...@gm...> - 2009-08-06 01:15:31
Hi,

I recently started porting a parser for a custom file format from the Spirit framework over to pyparsing when I noticed some (at least for me) odd behavior when it comes to using Combine in expressions. I know Combine turns off whitespace skipping, probably altering some internal parser state, but it seems like skipping isn't enabled again, at least not when I would expect it.

Consider the following expressions:

    exp = oneOf(["e", "E"]) + ZeroOrMore(oneOf(["+", "-"])) + Word(nums)
    frac = (ZeroOrMore(Word(nums)) + Literal(".") + Word(nums)) | (Word(nums) + Literal("."))
    real_number = Combine((frac + ZeroOrMore(exp)) | (Word(nums) + exp))

Obviously, real_number is what I use to parse standard floating point values from the file. The file contains data measured by some custom hardware device and includes a header starting with initialization data, each on a separate line, that looks like this:

    <STRING_ID>____________________ .000 .000 .000 .000

...where <STRING_ID> is a user-defined string used as an identifier, followed by an arbitrary number of underscores and 4 floating point values denoting a spatial position in 3d space and a precision estimate.

For testing purposes, I defined the following expression to parse initialization data:

    init_data = ZeroOrMore(Literal("%") | (Literal("<STRING_ID>") + Combine(ZeroOrMore(Literal("_"))))) + Group(real_number*4) + restOfLine

Now, when I write tests using "init_data.parseString(...)" and pass the aforementioned line as parameter, I get a ParseException:

    Expected "." (at char 42), (line:1, col:43)

...stating that parsing failed right after the first whitespace following the first floating point value, expecting another real_number.

If I change real_number to look like this:

    real_number = (frac + ZeroOrMore(exp)) | (Word(nums) + exp)

...thus removing the Combine, parsing is successful. Altering the init_data expression with regards to the Combine call used in that expression has no effect whatsoever.

If somebody could explain this behavior, I'd be rather grateful.

Regards,
Celvin

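A stripped-down illustration of the Combine behavior being asked about (not a diagnosis of the grammar above): Combine requires the pieces it wraps to be adjacent in the input, so spacing that a plain And would tolerate makes a Combined expression fail:

    from pyparsing import Word, nums, Combine

    plain  = Word(nums) + "." + Word(nums)
    joined = Combine(Word(nums) + "." + Word(nums))

    print(plain.parseString("3 . 14").asList())   # ['3', '.', '14'] - whitespace between pieces is fine
    print(joined.parseString("3.14").asList())    # ['3.14']

    try:
        joined.parseString("3 . 14")              # inside Combine the pieces must be adjacent
    except Exception as err:
        print("failed as expected: %s" % err)
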