On Wednesday 12 December 2007 15:55, Paul McGuire wrote:
> -----Original Message-----
> From: pyparsing-users-bounces@...
> [mailto:pyparsing-users-bounces@...] On Behalf Of Kees
> Sent: Wednesday, December 12, 2007 7:46 AM
> To: pyparsing-users@...
> Subject: [Pyparsing] Newby question about ignore
> In the example goal1.ignore I want to ignore lines that contain only equal signs. The
> result is:
> Expected W:(abcd...) (at char 0), (line:1, col:1)
> Another attempt is goal2.ignore, which ignores lines that start with '=' along
> with the rest of the line. The result is:
> Expected W:(abcd...) (at char 33), (line:2, col:1)
> I'll answer the second example first, since this is pretty straightforward.
> LineStart is very particular about where the parser happens to be when it
> tries to match - it *must* be at the beginning of a line. In your example,
> you defined your comment expression as:
> lineStart + Literal('==') + restOfLine
> In your input text, the first line of '=' signs matches the opening
> lineStart, the first two '==' signs of the line, and the rest of the line.
> But at this point, the parser is left at the end of line 1. So now it can't
> match another comment, and it won't match a Word(alphanums), so it raises an
> exception. The simplest fix is to expand the comment definition to consume
> the line end as well, using:
> lineStart + Literal('==') + restOfLine + lineEnd
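As a rough illustration of that fix, here is a minimal sketch assuming a current pyparsing release (the camelCase names used in this thread are still accepted there). Kees's goal2 grammar and input are not shown, so the data grammar (one or more alphanumeric words) and the three-line header below are hypothetical stand-ins.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, lineEnd, lineStart, restOfLine

# Comment: '==' in column 1, the rest of that line, AND the newline itself,
# so the next comment line again starts exactly at column 1.
comment = lineStart + Literal("==") + restOfLine + lineEnd

# Hypothetical data grammar standing in for the real goal2.
goal2 = OneOrMore(Word(alphanums))
goal2.ignore(comment)

# Hypothetical input: three comment lines, then one line of data.
text = "========\n== a comment line\n========\nalpha beta gamma\n"
result = goal2.parseString(text).asList()
print(result)  # ['alpha', 'beta', 'gamma']
```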
> If you change the ignore expression for goal2 to read like this, then goal2
> will successfully ignore the 3 comment lines, and read in the data on line
Hmmm. OK, I understand your explanation. However, I didn't expect the parser
not to skip whitespace at that point. (You explain more about it further down.)
> The first problem is a little stickier. In this case, you are trying to match
> lines made up *only* of '=' signs. Doing so with OneOrMore('=')
> actually does a little more than you wanted. Remember that pyparsing's
> default behavior is to skip over whitespace. OneOrMore('=') matches each
> '=' separately, and does whitespace skipping between each one. So what you
> have written will match not only
> ==========
> but also
> = = = = = = = = = =
> AND, since line ends are treated like whitespace, your comment definition's
> OneOrMore('=') even reads the first two '=' signs on the next line. Now the
> parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
> the parser concludes that this is not a comment, goes back to line 1, column
> 1, and tries to match a Word(alphanums), which then fails, and raises the
> exception that you see.
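The whitespace skipping inside OneOrMore('=') can be seen directly in a small sketch like this one, assuming a current pyparsing release. The two-line input is made up to mimic the situation described: a row of '=' on line 1 and '==.' at the start of line 2. `scanString` is used only because it reports where the match ends.

```python
from pyparsing import Literal, OneOrMore

# OneOrMore(Literal("=")) matches one '=' at a time and, by default,
# skips whitespace (newlines included) before each repetition.
equals_run = OneOrMore(Literal("="))

spaced = equals_run.parseString("= = = =").asList()
print(spaced)  # ['=', '=', '=', '=']

# Hypothetical input: five '=' on line 1, then '==.x' on line 2.
text = "=====\n==.x"
tokens, start, end = next(equals_run.scanString(text))
# The match swallows the newline and the first two '=' of line 2,
# ending at the '.'.
print(len(tokens), repr(text[end]))  # 7 '.'
```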
Ah, that's good to know. I was too focused on the reported position.
> The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
> Word matches a contiguous run of the given characters: it skips whitespace
> before the word, but not between its characters.
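A quick way to see the difference is to parse a few strings with Word('=') directly. This is a sketch against a current pyparsing release, and all inputs are hypothetical.

```python
from pyparsing import ParseException, Word

# Word("=") matches a contiguous run of '=' characters as a single token.
equals_word = Word("=")

full = equals_word.parseString("==========").asList()
print(full)  # ['==========']

# Leading whitespace is still skipped, but not whitespace between characters.
padded = equals_word.parseString("   ===").asList()
print(padded)  # ['===']
spaced = equals_word.parseString("= = =").asList()
print(spaced)  # ['=']

# With parseAll=True, the spaced-out line is rejected outright.
try:
    equals_word.parseString("= = =", parseAll=True)
    spaced_ok = True
except ParseException:
    spaced_ok = False
print(spaced_ok)  # False

# On a two-line input, the match stops at the end of line 1.
tokens, start, end = next(equals_word.scanString("=====\n==.x"))
print(tokens.asList(), end)  # ['====='] 5
```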
Ah, that's also good to know.
> But this
> only gets you past line 1. At this point, you are at the beginning of line
> 2, which still does not start with a Word(alphanums). But now you should
> get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
Yes, that's OK. It was just an example :-)
> So why does Literal('=') read past the line end, but lineStart doesn't?
> Because a few pyparsing classes *don't* do whitespace skipping! The ones
> that do not skip whitespace are:
> - the positional classes (LineStart, LineEnd, StringStart, StringEnd)
> - CharsNotIn
> - restOfLine
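The effect of the missing whitespace skip is easy to observe with restOfLine. The sketch below, assuming a current pyparsing release and a made-up input line, puts restOfLine and Word(alphas) after the same '==' literal. Details for the positional classes may differ between pyparsing releases, so the sketch sticks to restOfLine.

```python
from pyparsing import Literal, Word, alphas, restOfLine

line = "==   note"  # hypothetical input

# restOfLine does not skip whitespace: the leading spaces stay in the token.
raw = (Literal("==") + restOfLine).parseString(line).asList()
print(raw)  # ['==', '   note']

# Word does skip leading whitespace, so the spaces are dropped.
skipped = (Literal("==") + Word(alphas)).parseString(line).asList()
print(skipped)  # ['==', 'note']
```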
> My personal choice for defining an '='s-based comment in your example would
> probably be:
> Literal('==') + restOfLine
> Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
> current line.
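Here is how that simpler comment might look in use. This is a minimal sketch assuming a current pyparsing release; the data grammar and the input text are hypothetical. The '==' in the middle of line 2 also starts a comment, which is the "to the end of the current line" behaviour described.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# '==' anywhere, plus whatever follows it on the same line, is a comment.
comment = Literal("==") + restOfLine

data = OneOrMore(Word(alphanums))  # hypothetical data grammar
data.ignore(comment)

# Hypothetical input: a header line, a trailing comment, a full-line comment.
text = "==========\nalpha beta == trailing note\n== another comment\ngamma\n"
result = data.parseString(text).asList()
print(result)  # ['alpha', 'beta', 'gamma']
```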
> But if you want to restrict '==' comments to column 1 only, then you will
> need to use the comment expression I gave at the beginning of this e-mail.
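To see what the column-1 restriction changes, the sketch below reuses the lineStart ... lineEnd comment with a hypothetical alphanumeric data grammar, assuming a current pyparsing release. A full-line comment is skipped, but a '==' in the middle of a line is no longer treated as a comment.

```python
from pyparsing import (
    Literal, OneOrMore, ParseException, Word, alphanums, lineEnd, lineStart, restOfLine,
)

# Column-1-only comment, including the trailing newline.
comment = lineStart + Literal("==") + restOfLine + lineEnd

data = OneOrMore(Word(alphanums))  # hypothetical data grammar
data.ignore(comment)

# A full-line comment in column 1 is skipped.
header_case = data.parseString("== header\nalpha beta\n").asList()
print(header_case)  # ['alpha', 'beta']

# A mid-line '==' is not a comment, so the data match stops in front of it.
mid_line = data.parseString("alpha == beta\n").asList()
print(mid_line)  # ['alpha']

try:
    data.parseString("alpha == beta\n", parseAll=True)
    mid_line_ok = True
except ParseException:
    mid_line_ok = False
print(mid_line_ok)  # False
```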
> Hope this helps - Welcome to pyparsing!
Thanks a lot. (BTW, I bought a copy of the PDF booklet from O'Reilly. That is
quite helpful too.)