Re: [Pyparsing] Newby question about ignore
From: Paul M. <pt...@au...> - 2007-12-12 14:55:38
-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Kees Bakker
Sent: Wednesday, December 12, 2007 7:46 AM
To: pyp...@li...
Subject: [Pyparsing] Newby question about ignore

In the example goal1.ignore I want to ignore lines with just equal-sign. The result is:

  Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then ignore the rest. The result is:

  Expected W:(abcd...) (at char 33), (line:2, col:1)

----------

Kees,

I'll answer the second example first, since it is pretty straightforward. LineStart is very particular about where the parser happens to be when it tries to match - it *must* be at the beginning of a line. In your example, you defined your comment expression as:

  lineStart + Literal('==') + restOfLine

In your input text, the first line of '=' signs matches the opening lineStart, the first two '=' signs of the line, and the rest of the line. But at this point, the parser is left at the end of line 1, not at the start of line 2. So now it can't match another comment (lineStart fails), and it won't match a Word(alphanums), so it raises an exception. The simplest fix is to expand the comment definition to consume the line end as well, using:

  lineStart + Literal('==') + restOfLine + lineEnd

If you change the ignore expression for goal2 to read like this, then goal2 will successfully ignore the 3 comment lines, and read in the data on line 4.

The first problem is a little stickier. In this case, you are trying to match *only* lines made up entirely of '=' signs. Doing so with OneOrMore('=') actually does a little more than you wanted. Remember that pyparsing's default behavior is to skip over whitespace. OneOrMore('=') matches each '=' separately, and does whitespace skipping between each one.
So what you have written will match not only

  ==================

but also

  = = = = = = = = = =

AND, since line ends are treated like whitespace, your comment definition's OneOrMore('=') even reads the first two '=' signs on the next line. Now the parser is located at the '.' on line 2, which is nowhere near a lineEnd, so the parser concludes that this is not a comment, goes back to line 1, column 1, and tries to match a Word(alphanums), which then fails, and raises the exception that you see.

The quick fix for problem 1 is to replace OneOrMore('=') with Word('='). Word does repetition of a character without skipping whitespace. But this only gets you past line 1. At this point, you are at the beginning of line 2, which still does not start with a Word(alphanums). But now you should get a more reasonable exception:

  Expected W:(abcd...) (at char 33), (line:2, col:1)

So why does Literal('=') read past the line end, but lineStart doesn't? Because a few pyparsing classes *don't* do whitespace skipping! The ones that do not skip whitespace are:

- the positional classes (LineStart, LineEnd, StringStart, StringEnd)
- CharsNotIn
- restOfLine

My personal choice for defining an '='s-based comment in your example would probably be:

  Literal('==') + restOfLine

Leave out the lineStart and lineEnd, and just ignore '==' to the end of the current line. But if you want to restrict '==' comments to column 1 only, then you will need to use the comment expression I gave at the beginning of this e-mail.

Hope this helps - Welcome to pyparsing!

-- Paul
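
The behaviour described in the reply can be tried out with a short sketch. The sample strings here are made up for illustration, since the original goal1/goal2 grammars and input file are not shown in the thread, and the names spread, run_tokens, comment, data and sample are likewise hypothetical. The sketch uses the camelCase names from the e-mail (parseString, restOfLine); newer pyparsing releases also provide snake_case spellings. It deliberately leaves out lineStart and uses the simpler "ignore '==' to end of line" comment form.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# OneOrMore skips whitespace (including line ends) before each repetition,
# so it keeps matching '=' across spaces and onto the next line.
spread = OneOrMore(Literal("="))
spread_tokens = list(spread.parseString("= = =\n== tail"))

# Word matches one contiguous run of the given characters, so it stops
# at the first space instead of continuing to the next '='.
run_tokens = list(Word("=").parseString("==== ==\nabcd"))

# Comment form without lineStart/lineEnd: ignore from '==' to end of line.
comment = Literal("==") + restOfLine

# Hypothetical stand-in for the data grammar: one or more alphanumeric words.
data = OneOrMore(Word(alphanums))
data.ignore(comment)

# Made-up input: three '=' comment lines followed by one data line.
sample = "==========\n== a comment line\n==========\nabcd 1234\n"
data_tokens = list(data.parseString(sample))

print(spread_tokens)
print(run_tokens)
print(data_tokens)
```

With this input, spread_tokens should hold five separate '=' tokens (three from the first line, two from the second), run_tokens a single '====' token, and data_tokens only the two words from the last line.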