Re: [Pyparsing] Newby question about ignore
From: Kees B. <kee...@al...> - 2007-12-12 15:25:16
On Wednesday 12 December 2007 15:55, Paul McGuire wrote:
> -----Original Message-----
> From: pyp...@li...
> [mailto:pyp...@li...] On Behalf Of Kees
> Bakker
> Sent: Wednesday, December 12, 2007 7:46 AM
> To: pyp...@li...
> Subject: [Pyparsing] Newby question about ignore
>
> In the example goal1.ignore I want to ignore lines with just equal-sign. The
> result is:
> Expected W:(abcd...) (at char 0), (line:1, col:1)
>
> Another attempt is goal2.ignore to ignore lines that start with '=' and then
> ignore the rest. The result is:
> Expected W:(abcd...) (at char 33), (line:2, col:1)
>
> ----------
> Kees,
>
> I'll answer the second example first, since this is pretty straightforward.
> LineStart is very particular about where the parser happens to be when it
> tries to match - it *must* be at the beginning of a line. In your example,
> you defined your comment expression as:
>
> lineStart + Literal('==') + restOfLine
>
> In your input text, the first line of '=' signs matches the opening
> lineStart, the first two '==' signs of the line, and the rest of the line.
> But at this point, the parser is left at the end of line 1. So now it can't
> match another comment, and it won't match a Word(alphanums), so it raises an
> exception. The simplest fix is to expand the comment definition to consume
> the line end as well, using:
>
> lineStart + Literal('==') + restOfLine + lineEnd
>
> If you change the ignore expression for goal2 to read like this, then goal2
> will successfully ignore the 3 comment lines, and read in the data on line
> 4.

Hmmm. OK, I understand your explanation. However, I didn't expect the parser
not to skip whitespace at that point. (You explain more about it further down,
thanks.)

>
> The first problem is a little stickier. In this case, you are trying to match
> *only* lines made up only of '=' signs. Doing so with OneOrMore('=')
> actually does a little more than you wanted. Remember that pyparsing's
> default behavior is to skip over whitespace. OneOrMore('=') matches each
> '=' separately, and does whitespace skipping between each one. So what you
> have written will match not only
> ==================
> but also
> = = = = = = = = = =
> AND, since line ends are treated like whitespace, your comment definition's
> OneOrMore('=') even reads the first two '=' signs on the next line. Now the
> parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
> the parser concludes that this is not a comment, goes back to line 1, column
> 1, and tries to match a Word(alphanums), which then fails, and raises the
> exception that you see.

Ah, that's good to know. I was too focused on the reported position.

>
> The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
> Word does repetition of a character without skipping whitespace.

Ah, that's also good to know.

> But this
> only gets you past line 1. At this point, you are at the beginning of line
> 2, which still does not start with a Word(alphanums). But now you should
> get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
> col:1).

Yes, that's OK. It was just an example :-)

>
> So why does Literal('=') read past the line end, but line start doesn't?
> Because a few pyparsing classes *don't* do whitespace skipping! The ones
> that do not skip whitespace are:
> - the positional classes (LineStart, LineEnd, StringStart, StringEnd)
> - CharsNotIn
> - restOfLine
>
> My personal choice for defining an '='s-based comment in your example would
> probably be:
>
> Literal('==') + restOfLine
>
> Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
> current line.
>
> But if you want to restrict '==' comments to column 1 only, then you will
> need to use the comment expression I gave at the beginning of this e-mail.
>
> Hope this helps - Welcome to pyparsing!

Thanks a lot.
(BTW, I bought a copy of the PDF booklet from O'Reilly. That is quite helpful
too.)

--
Kees
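For anyone reading this thread later, here is a minimal sketch of the two points discussed above: the whitespace skipping of OneOrMore('=') compared with Word('='), and the simpler Literal('==') + restOfLine comment used with ignore(). It assumes pyparsing is installed. The sample strings and the names spaced_text, sample_text, comment, and goal are made up for illustration; they are not the goal1/goal2 definitions or the input file from the original question.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# Hypothetical input: '=' signs separated by spaces and a line break.
spaced_text = "= = =\n= ="

# OneOrMore(Literal('=')) skips whitespace (newlines included) before each '=',
# so it collects every '=' as a separate token, even across the line break.
separate_tokens = list(OneOrMore(Literal("=")).parseString(spaced_text))

# Word('=') matches one contiguous run of '=' and stops at the first space.
contiguous_token = list(Word("=").parseString(spaced_text))

# The comment form suggested above: '==' through the end of the current line.
comment = Literal("==") + restOfLine

# Hypothetical grammar and input, standing in for the original example.
goal = OneOrMore(Word(alphanums))
goal.ignore(comment)
sample_text = "==========\n== a comment line\nabc def\n"
data_tokens = list(goal.parseString(sample_text))

print(separate_tokens)
print(contiguous_token)
print(data_tokens)
```

With this input, the first expression should return five separate '=' tokens, the second a single '=', and the last one only the two data words, because both '==' lines are skipped as comments. The example deliberately leaves out lineStart and lineEnd.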