Re: [Pyparsing] Newby question about ignore

On Wednesday 12 December 2007 15:55, Paul McGuire wrote:
> -----Original Message-----
> From: pyp...@li...
> [mailto:pyp...@li...] On Behalf Of Kees Bakker
> Sent: Wednesday, December 12, 2007 7:46 AM
> To: pyp...@li...
> Subject: [Pyparsing] Newby question about ignore
> 
> In the example goal1.ignore, I want to ignore lines consisting only of equal
> signs. The result is:
> Expected W:(abcd...) (at char 0), (line:1, col:1)
> 
> Another attempt, goal2.ignore, ignores lines that start with '=' along with
> the rest of the line. The result is:
> Expected W:(abcd...) (at char 33), (line:2, col:1)
> 
> ----------
> Kees,
> 
> I'll answer the second example first, since this is pretty straightforward.
> LineStart is very particular about where the parser happens to be when it
> tries to match - it *must* be at the beginning of a line.  In your example,
> you defined your comment expression as:
> 
> lineStart + Literal('==') + restOfLine
> 
> In your input text, the first line of '=' signs matches the opening
> lineStart, the first two '==' signs of the line, and the rest of the line.
> But at this point, the parser is left at the end of line 1, just before the
> newline.  So now it can't match another comment, and it won't match a
> Word(alphanums), so it raises an
> exception.  The simplest fix is to expand the comment definition to consume
> the line end as well, using:
> 
> lineStart + Literal('==') + restOfLine + lineEnd
> 
> If you change the ignore expression for goal2 to read like this, then goal2
> will successfully ignore the 3 comment lines, and read in the data on line
> 4.
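
A small sketch of the difference, assuming pyparsing is installed (the camelCase names used here, such as lineStart, restOfLine and parseString, are available in both 2.x and 3.x) and using a made-up three-comment input, since the actual goal2 file is not shown. It prints where each comment expression stops on the first line, then uses the line-consuming version as an ignore expression.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, lineEnd, lineStart, restOfLine

# Hypothetical input: three '=='-style comment lines followed by data.
text = "==========\n== first comment\n== second comment\ndata1 data2\n"

without_end = lineStart + Literal('==') + restOfLine
with_end = lineStart + Literal('==') + restOfLine + lineEnd

# scanString yields (tokens, start, end); only the first match is inspected.
_, _, stop_without = next(without_end.scanString(text))
_, _, stop_with = next(with_end.scanString(text))
print(repr(text[stop_without]))  # '\n'  (restOfLine stops just before the newline)
print(stop_with)                 # 11    (lineEnd consumed the newline as well)

# Use the line-consuming version as the ignore expression.
data = OneOrMore(Word(alphanums))
data.ignore(with_end)
print(data.parseString(text).asList())  # ['data1', 'data2']
```

LineStart's handling of a pending newline has changed across pyparsing releases, so the goal2 expression without lineEnd may or may not fail on a current version; the version that also consumes the line end works either way.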

Hmmm. OK, I understand your explanation. However, I didn't expect the parser
not to skip whitespace at that point. (You explain more about it further down,
thanks.)

> 
> The first problem is a little stickier.  In this case, you are trying to
> match only lines made up *entirely* of '=' signs.  Doing so with OneOrMore('=')
> actually does a little more than you wanted.  Remember that pyparsing's
> default behavior is to skip over whitespace.  OneOrMore('=') matches each
> '=' separately, and does whitespace skipping between each one.  So what you
> have written will match not only
> ==================
> but also
> = = = = = = = = = = 
> AND, since line ends are treated like whitespace, your comment definition's
> OneOrMore('=') even reads the first two '=' signs on the next line.  Now the
> parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
> the parser concludes that this is not a comment, goes back to line 1, column
> 1, and tries to match a Word(alphanums), which then fails, and raises the
> exception that you see.
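
To see the whitespace skipping described above, here is a sketch (assuming pyparsing is installed; the inputs are made up) that runs OneOrMore(Literal('=')) on spaced-out '=' signs and on a line of '=' followed by a '==.' line. The second match runs across the newline and ends at the '.', at index 7.

```python
from pyparsing import Literal, OneOrMore

equals_run = OneOrMore(Literal('='))

# Spaced-out '=' signs still match, because whitespace is skipped before each '='.
spaced = equals_run.parseString("= = = =").asList()
print(spaced)  # ['=', '=', '=', '=']

# The newline is skipped too, so the match continues into the next line
# and only stops at the '.'.
tokens, start, end = next(equals_run.scanString("====\n==. note"))
print(len(tokens), end)  # 6 7
```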

Ah, that's good to know. I was too focused on the reported position.

> 
> The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
> Word does repetition of a character without skipping whitespace.
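
The same two made-up inputs with Word('=') instead, as a sketch. Word still skips whitespace *before* the run starts, but not between the characters inside it, so it stops at the first space or newline.

```python
from pyparsing import Word

equals_word = Word('=')

# One token for the whole run; the newline ends it.
run = equals_word.parseString("====\n==. note").asList()
print(run)  # ['====']

# On spaced-out input only the first '=' is matched.
spaced = equals_word.parseString("= = = =").asList()
print(spaced)  # ['=']
```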

Ah, that's also good to know.

> But this 
> only gets you past line 1.  At this point, you are at the beginning of line
> 2, which still does not start with a Word(alphanums).  But now you should
> get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
> col:1).
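
A sketch reproducing that situation, assuming the goal1 comment had the shape Word('=') + lineEnd (the original goal1 definition is not quoted here) and using a made-up input. The first line is ignored, and the error is then reported at line 2, column 1.

```python
from pyparsing import ParseException, Word, alphanums, lineEnd

# Hypothetical goal1-style input: a line of '=' signs, then a line that
# starts with '==' but continues with other text.
text = "==========\n==. a comment line\ndata\n"

comment = Word('=') + lineEnd  # assumed comment shape, using Word('=')
data = Word(alphanums)
data.ignore(comment)

failure = None
try:
    data.parseString(text)
except ParseException as err:
    failure = err  # keep a reference; 'err' is cleared after the except block

print(failure.lineno, failure.col)  # 2 1
```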

Yes, that's OK. It was just an example :-)

> 
> So why does Literal('=') read past the line end, but LineStart doesn't?
> Because a few pyparsing classes *don't* do whitespace skipping!  The ones
> that do not skip whitespace are:
> - the positional classes (LineStart, LineEnd, StringStart, StringEnd)
> - CharsNotIn
> - restOfLine
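
A quick check of two of the non-skipping classes from the list above, as a sketch on a made-up string: Literal skips the spaces before 'note', while restOfLine and CharsNotIn keep them as part of their match. The positional classes are left out because their exact whitespace handling has changed between pyparsing releases.

```python
from pyparsing import CharsNotIn, Literal, restOfLine

# Hypothetical input with several spaces after the '==' marker.
text = "==   note text"

# Literal skips the leading spaces before 'note'.
with_literal = (Literal('==') + Literal('note')).parseString(text).asList()
print(with_literal)  # ['==', 'note']

# restOfLine does not skip whitespace, so the spaces become part of its token.
with_rest = (Literal('==') + restOfLine).parseString(text).asList()
print(with_rest)  # ['==', '   note text']

# CharsNotIn also starts exactly where the previous element stopped
# (here it runs up to the 'x' in 'text').
with_chars = (Literal('==') + CharsNotIn('x')).parseString(text).asList()
print(with_chars)  # ['==', '   note te']
```
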
> 
> My personal choice for defining an '='s-based comment in your example would
> probably be:
> 
> Literal('==') + restOfLine
> 
> Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
> current line.
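
A sketch of that suggestion, assuming a made-up input that has both full-line and trailing '==' comments. Because the comment is not anchored to column 1, the trailing '== trailing note' on the third line is ignored as well.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# Hypothetical input: a '=' banner, a full-line comment, and a trailing comment.
text = "==========\n== a comment\nabc def == trailing note\nghi\n"

comment = Literal('==') + restOfLine  # '==' up to the end of the current line
data = OneOrMore(Word(alphanums))
data.ignore(comment)

result = data.parseString(text).asList()
print(result)  # ['abc', 'def', 'ghi']
```
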
> 
> But if you want to restrict '==' comments to column 1 only, then you will
> need to use the comment expression I gave at the beginning of this e-mail.
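
For contrast, a sketch of the column-1-only version on a made-up input: the '==' in the middle of the second line is not treated as a comment, so OneOrMore stops after 'abc' and the rest of the text is left unparsed.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, lineEnd, lineStart, restOfLine

# Hypothetical input: one column-1 comment, then a line with a mid-line '=='.
text = "== a comment\nabc == not a comment here\nghi\n"

col1_comment = lineStart + Literal('==') + restOfLine + lineEnd
data = OneOrMore(Word(alphanums))
data.ignore(col1_comment)

# The mid-line '==' is neither ignored nor matched by Word(alphanums).
result = data.parseString(text).asList()
print(result)  # ['abc']
```
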
> 
> Hope this helps - Welcome to pyparsing!

Thanks a lot. (BTW, I bought a copy of the PDF booklet from O'Reilly. That is
quite helpful too.)
--
Kees