Re: [Pyparsing] Newby question about ignore

-----Original Message-----
From: pyp...@li...
[mailto:pyp...@li...] On Behalf Of Kees
Bakker
Sent: Wednesday, December 12, 2007 7:46 AM
To: pyp...@li...
Subject: [Pyparsing] Newby question about ignore

In the example goal1.ignore I want to ignore lines with just equal-sign. The
result is:
Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then
ignore the rest. The result is:
Expected W:(abcd...) (at char 33), (line:2, col:1)

----------
Kees,

I'll answer the second example first, since this is pretty straightforward.
LineStart is very particular about where the parser happens to be when it
tries to match - it *must* be at the beginning of a line.  In your example,
you defined your comment expression as:

lineStart + Literal('==') + restOfLine

In your input text, the first line of '=' signs matches the opening
lineStart, the first two '=' signs of the line, and the rest of the line.
But at this point, the parser is left at the end of line 1.  So now it can't
match another comment, and it won't match a Word(alphanums), so it raises an
exception.  The simplest fix is to expand the comment definition to consume
the line end as well, using:

lineStart + Literal('==') + restOfLine + lineEnd

If you change the ignore expression for goal2 to read like this, then goal2
will successfully ignore the 3 comment lines, and read in the data on line
4.

The first problem is a little stickier.  In this case, you are trying to
match lines made up *only* of '=' signs.  Doing so with OneOrMore('=')
actually does a little more than you wanted.  Remember that pyparsing's
default behavior is to skip over whitespace.  OneOrMore('=') matches each
'=' separately, and does whitespace skipping between each one.  So what you
have written will match not only
==================
but also
= = = = = = = = = = 
AND, since line ends are treated like whitespace, your comment definition's
OneOrMore('=') even reads the first two '=' signs on the next line.  Now the
parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
the parser concludes that this is not a comment, goes back to line 1, column
1, and tries to match a Word(alphanums), which then fails, and raises the
exception that you see.
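
The over-matching described above can be reproduced in isolation.  This is
only a minimal sketch: the two-line sample text and the variable names are
made up for illustration (the original input file is not shown in this
message), and only the OneOrMore part of the comment expression is exercised.

```python
from pyparsing import Literal, OneOrMore

# Hypothetical input: a rule of '=' signs, then a line that starts with '=='.
sample = "=====\n==. not a rule\n"

# Each '=' is matched as a separate Literal, and whitespace (including the
# newline) is skipped before each one.
equals_run = OneOrMore(Literal('='))

spaced = equals_run.parseString("= = =").asList()
across_lines = equals_run.parseString(sample).asList()

print(spaced)             # three separate '=' tokens
print(len(across_lines))  # 7: five from line 1 plus two from line 2
```

The second result shows the repetition running past the line end and stopping
only at the '.' on line 2.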

The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
Word does repetition of a character without skipping whitespace.  But this
only gets you past line 1.  At this point, you are at the beginning of line
2, which still does not start with a Word(alphanums).  But now you should
get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
col:1).
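
For comparison, here is a small sketch of how Word('=') treats similar text.
The sample string and variable names are assumptions made for illustration,
not the original test input.

```python
from pyparsing import Word

# Hypothetical input: a rule of '=' signs, then a line that starts with '=='.
sample = "=====\n==. not a rule\n"

# Word matches one contiguous run of the given characters and stops at the
# first character outside that set, including whitespace and newlines.
equals_word = Word('=')

rule = equals_word.parseString(sample).asList()
spaced = equals_word.parseString("= = =").asList()

print(rule)    # ['=====']
print(spaced)  # ['=']
```

The match stays on line 1 and does not spill over into the '==' that opens
line 2.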

So why does Literal('=') read past the line end, but lineStart doesn't?
Because a few pyparsing classes *don't* do whitespace skipping!  The ones
that do not skip whitespace are:
- the positional classes (LineStart, LineEnd, StringStart, StringEnd)
- CharsNotIn
- restOfLine
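
The difference can be seen with a few one-line experiments.  This is a rough
sketch with made-up input strings; the positional classes are left out
because their exact behavior is easier to see inside a full grammar.

```python
from pyparsing import CharsNotIn, Literal, Word, alphas, restOfLine

# Word skips leading whitespace before it tries to match.
word_tokens = Word(alphas).parseString("   abc").asList()

# CharsNotIn does not skip, so the leading blanks become part of the token.
chars_tokens = CharsNotIn('=').parseString("   abc=def").asList()

# restOfLine does not skip either: it keeps the blanks that follow 'x' and
# stops at the newline.
rest_tokens = (Literal('x') + restOfLine).parseString("x   tail\nnext").asList()

print(word_tokens)   # ['abc']
print(chars_tokens)  # ['   abc']
print(rest_tokens)   # ['x', '   tail']
```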

My personal choice for defining an '='s-based comment in your example would
probably be:

Literal('==') + restOfLine

Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
current line.
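
As a sketch of how that looks in use: the grammar and input below are
assumptions standing in for goal2 and its test text (three comment lines and
one data line), since the original script is not included in this message.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# Hypothetical input: three '==' comment lines followed by one data line.
text = (
    "====================\n"
    "==. first comment\n"
    "== second comment\n"
    "abc def 123\n"
)

# Anything from '==' to the end of the current line is a comment.
comment = Literal('==') + restOfLine

# Stand-in for the real grammar: one or more alphanumeric words.
data = OneOrMore(Word(alphanums))
data.ignore(comment)

tokens = data.parseString(text).asList()
print(tokens)  # ['abc', 'def', '123']
```

Because Literal('==') skips leading whitespace, including the newline left
over from the previous comment, consecutive comment lines are all skipped
without any lineStart or lineEnd in the expression.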

But if you want to restrict '==' comments to column 1 only, then you will
need to use the corrected comment expression I gave at the beginning of this
e-mail (the one that ends with lineEnd).

Hope this helps - Welcome to pyparsing!

-- Paul