Thread: [Pyparsing] Newby question about ignore
From: Kees B. <kee...@al...> - 2007-12-12 13:45:54
Hi,

Here is a simple example. Maybe I'm doing something silly. There are two
attempts to catch and ignore certain lines with '='.

In the example goal1.ignore I want to ignore lines with just equal-sign. The
result is:
    Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then
ignore the rest. The result is:
    Expected W:(abcd...) (at char 33), (line:2, col:1)

Can anyone tell me what's wrong with these two examples?

TIA,
Kees

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from pyparsing import *

goal1 = OneOrMore( Word(alphanums) )
goal1.ignore( OneOrMore( Literal('=') ) + lineEnd )

goal2 = OneOrMore( Word(alphanums) )
goal2.ignore( lineStart + Literal('==') + restOfLine )

dump_txt = '''\
================================
== .debug_info section index 61
================================
DWCHECK REMARK bla bla
'''

def parse_it():
    # Make this goal1 or goal2
    result = goal1.parseString(dump_txt)
    if result:
        from pprint import pprint
        pprint( result.asList() )

def main():
    parse_it()

if __name__ == "__main__":
    try:
        main()
    except SystemExit, e:
        print e
    except Exception, e:
        print e
From: Paul M. <pt...@au...> - 2007-12-12 14:55:38
-----Original Message-----
From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Kees Bakker
Sent: Wednesday, December 12, 2007 7:46 AM
To: pyp...@li...
Subject: [Pyparsing] Newby question about ignore

In the example goal1.ignore I want to ignore lines with just equal-sign. The
result is:
    Expected W:(abcd...) (at char 0), (line:1, col:1)

Another attempt is goal2.ignore to ignore lines that start with '=' and then
ignore the rest. The result is:
    Expected W:(abcd...) (at char 33), (line:2, col:1)

----------
Kees,

I'll answer the second example first, since this is pretty straightforward.
LineStart is very particular about where the parser happens to be when it
tries to match - it *must* be at the beginning of a line. In your example,
you defined your comment expression as:

    lineStart + Literal('==') + restOfLine

In your input text, the first line of '=' signs matches the opening
lineStart, the first two '==' signs of the line, and the rest of the line.
But at this point, the parser is left at the end of line 1. So now it can't
match another comment, and it won't match a Word(alphanums), so it raises an
exception. The simplest fix is to expand the comment definition to consume
the line end as well, using:

    lineStart + Literal('==') + restOfLine + lineEnd

If you change the ignore expression for goal2 to read like this, then goal2
will successfully ignore the 3 comment lines, and read in the data on line
4.

The first problem is a little stickier. In this case, you are trying to match
*only* lines made up only of '=' signs. Doing so with OneOrMore('=')
actually does a little more than you wanted. Remember that pyparsing's
default behavior is to skip over whitespace. OneOrMore('=') matches each
'=' separately, and does whitespace skipping between each one. So what you
have written will match not only

    ==================

but also

    = = = = = = = = = =

AND, since line ends are treated like whitespace, your comment definition's
OneOrMore('=') even reads the first two '=' signs on the next line. Now the
parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
the parser concludes that this is not a comment, goes back to line 1, column
1, and tries to match a Word(alphanums), which then fails, and raises the
exception that you see.

The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
Word does repetition of a character without skipping whitespace. But this
only gets you past line 1. At this point, you are at the beginning of line
2, which still does not start with a Word(alphanums). But now you should
get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
col:1).

So why does Literal('=') read past the line end, but lineStart doesn't?
Because a few pyparsing classes *don't* do whitespace skipping! The ones
that do not skip whitespace are:
- the positional classes (LineStart, LineEnd, StringStart, StringEnd)
- CharsNotIn
- restOfLine

My personal choice for defining an '='s-based comment in your example would
probably be:

    Literal('==') + restOfLine

Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
current line.

But if you want to restrict '==' comments to column 1 only, then you will
need to use the comment expression I gave at the beginning of this e-mail.

Hope this helps - Welcome to pyparsing!

-- Paul
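The two behaviours described above can be tried out in isolation. What follows is only a minimal sketch, not code from the original messages: the names sample, loose, tight, goal and words are made up for illustration, and it assumes a pyparsing installation that still accepts the classic parseString/asList spellings used in this thread (pyparsing 3.x also offers parse_string/as_list). It compares OneOrMore(Literal('=')) with Word('=') on a two-line string, then applies the Literal('==') + restOfLine comment expression to the dump text from the question.

```python
from pyparsing import Literal, OneOrMore, Word, alphanums, restOfLine

# Hypothetical two-line sample: three '=' signs, a newline, then "== x".
sample = "===\n== x"

# OneOrMore(Literal('=')) skips whitespace (including the newline) before
# every repetition, so it also picks up the two '=' signs on the second line.
loose = OneOrMore(Literal("=")).parseString(sample).asList()

# Word('=') matches one contiguous run of '=' and stops at the newline.
tight = Word("=").parseString(sample).asList()

dump_txt = """\
================================
== .debug_info section index 61
================================
DWCHECK REMARK bla bla
"""

# The comment expression suggested above: '==' up to the end of the line.
goal = OneOrMore(Word(alphanums))
goal.ignore(Literal("==") + restOfLine)
words = goal.parseString(dump_txt).asList()

print(loose)  # ['=', '=', '=', '=', '=']
print(tight)  # ['===']
print(words)  # ['DWCHECK', 'REMARK', 'bla', 'bla']
```

Note that this comment expression is not tied to column 1: any '==' that the parser meets at a token boundary starts a comment that runs to the end of that line.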
From: Kees B. <kee...@al...> - 2007-12-12 15:25:16
On Wednesday 12 December 2007 15:55, Paul McGuire wrote:
> -----Original Message-----
> From: pyp...@li...
> [mailto:pyp...@li...] On Behalf Of Kees
> Bakker
> Sent: Wednesday, December 12, 2007 7:46 AM
> To: pyp...@li...
> Subject: [Pyparsing] Newby question about ignore
>
> In the example goal1.ignore I want to ignore lines with just equal-sign. The
> result is:
> Expected W:(abcd...) (at char 0), (line:1, col:1)
>
> Another attempt is goal2.ignore to ignore lines that start with '=' and then
> ignore the rest. The result is:
> Expected W:(abcd...) (at char 33), (line:2, col:1)
>
> ----------
> Kees,
>
> I'll answer the second example first, since this is pretty straightforward.
> LineStart is very particular about where the parser happens to be when it
> tries to match - it *must* be at the beginning of a line. In your example,
> you defined your comment expression as:
>
>     lineStart + Literal('==') + restOfLine
>
> In your input text, the first line of '=' signs matches the opening
> lineStart, the first two '==' signs of the line, and the rest of the line.
> But at this point, the parser is left at the end of line 1. So now it can't
> match another comment, and it won't match a Word(alphanums), so it raises an
> exception. The simplest fix is to expand the comment definition to consume
> the line end as well, using:
>
>     lineStart + Literal('==') + restOfLine + lineEnd
>
> If you change the ignore expression for goal2 to read like this, then goal2
> will successfully ignore the 3 comment lines, and read in the data on line
> 4.

Hmmm. OK I understand your explanation. However, I didn't expect the parser
to not skip whitespace at that point. (You explain more about it further
down, thanks.)

>
> The first problem is a little stickier. In this case, you are trying to match
> *only* lines made up only of '=' signs. Doing so with OneOrMore('=')
> actually does a little more than you wanted. Remember that pyparsing's
> default behavior is to skip over whitespace. OneOrMore('=') matches each
> '=' separately, and does whitespace skipping between each one. So what you
> have written will match not only
> ==================
> but also
> = = = = = = = = = =
> AND, since line ends are treated like whitespace, your comment definition's
> OneOrMore('=') even reads the first two '=' signs on the next line. Now the
> parser is located at the '.' on line 2, which is nowhere near a lineEnd, so
> the parser concludes that this is not a comment, goes back to line 1, column
> 1, and tries to match a Word(alphanums), which then fails, and raises the
> exception that you see.

Ah, that's good to know. I was too focused on the reported position.

>
> The quick fix for problem 1 is to replace OneOrMore('=') with Word('=').
> Word does repetition of a character without skipping whitespace.

Ah, that's also good to know.

> But this
> only gets you past line 1. At this point, you are at the beginning of line
> 2, which still does not start with a Word(alphanums). But now you should
> get a more reasonable exception, Expected W:(abcd...) (at char 33), (line:2,
> col:1).

Yes, that's OK. It was just an example :-)

>
> So why does Literal('=') read past the line end, but line start doesn't?
> Because a few pyparsing classes *don't* do whitespace skipping! The ones
> that do not skip whitespace are:
> - the positional classes (LineStart, LineEnd, StringStart, StringEnd)
> - CharsNotIn
> - restOfLine
>
> My personal choice for defining an '='s-based comment in your example would
> probably be:
>
>     Literal('==') + restOfLine
>
> Leave out the lineStart and lineEnd, and just ignore '==' to the end of the
> current line.
>
> But if you want to restrict '==' comments to column 1 only, then you will
> need to use the comment expression I gave at the beginning of this e-mail.
>
> Hope this helps - Welcome to pyparsing!

Thanks a lot.
(BTW, I bought a copy of the PDF booklet from O'Reilly. That is quite
helpful too.)

--
Kees