Paul,
Thanks for looking into this.
On Mon, Sep 12, 2011 at 12:13 PM, Paul McGuire <ptmcg@...> wrote:
> The snag here is pyparsing's default whitespace skipping. You are leading
> off your Regex with whitespace, but pyparsing skips over that stuff unless
> you tell it not to.
I see. I was aware of the whitespace skipping, and thought the '*'
quantifier after '\s' would take care of this: even if the regex never
sees the whitespace at all, '^\s*\*' should still match, since '\s*'
can also match zero characters. But it doesn't.
> This expression gives you your inner comment lines (the
> negative lookahead expression '(?!/)' tells the re engine not to match if
> the '*' is followed by a '/', so the trailing '*/' line is left out).
>
> comment_inner = Regex(r"^\s+\*(?!/).*", re.M).leaveWhitespace()
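Since pyparsing's Regex compiles its pattern with Python's re module, the lookahead part can be tried out with plain re. A small sketch; the sample comment block is made up for illustration, and no pyparsing whitespace handling is involved here:

```python
import re

# The inner-comment pattern quoted above: a line starting with whitespace,
# then '*', where that '*' is NOT followed by '/' (the negative lookahead).
COMMENT_INNER = re.compile(r"^\s+\*(?!/).*", re.M)

# Hypothetical sample input: a C-style block comment.
block = "/**\n * first line\n * second line\n */\n"

inner = COMMENT_INNER.findall(block)
print(inner)  # [' * first line', ' * second line']  (' */' is rejected)
```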
This is working nicely, but I'm still concerned about the '^' anchor.
When you talk about skipping whitespace, I suppose that includes
newlines? Then why does the '$' anchor work as expected while '^'
doesn't?
Looking further into this, I found that
py.Regex(r'^\s*\*', re.M).searchString('foo\n*B')
matches, while
py.Regex(r'^\s*\*', re.M).searchString('foo\n *B')
doesn't (note the additional space after the '\n' in the search
string). So even without leaveWhitespace(), the start-of-line anchor
seems to be honored if no whitespace follows the newline; if some
does, the match fails.
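A possible explanation, assuming (not confirmed from the docs) that Regex first skips the default whitespace and then calls the compiled pattern's match() at the new location: Python's re documents that '^' with re.M matches only at the real start of the string or just after a newline, not simply wherever matching begins. Below is a simplified model with plain re; skip_default_whitespace() and matches_after_skip() are hypothetical helpers standing in for pyparsing's own skipping:

```python
import re

# Hypothetical helper approximating pyparsing's default whitespace skip:
# advance `loc` past spaces, tabs, carriage returns and newlines.
def skip_default_whitespace(s: str, loc: int) -> int:
    while loc < len(s) and s[loc] in " \t\r\n":
        loc += 1
    return loc

STAR_AT_BOL = re.compile(r"^\s*\*", re.M)

def matches_after_skip(s: str, loc: int) -> bool:
    """Skip leading whitespace first, then try the regex at the new position."""
    loc = skip_default_whitespace(s, loc)
    return STAR_AT_BOL.match(s, loc) is not None

# Start at index 3 (the '\n') in both strings.
print(matches_after_skip("foo\n*B", 3))   # True: skipping stops just after '\n'
print(matches_after_skip("foo\n *B", 3))  # False: stops after ' ', '^' fails there

# Without skipping (the leaveWhitespace() case), '\s*' consumes the space itself.
print(STAR_AT_BOL.match("foo\n *B", 4) is not None)  # True
```

In this model the '\s*' never gets a chance to help: by the time the regex runs, the whitespace is already behind the match position, and only the '^' check against the preceding character remains.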
Again, with the end-of-line anchor it matches as expected, and the
whitespace does seem to be passed to the regex (which is what I
initially expected):
py.Regex(r'o\s*$', re.M).searchString('foo \nB')
This matches with or without a space between the second 'o' and the
'\n', and any such whitespace is included in the match.
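The '$' case would fit the same assumed model: whitespace is only skipped before the expression starts, and here the expression starts at 'o', so the trailing whitespace is left for the regex engine to consume. A plain re check of just the anchor behavior (the names are made up for illustration):

```python
import re

# With re.M, '$' matches just before a '\n' or at the very end of the string.
O_AT_EOL = re.compile(r"o\s*$", re.M)

for text in ("foo\nB", "foo \nB"):
    m = O_AT_EOL.search(text)
    # \s* first grabs all whitespace including the '\n', fails at 'B',
    # then backtracks until '$' can match right before the '\n'.
    print(repr(m.group()))  # 'o' for the first string, 'o ' for the second
```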
Thomas