Re: [Pyparsing] Anchoring py.Regex


Paul,

thanks for looking into this.

On Mon, Sep 12, 2011 at 12:13 PM, Paul McGuire <pt...@au...> wrote:
> The snag here is pyparsing's default whitespace skipping.  You are leading
> off your Regex with whitespace, but pyparsing skips over that stuff unless
> you tell it not to.

I see. I was aware of the whitespace skipping, and thought the '*'
quantifier after '\s' would take care of this: even if the regex never
sees the leading whitespace, '^\s*\*' should still match.
But it doesn't.
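
For comparison, the pattern on its own does match both kinds of input when it is applied with plain Python 're' and no pyparsing involved, so '\s*' really does absorb the optional whitespace. This is a minimal sketch; the test strings and the name STAR_AT_BOL are illustrative assumptions:

```python
import re

# The pattern from the question, compiled with re.M so '^' matches after '\n'.
STAR_AT_BOL = re.compile(r"^\s*\*", re.M)

# Hypothetical inputs: a '*' at the start of the second line,
# once without and once with a leading space.
for text in ("foo\n*B", "foo\n *B"):
    m = STAR_AT_BOL.search(text)
    # re.search scans the whole string, so '\s*' can consume the space itself.
    print(repr(text), "->", repr(m.group()) if m else None)
```

Both inputs produce a match here, which suggests the difference lies in where pyparsing starts the match attempt rather than in the pattern.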

> This expression gives you your inner comment lines (the
> negative lookahead expression '(?!/)' tells the re engine not to match if
> the '*' is followed by a '/', so the trailing '*/' line is left out).
>
>    comment_inner = Regex(r"^\s+\*(?!/).*", re.M).leaveWhitespace()
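
To see what the expression picks out, here is a minimal sketch that applies the same pattern with plain Python 're' (not pyparsing) to a made-up C-style block comment; SAMPLE and INNER are hypothetical names:

```python
import re

# Same pattern as comment_inner, but used directly with re (no pyparsing).
# re.M makes '^' match right after every newline.
INNER = re.compile(r"^\s+\*(?!/).*", re.M)

# Hypothetical input: a C-style block comment.
SAMPLE = "/**\n * first line\n * second line\n */\n"

# '(?!/)' rejects the closing ' */' line; strip() drops the leading indent.
inner_lines = [m.group().strip() for m in INNER.finditer(SAMPLE)]
print(inner_lines)  # ['* first line', '* second line']
```

One caveat: '\s+' also matches newlines, so a blank line directly before a '*' line ends up inside that match; '[ \t]+' would restrict it to spaces and tabs.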

This works nicely, but I'm still concerned about the '^' anchor.
When you talk about skipping whitespace, I suppose this includes
newlines? Then why does the '$' anchor work as expected while '^'
doesn't?

Looking further into this, I found that

  py.Regex(r'^\s*\*', re.M).searchString('foo\n*B')

matches, while

  py.Regex(r'^\s*\*', re.M).searchString('foo\n *B')

doesn't (note the additional space after the newline in the search
string). So, even without leaveWhitespace(), the start-of-line anchor
seems to be honored as long as no whitespace follows the newline. If
some does, the match fails.
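
The behaviour can be reproduced without pyparsing by modelling roughly what searchString() does: at each position, skip the default whitespace, then call the compiled pattern's match() at the resulting index. With re.M, '^' only matches at that index if the preceding character is a newline, so once the skipper has consumed the ' ' after '\n', the anchor can no longer succeed. This is a simplified sketch, not pyparsing's actual code; the search() helper, its skip_ws flag and the whitespace set are assumptions:

```python
import re

# Assumption: pyparsing's default whitespace set is " \t\r\n".
DEFAULT_WS = " \t\r\n"

def search(pattern, text, skip_ws=True):
    """Simplified model of searchString(): at each position, optionally skip
    leading whitespace, then try pattern.match() exactly at that index."""
    found = []
    loc = 0
    while loc < len(text):
        start = loc
        if skip_ws:
            while start < len(text) and text[start] in DEFAULT_WS:
                start += 1
        m = pattern.match(text, start)
        if m:
            found.append(m.group())
            loc = max(m.end(), loc + 1)  # always make progress
        else:
            loc += 1
    return found

STAR_AT_BOL = re.compile(r"^\s*\*", re.M)

print(search(STAR_AT_BOL, "foo\n*B"))                  # ['*']
print(search(STAR_AT_BOL, "foo\n *B"))                 # []
print(search(STAR_AT_BOL, "foo\n *B", skip_ws=False))  # [' *']
```

Turning the skipping off (the analogue of leaveWhitespace()) lets '\s*' consume the space itself, and the match comes back.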

With the end-of-line anchor, on the other hand, it matches as expected,
and whitespace does seem to be passed to the regex (which is what I
initially expected):

  py.Regex(r'o\s*$', re.M).searchString('foo \nB')

This matches with or without a space between the second 'o' and the
'\n', and any such whitespace is included in the match.
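
The '$' case plausibly behaves differently because whitespace is only skipped before the position where a match attempt starts; characters after that position are passed to the regex unchanged. A minimal sketch with plain 're', starting the match at the second 'o' as the skipper would leave it (END_O is a hypothetical name):

```python
import re

END_O = re.compile(r"o\s*$", re.M)

# Start matching at index 2, the second 'o' in "foo". The character there is
# not whitespace, so a skipper would not move the start position, and any
# trailing ' ' is still visible to '\s*'.
results = [END_O.match(text, 2).group() for text in ("foo\nB", "foo \nB")]
print(results)  # ['o', 'o ']
```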

Thomas