Re: [Pyparsing] Anchoring py.Regex
From: thomas_h <th...@gm...> - 2011-09-17 18:59:08
Paul,

thanks for looking into this.

On Mon, Sep 12, 2011 at 12:13 PM, Paul McGuire <pt...@au...> wrote:
> The snag here is pyparsing's default whitespace skipping. You are leading
> off your Regex with whitespace, but pyparsing skips over that stuff unless
> you tell it not to.

I see. I was aware of the whitespace skipping, and thought the '*' quantifier after '\s' would take care of this. So even in the case that the regex doesn't see the whitespace at all, '^\s*\*' should still match. But it doesn't.

> This expression gives you your inner comment lines (the
> negative lookahead expression '(?!/)' tells the re engine not to match if
> the '*' is followed by a '/', so the trailing '*/' line is left out).
>
>     comment_inner = Regex(r"^\s+\*(?!/).*", re.M).leaveWhitespace()

This is working nicely, but I'm still concerned about the '^' anchor. When you talk about skipping whitespace, I suppose this includes newlines?! Then why is the '$' anchor working as expected, but '^' isn't?

Looking further into this, I found that

    py.Regex(r'^\s*\*', re.M).searchString('foo\nB')

matches, while

    py.Regex(r'^\s*\*', re.M).searchString('foo\n B')

doesn't (mind the additional space towards the end of the search string). So, even without leaveWhitespace(), the start-of-line anchor seems to be honored if no whitespace follows the newline. But if there is, the match fails.

Again, with the end-of-line anchor it matches as expected, and whitespace seems to be passed to the regex (which is what I initially expected):

    py.Regex(r'o\s*$', re.M).searchString('foo \nB')

This matches with or without intervening space between the second 'o' and '\n', and any such whitespace is included in the match.

Thomas
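
For reference, the asymmetry between '^' and '$' described above can be reproduced with the standard re module alone. This is only a sketch, under the assumption that pyparsing first advances past leading whitespace and then calls the compiled pattern's match() at the new offset. The helpers match_after_skipping and first_match are hypothetical stand-ins, not pyparsing functions, and the test strings assume a literal '*' before the 'B', since the pattern requires one.

```python
import re

# Assumed to mirror the characters pyparsing skips by default.
WHITESPACE = " \t\r\n"


def match_after_skipping(pattern, text, start, skip_whitespace=True):
    """Hypothetical helper: skip whitespace from `start`, then match there."""
    pos = start
    if skip_whitespace:
        while pos < len(text) and text[pos] in WHITESPACE:
            pos += 1
    # pattern.match(text, pos) still sees the characters before `pos`, so '^'
    # with re.M only succeeds if `pos` is 0 or directly follows a newline.
    return pattern.match(text, pos)


def first_match(pattern, text, skip_whitespace=True):
    """Hypothetical stand-in for a scan: try each start offset in turn."""
    for start in range(len(text) + 1):
        m = match_after_skipping(pattern, text, start, skip_whitespace)
        if m:
            return m.group()
    return None


start_anchor = re.compile(r"^\s*\*", re.M)
end_anchor = re.compile(r"o\s*$", re.M)

# Skipping '\n' lands directly on '*', which still follows a newline.
no_space = first_match(start_anchor, "foo\n*B")
# Skipping '\n' and ' ' lands on '*', which now follows a space: '^' fails.
with_space = first_match(start_anchor, "foo\n *B")
# Without skipping, the pattern itself consumes the leading space.
kept_space = first_match(start_anchor, "foo\n *B", skip_whitespace=False)
# Skipping only happens before the match, so trailing whitespace is untouched.
trailing = first_match(end_anchor, "foo \nB")

print(repr(no_space), repr(with_space), repr(kept_space), repr(trailing))
```

If the assumption holds, it would explain both observations: whitespace is consumed only before the pattern is tried, so '^' can end up one or more characters past the line start, while '$' and any whitespace in front of it are left entirely to the regex.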