
Replace with regex ^20 matching same line multiple times?

Patrick
2014-10-02
2014-10-04
  • Patrick

    Patrick - 2014-10-02

    Hey all,

    Quick question – I have a list of numbers like:

    2020400

    2020200

    2023200

    I used the Replace All (replacement = empty string) functionality with the
    regex “^20” to remove the leading “20” from the number. However, in effect,
    it removes every “20” at the start of the string, as if I
    had actually written ^(20)*. The result then is this:

    400

    0

    23200

    Even trying something like "^(20){1}" doesn't seem to work; it still matches
    the "20" multiple times.

    I think it might be due to the way the regex is being applied: it looks
    like it is applied repeatedly to the same line, so the ^ matches more than
    once. Is this a bug or a feature? I guess I was expecting a regex with a
    leading "^" to be applied just once per line. As a counter datapoint:

    500202020

    ...the regex "20$" matches only the final "20", just once, whereas I would
    have expected that it might buggily match three times. The regex
    engine doesn't do that (which is good), but that appears
    inconsistent with the results I'm seeing when using "^".

    Thoughts? Is this a real bug? If so, I'll file one, but I first want to
    know if I'm just wrong -- my regex foo is weak and I'll defer to experts on
    the subject if I'm just not getting something... :|

    --Patrick

     
  • Andreas Jonsson

    Andreas Jonsson - 2014-10-02

    I think this is a bug. There is a special case in the code which prevents multiple replacements when replacing "20$" or similar patterns. It seems a bit finicky to fix.

     

    Last edit: Andreas Jonsson 2014-10-02
  • THEVENOT Guy

    THEVENOT Guy - 2014-10-02

    Hello Patrick, Andreas, and All

    It's definitely not a bug, but simply the result of the way the regex engine works!

    As you said in your post, as soon as the first "20" is deleted, the cursor is located just before the next "20", which still matches the complete regex ^20 :-( After all, it is now at the very beginning of the current line, isn't it?

    This wrong behaviour doesn't occur when we search, for example, for a 20 at the end of a line with the regex 20$. Indeed, once the ending 20 is deleted, the cursor is located just before the EOL characters of the line, which obviously can't match the string 20!
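    To make the mechanism concrete, here is a minimal Python sketch of a Replace All loop that, as described above, resumes searching right after the inserted text and tests anchors against the already-modified text. The function name replace_all_resuming is hypothetical, and this is a model of the behaviour discussed in this thread, not Notepad++'s or Scintilla's actual code. Python's own re.sub is shown for contrast: it matches against the original string, so ^20 fires at most once per line.

```python
import re

def replace_all_resuming(text: str, pattern: str, repl: str) -> str:
    """Hypothetical model of an editor-style Replace All.

    After each replacement the search resumes at the end of the inserted
    text, and ^ / $ are tested against the *modified* text.
    """
    regex = re.compile(pattern, re.MULTILINE)
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        replacement = m.expand(repl)
        text = text[:m.start()] + replacement + text[m.end():]
        pos = m.start() + len(replacement)
        if m.start() == m.end():
            pos += 1  # step past empty matches so the loop always terminates
    return text

numbers = "2020400\n2020200\n2023200"

# ^20: after deleting a leading "20" the cursor is still at a line start,
# so the following "20" matches again (prints 400, 0, 23200).
print(replace_all_resuming(numbers, r"^20", ""))

# 20$: after deleting the trailing "20" the cursor sits at the end of the
# line, where no further "20" can start (prints 5002020).
print(replace_all_resuming("500202020", r"20$", ""))

# Python's re.sub scans the original string: one "20" per line
# (prints 20400, 20200, 23200).
print(re.sub(r"^20", "", numbers, flags=re.MULTILINE))
```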

    So, how do we get the right behaviour? Just use the search/replacement below as a workaround:

    SEARCH ^20(.*)

    REPLACE \1

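    The rest of the line, after the first 20, is captured as group 1, which is written back by the replacement. Since the match now runs to the end of the line, the cursor ends up at the end of the line after the replacement, where ^20 can no longer match. Et voilà!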
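    For reference, here is the same search/replacement written with Python's re module, which also uses \1 for the backreference in the replacement. Python's re.sub does not show the repeated-match problem in the first place, so this sketch only checks that the rewritten pattern gives the intended result on the thread's sample numbers.

```python
import re

numbers = "2020400\n2020200\n2023200"

# ^20(.*): drop one leading "20" and capture the rest of the line in group 1.
# Without re.DOTALL, "." does not match "\n", so group 1 ends at the line end.
stripped = re.sub(r"^20(.*)", r"\1", numbers, flags=re.MULTILINE)
print(stripped)
```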

    In the same way:

    • To delete the first five characters of any line, use:

    SEARCH ^.....(.*)

    REPLACE \1

    • To delete the first thirty characters of any line, use:

    SEARCH ^.{30}(.*)

    REPLACE \1
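    The two "delete the first N characters" patterns generalize to ^.{N}(.*). The sketch below builds that pattern for any N in Python; the helper name drop_first_n_chars and the sample text are hypothetical. One side effect worth knowing: a line shorter than N characters does not match at all, so it is left unchanged rather than emptied.

```python
import re

def drop_first_n_chars(text: str, n: int) -> str:
    # Builds ^.{n}(.*): skip exactly n characters at the start of each line
    # and keep the rest (group 1). Lines shorter than n characters do not
    # match and are left unchanged.
    return re.sub(rf"^.{{{n}}}(.*)", r"\1", text, flags=re.MULTILINE)

sample = "abcdefgh\nxyz\n1234567"
print(drop_first_n_chars(sample, 5))  # "fgh", "xyz" (too short), "67"
```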


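    IMPORTANT:

    DON'T check the . matches newline option in the Replace dialog for this kind of search/replacement! Otherwise .* also matches line breaks, so group 1 swallows the rest of the file and only the first line is changed.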
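    The effect of that option can be reproduced with Python's re.DOTALL flag, used here as an analogue of ". matches newline" (same idea, different engine). With DOTALL, (.*) runs past the first line break, so the single match covers the whole text and only the first line loses its "20".

```python
import re

numbers = "2020400\n2020200\n2023200"

# Default: "." stops at "\n", so each line is handled separately.
per_line = re.sub(r"^20(.*)", r"\1", numbers, flags=re.MULTILINE)

# DOTALL: "." also matches "\n", so group 1 captures everything after the
# first "20" and the remaining lines are never matched on their own.
dotall = re.sub(r"^20(.*)", r"\1", numbers, flags=re.MULTILINE | re.DOTALL)

print(per_line)
print(dotall)
```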

    Cheers,

    guy038

    P.S. :

    I usually keep an improved version of the N++ regex engine, created by François-R Boyer in May-June 2013, but it is based on the previous version 2.2.7.0 of Scintilla used by Notepad++!

    So, Patrick, I gave it a try and, of course, it gives the same odd result as the present N++ regex engine!

     

    Last edit: THEVENOT Guy 2014-10-02
  • Andreas Jonsson

    Andreas Jonsson - 2014-10-02

    While your explanation of how it works seems correct, I still maintain that this is, at the very least, not the right behavior. If I type in "2020400" and ask N++ to mark all occurrences of "^20" it will only mark the first "20". If I ask it to find all occurrences of "^20" it will only find the first "20". But when I ask it to replace all "^20" it will replace "2020"?

     

    Last edit: Andreas Jonsson 2014-10-02
  • THEVENOT Guy

    THEVENOT Guy - 2014-10-02

    Hi Andreas, Patrick and All

    Ah yes, Andreas, you're right! I should have tried the Find or Mark features :-(

    But the explanation of these two different behaviours is quite simple, although not obvious at first sight!!

    Unlike a Replace operation, a Find or Mark operation DOESN'T modify the lines it scans!

    So, in the example string "2020400", the first Find or Mark operation finds the first 20 at the beginning of the line, and the cursor ends up just before the second 20.

    But this time, the second 20 is NOT at the beginning of the current line, so it doesn't match the regex ^20!

    Therefore, the next match is another 20 at the very beginning of some later line!

    With a replacement, however, the first 20 is deleted, so the second 20 is now at the beginning of the line and does match the regex ^20!
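    The Find/Mark side of this explanation can be checked with Python's re.finditer, which, like Find or Mark, scans the text without modifying it. The sketch reports one ^20 hit per line, at the real line starts of the unmodified string.

```python
import re

numbers = "2020400\n2020200\n2023200"

# A non-modifying scan tests ^20 only at real line starts: one hit per line,
# reported as (offset, matched text).
hits = [(m.start(), m.group()) for m in re.finditer(r"^20", numbers, re.MULTILINE)]
print(hits)  # [(0, '20'), (8, '20'), (16, '20')]
```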

    Ah, I'm quite pleased to have found the reason for these two behaviours of the regex engine :-)

    Cheers,

    guy038

    P.S. :

    I'm just thinking (at 01.30am => better go to bed!) that this replacement is exactly like performing the following successive operations:

    • Search for the regex ^20

    • Manually delete the selected 20

    • Search again for the regex ^20

    • Manually delete the selected 20

    and so on ...

     

    Last edit: THEVENOT Guy 2014-10-02
    • Patrick

      Patrick - 2014-10-03

      OK, so I see how this is a workaround, but the behavior doesn't seem correct. While, yes, it is predictable once you are aware of the implementation details, if I told you (without context) that "^20" matched the same line three times, you'd probably say "BUG?!".

      I'm not looking for an explanation of why this behavior happens, I'm asking whether it makes sense. In its reduced form, it is simply: Should "^20" match any string multiple times during a replace? and I believe the answer is Never. Thus, I believe it is a bug due to how it is implemented.

       
  • THEVENOT Guy

    THEVENOT Guy - 2014-10-04

    Hi Patrick,

    I understand your point of view, but it's just the deep difference between everyday language and computer language!!

    Here is another case, which produces a similar issue.

    Let's use the subject lines below:

    test 202020400 test
    test 202020400 test
    test 202020400 test

    If we use \b20 as the search regex, leave the replacement string empty and click the Replace All button, every leading 20 of each word 202020400 is deleted, instead of only the first one :-(

    Note: \b is a location assertion which matches a word boundary (here, the start of a word)

    In both cases (^20 and \b20), this "pseudo" issue occurs because the cursor location does NOT change after the first replacement. It produces the same behaviour as the Recursive replacement option of a few regex engines!
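    For contrast, Python's re.sub evaluates \b against the original string in a single pass, so on these subject lines it produces the result Patrick expects: only the 20 right after the space sits on a word boundary, while the later ones are preceded by the digit 0.

```python
import re

subject = "test 202020400 test\ntest 202020400 test\ntest 202020400 test"

# \b20 on the unmodified text: only the "20" after the space is at a word
# boundary, so exactly one "20" per word is removed.
once = re.sub(r"\b20", "", subject)
print(once)
```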


    If this logical, but odd, behaviour worries you, you may also use the N++ plugin TextFX (a Swiss army knife, which has been reported to be unstable with Unicode versions of N++, although I've never noticed anything wrong yet!)

    Then, once TextFX is installed, go to the menu option TextFX - TextFX Quick - Find/Replace or hit CTRL + R

    • Type ^20 in the first field and leave the second one empty

    • Check the option Regular Expr

    • Check the option 1 per line (so there will be only ONE replacement on each line!)

    • Click the Find button once

    • Click the Replace Rest button once

    Et voilà! This time, only the first 20 of each line is deleted
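    A rough Python analogue of the "1 per line" idea is to split the text into lines and allow at most one replacement on each (count=1). This is a sketch assuming the option simply caps replacements per line; the helper name replace_once_per_line is hypothetical and says nothing about how TextFX is implemented.

```python
import re

def replace_once_per_line(text: str, pattern: str, repl: str) -> str:
    """Hypothetical helper: at most one replacement on each line."""
    regex = re.compile(pattern)
    # count=1 stops after the first match on each line.
    return "\n".join(regex.sub(repl, line, count=1) for line in text.split("\n"))

print(replace_once_per_line("2020400\n2020200\n2023200", r"^20", ""))
print(replace_once_per_line("test 202020400 test", r"\b20", ""))
```

    The count=1 cap only matters for patterns that could match several times within one line; for ^20 on a single line it changes nothing.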

    See the attached picture TextFX.png, below


    But do notice that having to check the 1 per line option proves that:

    • You're asking for a specific use of the regex engine

    • The normal behaviour, with that particular regex, is to delete every 20 that begins a line

    Cheers,

    guy038

    One more thing :

    It's just like you could "say" to the regex engine :

    • Replace any string 20, beginning a line, at any time, during the replacement process ( the present N++ replacement )

    • Replace any string 20, beginning a line, based, only, on present state of the file's contents ( the replacement that you expect to )

     

    Last edit: THEVENOT Guy 2014-10-04