
Search and Replace with Regexp in Notepad++

2015-05-15
2015-05-23
  • Jens Habermann

    Jens Habermann - 2015-05-15

    Hello,

    I have 2 XML files here in which I need to reformat a tag.

    I want to turn
    <Klassifikation>1001</Klassifikation>
    <Klassifikation>100118</Klassifikation>
    <Klassifikation>10011901</Klassifikation>

    into
    <Klassifikation>10.01</Klassifikation>
    <Klassifikation>10.01.18</Klassifikation>
    <Klassifikation>10.01.19.01</Klassifikation>

    The number of digits is always even (4, 6, 8, 10 or 12). I can find all of them, but inserting the dots doesn't work at all. Does anyone have an idea?

    Jens

     

    Last edit: Jens Habermann 2015-05-15
  • Andreas Jonsson

    Andreas Jonsson - 2015-05-15

    Search for: <Klassifikation>(\d\d)(\d\d)</Klassifikation>
    Replace with: <Klassifikation>\1\.\2</Klassifikation>

    Search for: <Klassifikation>(\d\d)(\d\d)(\d\d)</Klassifikation>
    Replace with: <Klassifikation>\1\.\2.\3</Klassifikation>

    And so on.

    Not sure if it can be done all at once with regular expressions.

     

    Last edit: Andreas Jonsson 2015-05-15
  • Jens Habermann

    Jens Habermann - 2015-05-19

    That's it, except for one backslash too many after \1:

    Replace with: <Klassifikation>\1.\2.\3</Klassifikation>

    Great. Thanks!

    Jens
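For reference, Andreas's one-regex-per-length approach, with Jens's fix applied, can be sketched in Python. This assumes Python's `re` module as a stand-in for Notepad++, and `per_length_rules` / `apply_rules` are hypothetical helper names. It builds one search/replace pair for each even length from 4 to 12 digits:

```python
import re

# One (search, replace) pair per digit count, from 2 pairs (4 digits)
# up to 6 pairs (12 digits), in the style of Andreas's regexes,
# using Jens's fix (a plain "." between groups, no extra backslash).
def per_length_rules(max_pairs=6):
    rules = []
    for pairs in range(2, max_pairs + 1):
        search = r"<Klassifikation>" + r"(\d\d)" * pairs + r"</Klassifikation>"
        groups = ".".join(f"\\{i}" for i in range(1, pairs + 1))  # \1.\2.\3 ...
        rules.append((re.compile(search), f"<Klassifikation>{groups}</Klassifikation>"))
    return rules

def apply_rules(text):
    # Each search requires the closing tag right after the last pair,
    # so a rule only ever matches values of exactly its own length.
    for pattern, replace in per_length_rules():
        text = pattern.sub(replace, text)
    return text

sample = "<Klassifikation>1001</Klassifikation>\n<Klassifikation>100118</Klassifikation>"
print(apply_rules(sample))
```

Because each rule is anchored on the closing tag, the order in which the rules run doesn't matter.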

     
  • THEVENOT Guy

    THEVENOT Guy - 2015-05-19

    Hi Jens and Andreas,

    Jens, although I don't understand German at all, I could easily guess what you need, because you explained it so clearly!!

    I first tried to find a general regex, with lookarounds and the special \K syntax, but I couldn't build a working one :-((

    So I decided to split the problem into 2 smaller ones:


    • First, match any block of text of the form <Klassifikation>.....</Klassifikation> that contains ONLY two-digit blocks between the two tags.

    • Second, match two digits that are necessarily followed by another digit, ONLY IF the previous search matched

    The first regex can easily be written as <(Klassifikation)>(\d\d)+</\1>$

    and we replace that whole block by itself, followed by a specific character that doesn't occur anywhere in your file.

    So the Replace field will contain $0@, assuming, for instance, that your file doesn't contain any @ character yet.

    Note that this regex can't match the same line twice: it requires the > character to be exactly at the end of the line, and once the @ is appended, it no longer is.
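Here is a minimal sketch of this first pass, assuming Python's `re` module as a stand-in for Notepad++ (Python writes `$0` as `\g<0>` in a replacement string; `mark_lines` is a hypothetical name and the sample lines are illustrative):

```python
import re

# First pass: append "@" to every line made of a complete
# <Klassifikation>...</Klassifikation> block containing only two-digit blocks.
# re.MULTILINE makes "$" match at the end of each line, as in Notepad++.
FIRST = re.compile(r"<(Klassifikation)>(\d\d)+</\1>$", re.MULTILINE)

def mark_lines(text: str) -> str:
    return FIRST.sub(r"\g<0>@", text)  # \g<0> plays the role of $0

sample = "<Klassifikation>1001</Klassifikation>\n<Klassifikation>100118</Klassifikation>"
once = mark_lines(sample)
print(once)
# A second run changes nothing: marked lines end with "@", not ">"
print(mark_lines(once) == once)
```

The last line prints `True`, which illustrates why the first regex can't match a marked line again.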

    The second regex is even simpler: \d\d(?=\d), which we replace by $0. (the entire match, followed by a dot)

    Reminder: in both replacements, the syntax $0 represents the entire match of the regex!

    OK, now we just have to apply this second regex ONLY IF an @ character is present at the end of the line.

    To do so, we just have to modify the lookahead a bit: \d\d(?=\d.+@$)
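The second pass can be sketched the same way (Python `re` as a stand-in for Notepad++; `add_dots` is a hypothetical name). Only the line carrying the @ marker gets its dots:

```python
import re

# Second pass: a dot after every two-digit block that is followed by
# another digit, but only on lines ending with the "@" marker.
# Without re.DOTALL, "." does not cross line ends, like unchecking
# ". matches newline" in Notepad++.
SECOND = re.compile(r"\d\d(?=\d.+@$)", re.MULTILINE)

def add_dots(text: str) -> str:
    return SECOND.sub(r"\g<0>.", text)

marked = "<Klassifikation>10011901</Klassifikation>@\n<Klassifikation>100118</Klassifikation>"
print(add_dots(marked))
```

The unmarked second line is left alone, because its lookahead never reaches an @ before the end of the line.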

    Finally, once every two-digit block except the last one is followed by a dot, we must delete the @ character that we used as a marker at the end of the line. It's child's play! Just search for @ and replace it with NOTHING.

    Therefore, the complete search regex, built from three alternatives, becomes:

    <(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)


    For the replacement, we'll use conditional replacements. If you're not familiar with them, here is a quick summary:

    A conditional replacement has the general form (?n ... : ... ), where n is the number of a group in the search regex

    • If group n is DEFINED (it took part in the match), all the characters after ?n up to the colon are written out

    • If group n is NOT defined, all the characters after the colon up to the closing parenthesis are written out

    For example, the replacement ABC(?4ijk:pqr)XYZ produces ABCijkXYZ if search group 4 is matched, and gives the string ABCpqrXYZ if group 4 is NOT matched
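Python's `re.sub` has no `(?n...:...)` conditional replacement syntax, so a sketch has to emulate it with a replacement function. The regex below is a made-up example whose group 4 is optional:

```python
import re

# Hypothetical regex: groups 1-3 always match, group 4 only if "!" is present.
pattern = re.compile(r"(a)(b)(c)(!)?")

def conditional(m):
    # Emulates the replacement ABC(?4ijk:pqr)XYZ:
    # group 4 is "defined" when it took part in the match.
    return "ABC" + ("ijk" if m.group(4) is not None else "pqr") + "XYZ"

print(pattern.sub(conditional, "abc!"))  # ABCijkXYZ
print(pattern.sub(conditional, "abc"))   # ABCpqrXYZ
```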


    Then, our replacement string can be written (?3:$0(?1@:.)) and understood as the two nested conditions below:

    IF group 3 (the @ character) is MATCHED
      it's DELETED
    ELSE
      we rewrite the MATCHED string ($0)
      IF group 1 (the word Klassifikation) is MATCHED   (ALTERNATIVE 1)
        we add the @ character
      ELSE                                             (ALTERNATIVE 2)
        we add a DOT character
      ENDIF
    ENDIF


    To sum up:

    • SEARCH:

    <(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)

    • REPLACE: (?3:$0(?1@:.))

    • Select the regular expression search mode

    • Uncheck the . matches newline option, if necessary

    • Go back to the very beginning of your document

    • Click TWICE on the Replace All button

    The first S/R adds an @ character at the end of every concerned line

    The second S/R adds a dot after every two-digit block except the last one and, finally, deletes the @ character

    With that regex :

    • The opening tag <Klassifikation> may begin after column 1 (for instance, on an indented line)

    • Any number of two-digit blocks between the two tags will be handled

    • Any extra click on the Replace All button, after the second one, has NO effect, luckily :-)
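To check the whole procedure, here is a sketch that emulates the complete S/R in Python, under the same assumptions as before: Python's `re` instead of Notepad++, with the conditional replacement (?3:$0(?1@:.)) written out as a function (`conditional_replace` and `replace_all` are hypothetical names, the sample lines are illustrative). Each call to `replace_all` plays the role of one click on Replace All:

```python
import re

# The combined search regex, with its three alternatives.
COMBINED = re.compile(r"<(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)",
                      re.MULTILINE)  # no DOTALL: "." does not match newline

def conditional_replace(m):
    # Emulates (?3:$0(?1@:.))
    if m.group(3) is not None:   # alternative 3 matched the "@" marker
        return ""                # -> delete it
    if m.group(1) is not None:   # alternative 1 matched a whole tag block
        return m.group(0) + "@"  # -> rewrite it and append the marker
    return m.group(0) + "."      # alternative 2 matched two digits -> add a dot

def replace_all(text):
    """One click on Replace All."""
    return COMBINED.sub(conditional_replace, text)

doc = "\n".join([
    "<Klassifikation>1001</Klassifikation>",
    "<Klassifikation>100118</Klassifikation>",
    "    <Klassifikation>10011901</Klassifikation>",  # tag not in column 1
])
first = replace_all(doc)      # click 1: marks the lines with "@"
second = replace_all(first)   # click 2: adds the dots, deletes "@"
third = replace_all(second)   # click 3: nothing left to match
print(second)
print(third == second)
```

The third click leaves the text unchanged, because once the dots are in place the first alternative no longer finds an unbroken run of digit pairs and no @ marker is left for the other two.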

    Best Regards,

    guy038

     

    Last edit: THEVENOT Guy 2015-05-19
  • Rufus V. Smith

    Rufus V. Smith - 2015-05-21

    Are they always two-digit pairs? Then I have a simple solution:

    Search: ([>.])(\d\d)(\d)
    Replace: \1\2.\3

    Click Replace All 3 times and you're done.
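A quick way to see how many clicks this takes is to repeat the replacement until nothing changes. In this Python sketch (an assumption, not Notepad++ itself; `click` and `repeat_until_stable` are hypothetical names), each pass inserts at most one dot per value, so an n-digit value needs n/2 - 1 passes: 3 clicks are enough up to 8 digits, while 12 digits take 5.

```python
import re

# Rufus's search and replace; one call to click() = one "Replace All".
RUFUS = re.compile(r"([>.])(\d\d)(\d)")

def click(text):
    return RUFUS.sub(r"\1\2.\3", text)

def repeat_until_stable(text):
    """Apply click() until the text stops changing.
    Returns the final text and the number of clicks that changed it."""
    clicks = 0
    while True:
        new = click(text)
        if new == text:
            return text, clicks
        text, clicks = new, clicks + 1

print(repeat_until_stable("<Klassifikation>10011901</Klassifikation>"))
print(repeat_until_stable("<Klassifikation>100119010203</Klassifikation>"))
```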

     
  • THEVENOT Guy

    THEVENOT Guy - 2015-05-23

    Hi Rufus,

    Oh yes, your regex is really simple compared to mine! However, it would add a dot after ANY two-digit block that follows a > or a dot and precedes another digit, not only between the two tags <Klassifikation>......</Klassifikation>

    Indeed, I tried to find the strict regex from Jens's post. But in everyday life, I would have used a simpler regex, like yours, after selecting the concerned text and checking the In selection option of the Replace dialog, for instance :-)

    BTW, your regex can even be shortened!

    SEARCH ([>.]\d\d)(\d)

    REPLACE \1.\2
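A sketch comparing both versions in Python (same stand-in assumption; `run` is a hypothetical helper and `<Other>` is a made-up tag) shows that they give the same result, and also illustrates the caveat above: digits in other tags get dots too.

```python
import re

RUFUS = re.compile(r"([>.])(\d\d)(\d)")  # replace with \1\2.\3
SHORT = re.compile(r"([>.]\d\d)(\d)")    # replace with \1.\2

def run(pattern, replacement, text, clicks):
    # Simulate several clicks on "Replace All"
    for _ in range(clicks):
        text = pattern.sub(replacement, text)
    return text

doc = "<Klassifikation>10011901</Klassifikation>\n<Other>1234</Other>"
a = run(RUFUS, r"\1\2.\3", doc, 3)
b = run(SHORT, r"\1.\2", doc, 3)
print(a == b)  # True
print(b)
```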

    Cheers,

    guy038

     

    Last edit: THEVENOT Guy 2015-05-23