Hello,
I have 2 XML files in which I need to reformat a tag.
Change
<Klassifikation>1001</Klassifikation>
<Klassifikation>100118</Klassifikation>
<Klassifikation>10011901</Klassifikation>
into
<Klassifikation>10.01</Klassifikation>
<Klassifikation>10.01.18</Klassifikation>
<Klassifikation>10.01.19.01</Klassifikation>
The number of digits is always even (4, 6, 8, 10 or 12). I can find all of them, but inserting the dots doesn't work at all. Does anyone have an idea?
Jens
Last edit: Jens Habermann 2015-05-15
Search for: <Klassifikation>(\d\d)(\d\d)</Klassifikation>
Replace with: <Klassifikation>\1\.\2</Klassifikation>
Search for: <Klassifikation>(\d\d)(\d\d)(\d\d)</Klassifikation>
Replace with: <Klassifikation>\1\.\2.\3</Klassifikation>
And so on.
Not sure if it can be done all at once with regular expressions.
Last edit: Andreas Jonsson 2015-05-15
That's it, except for one backslash too many after \1:
Replace with: <Klassifikation>\1.\2.\3</Klassifikation>
Great. Thanks!
Jens
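The corrected per-length approach can be sketched in Python. This is a minimal illustration that assumes Python's `re` module as a stand-in for the editor's regex engine. Python writes backreferences as `\g<1>` rather than `$1`. The helper names `build_passes` and `per_length` are hypothetical. Instead of typing each search/replace pair by hand, the sketch generates the 2- to 6-pair variants, and it puts a plain, unescaped dot in the replacement:

```python
import re


# Hypothetical helper: builds one (pattern, replacement) pair per number of
# two-digit blocks, mirroring the "one S/R per length" idea above.
def build_passes(max_pairs: int = 6) -> list[tuple[str, str]]:
    passes = []
    for n in range(2, max_pairs + 1):
        pattern = "<Klassifikation>" + r"(\d\d)" * n + "</Klassifikation>"
        # \g<i> is Python's unambiguous form of the \i backreference;
        # a dot in a replacement is plain text, so it needs no backslash.
        groups = ".".join(f"\\g<{i}>" for i in range(1, n + 1))
        replacement = "<Klassifikation>" + groups + "</Klassifikation>"
        passes.append((pattern, replacement))
    return passes


def per_length(text: str) -> str:
    # Each pattern only matches tags with exactly n pairs of digits (and
    # never text that already contains dots), so the order does not matter.
    for pattern, replacement in build_passes():
        text = re.sub(pattern, replacement, text)
    return text


sample = "<Klassifikation>1001</Klassifikation>\n<Klassifikation>10011901</Klassifikation>"
print(per_length(sample))
```

Values with an odd number of digits match none of the generated patterns and are left unchanged.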
Hi Jens and Andreas,
Jens, although I don't understand German at all, I could easily guess what you need, since you explained it so clearly!
I first tried to find a single general regex, with lookarounds and the special \K syntax, but I couldn't build a working one :-((
Then, I decided to split the problem into 2 smaller ones:
Firstly, match any block of text of the form <Klassifikation>.....</Klassifikation> that contains ONLY two-digit blocks between the two tags.
Secondly, match two digits that are necessarily followed by another digit, ONLY IF the previous search matched on that line.
The first regex can easily be written as
<(Klassifikation)>(\d\d)+</\1>$
and we replace that whole block with itself, followed by a specific character that doesn't occur in your file.
So, the Replace zone will contain
$0@
assuming, for instance, that no @ character exists in the file yet.
You'll note that this regex can't match twice. It requires the closing > to be the last character of the line, and after the first replacement the line ends with @ instead.
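Step 1 can be tried outside the editor with a small Python sketch. It assumes Python's `re` module as a stand-in, with `re.MULTILINE` so that `$` matches at every line end. The function `mark_lines` is a hypothetical name. The sketch appends the marker exactly where the `$0@` replacement would, and it shows that a second pass changes nothing:

```python
import re

# Step 1: the first regex from the post. re.MULTILINE lets $ match at
# the end of every line, not only at the end of the whole text.
STEP1 = re.compile(r"<(Klassifikation)>(\d\d)+</\1>$", re.MULTILINE)


def mark_lines(text: str, marker: str = "@") -> str:
    # Equivalent of the replacement $0@: the whole match, then the marker.
    return STEP1.sub(lambda m: m.group(0) + marker, text)


doc = "<Klassifikation>100118</Klassifikation>\n<Klassifikation>10011</Klassifikation>"
once = mark_lines(doc)
print(once)
# A second pass changes nothing: the marked line now ends with @, not >.
print(mark_lines(once) == once)
```

The 5-digit line is not marked, because `(\d\d)+` cannot consume an odd number of digits before `</`.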
The second regex is even simpler:
\d\d(?=\d)
which we replace with
$0.
(the whole match, i.e. the two digits, followed by a dot).
Reminder: in both replacements, the syntax
$0
represents the entire text matched by the regex!
OK, now we just have to apply this second regex ONLY IF an @ character is present at the end of the line.
To do so, we just have to modify the lookahead a bit:
\d\d(?=\d.+@$)
Finally, once every two-digit block except the last one is followed by a dot, we must delete the @ character that we used as a marker at the end of the line. That's child's play: just search for @ and replace it with NOTHING.
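Steps 2 and 3 can be sketched the same way. This again assumes Python's `re` module in place of the editor, and `add_dots_and_unmark` is a hypothetical name. The lookahead only succeeds on a line ending in `@`, because `.` does not match a newline, so digits on unmarked lines are left alone:

```python
import re

# Step 2: two digits followed by a digit, but only on a line ending with @.
# '.' does not cross a newline, so the lookahead stays on the current line.
STEP2 = re.compile(r"\d\d(?=\d.+@$)", re.MULTILINE)


def add_dots_and_unmark(text: str) -> str:
    text = STEP2.sub(lambda m: m.group(0) + ".", text)  # replacement $0.
    return text.replace("@", "")  # step 3: search @, replace with nothing


marked = "<Klassifikation>10011901</Klassifikation>@\nPhone 123456"
print(add_dots_and_unmark(marked))
```

Like the editor's "replace @ with nothing", step 3 removes every `@` in the text. This relies on the earlier assumption that the file contains no other `@` characters.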
Therefore, the complete search regex, built from three alternatives, becomes:
<(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)
For the replacement, we'll use conditional replacements. If you're not familiar with them, here is a quick summary:
A conditional replacement has the general form
(?n ... : ... )
where n is the number of a search group.
If group n is DEFINED, all the characters after ?n up to the colon are rewritten.
If group n is NOT defined, all the characters after the colon up to the closing parenthesis are rewritten.
For example, the replacement
ABC(?4ijk:pqr)XYZ
would produce ABCijkXYZ if search group 4 is matched, and ABCpqrXYZ if group 4 is NOT matched.
Then, our replacement may be written
(?3:$0(?1@:.))
and can be understood as the two nested conditions below:
IF group 3 (the @ character) is MATCHED, it's DELETED
ELSE
    we rewrite the MATCHED string ($0)
    IF group 1 (the word Klassifikation) is MATCHED (alternative 1)
        we add the @ character
    ELSE (alternative 2)
        we add a DOT character
    ENDIF
ENDIF
To sum up:
SEARCH :
<(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)
REPLACE :
(?3:$0(?1@:.))
Select the regular expression search mode
Uncheck the . matches newline option, if necessary
Go back to the very beginning of your document
Click the Replace All button TWICE
The first S/R adds an @ character at the end of every relevant line
The second S/R adds a dot after every two-digit block except the last and, finally, deletes the @ character
With that regex:
The opening tag <Klassifikation> may start after column 1 (indented lines are handled)
Any number of two-digit blocks between the two tags will be processed
Any extra click on the Replace All button, after the second one, has NO effect, luckily :-)
Best Regards,
guy038
Last edit: THEVENOT Guy 2015-05-19
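The two-click combined search/replace can be modelled in Python as a rough sketch under some assumptions. Python's `re` has no `(?n...:...)` replacement syntax, so a replacement function emulates `(?3:$0(?1@:.))` by checking which groups took part in the match. Each call to the hypothetical `replace_all` is assumed to behave like one click on Replace All, meaning one left-to-right pass over the text as it was before the click:

```python
import re

# The single search regex: alternative 1 | alternative 2 | alternative 3 (group 3).
COMBINED = re.compile(r"<(Klassifikation)>(\d\d)+</\1>$|\d\d(?=\d.+@$)|(@)", re.MULTILINE)


def conditional(m: re.Match) -> str:
    # Emulates the conditional replacement (?3:$0(?1@:.)).
    if m.group(3) is not None:  # group 3 matched: the @ marker is deleted
        return ""
    if m.group(1) is not None:  # alternative 1: rewrite the match, add @
        return m.group(0) + "@"
    return m.group(0) + "."  # alternative 2: rewrite the match, add a dot


def replace_all(text: str) -> str:
    # One call models one click on Replace All (an assumption, see above).
    return COMBINED.sub(conditional, text)


doc = "\n".join([
    "<Klassifikation>1001</Klassifikation>",
    "  <Klassifikation>100118</Klassifikation>",
    "<Klassifikation>10011901</Klassifikation>",
])
first = replace_all(doc)  # adds the @ markers
second = replace_all(first)  # adds the dots and removes the markers
print(second)
print(replace_all(second) == second)  # a third click changes nothing
```

The third pass is a no-op because the dotted values no longer satisfy `(\d\d)+` and no `@` is left for the other two alternatives.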
Are they always pairs of digits? I have a simple solution:
Search: ([>.])(\d\d)(\d)
Replace: \1\2.\3
Click Replace All 3 times and you're done (each click inserts one more dot per tag, so 12-digit values need 5 clicks).
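The "click until done" behaviour can be checked with a short Python sketch. It assumes that one `re.sub` call corresponds to one Replace All click, and `click_until_done` is a hypothetical helper. Each pass inserts at most one dot per tag, because a match consumes the digit that would start the next block:

```python
import re

RUFUS = re.compile(r"([>.])(\d\d)(\d)")


def click_until_done(text: str) -> tuple[str, int]:
    # Repeat the S/R until nothing changes; return the final text and the
    # number of passes that actually changed something.
    clicks = 0
    while True:
        new_text = RUFUS.sub(r"\1\2.\3", text)
        if new_text == text:
            return text, clicks
        text, clicks = new_text, clicks + 1


print(click_until_done("<Klassifikation>10011901</Klassifikation>"))
print(click_until_done("<Klassifikation>100119010203</Klassifikation>"))
```

With 8 digits the sketch needs 3 changing passes and with 12 digits it needs 5, in line with one new dot per pass.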
Hi Rufus,
Oh, yes, your regex is really simple compared to mine! However, it would add a dot after ANY two digits that follow a > or a dot and precede another digit, not only between the two tags
<Klassifikation>......</Klassifikation>
Indeed, I tried to find a strict regex matching Jens's post exactly. But in everyday use, I would have used a simpler regex, like yours, after selecting the relevant text and ticking the In selection option of the Replace dialog, for instance :-)
BTW, your regex can even be shortened!
SEARCH
([>.]\d\d)(\d)
REPLACE
\1.\2
Cheers,
guy038
Last edit: THEVENOT Guy 2015-05-23
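A quick Python check (again using `re` as a stand-in for the editor, with `run` as a hypothetical helper) shows the shortened search/replace produces exactly the same text as the longer one after each click. It also illustrates the caveat raised in the thread: neither regex looks at the tag name, so digits in other elements are changed too:

```python
import re

RUFUS = (re.compile(r"([>.])(\d\d)(\d)"), r"\1\2.\3")
SHORTENED = (re.compile(r"([>.]\d\d)(\d)"), r"\1.\2")


def run(rule, text: str, clicks: int) -> str:
    # Apply one search/replace rule a fixed number of times.
    pattern, replacement = rule
    for _ in range(clicks):
        text = pattern.sub(replacement, text)
    return text


tag = "<Klassifikation>100119010203</Klassifikation>"
# Both versions give the same result after the same number of clicks.
print(run(RUFUS, tag, 5) == run(SHORTENED, tag, 5))
# Neither looks at the tag name, so other elements are touched too.
print(run(SHORTENED, "<Count>123</Count>", 1))
```

Moving the `[>.]` into the first group works because the character before the digits is always written back unchanged. Splitting it into its own group was never needed.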