From: Mark D. ☕️ <ma...@ma...> - 2016-01-12 10:52:39
Here is what is happening.
(Note: If you want to see what happens during transliteration, change
Transliterator.DEBUG from false to true in the source. You then get a large
log of what happens internally.)
If you print out the combined rules, you get:
$beforeLower = [[:Mn:][:Me:]]* [:Lowercase:];
\u00e4 > ae;
\u00f6 > oe;
\u00fc > ue;
\u00c4 } $beforeLower > Ae;
\u00d6 } $beforeLower > Oe;
\u00dc } $beforeLower > Ue;
\u00c4 > AE;
\u00d6 > OE;
\u00dc > UE;
*::[[:Latin:][:Common:][:Inherited:][〇]];*
::NFD();
[:Latin:]{[:Mn:]+} > ;
::NFC();
Æ > AE;
Ð > D;
Ø > O;
Þ > TH;
ß > ss;
æ > ae;
The problem is that the filtering rule
"::[[:Latin:][:Common:][:Inherited:][〇]];"
needs to come at the start. Because the German rules were prepended, it now
sits in the middle of the combined rules.
The simplest way to get this to work is to construct your own rules by
inserting the German ones just before "::NFD();", e.g.
String germanASCIIRules = latinASCIIRules.replace("::NFD();",
german_DIN_5007_2Rules + "\n::NFD();");
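A minimal sketch of that splice, using only string operations so it runs without ICU4J. LATIN_ASCII_RULES is an abbreviated stand-in for latinASCII.toRules(true) (the real string is much longer), and the class and method names are made up for illustration:

```java
final class RuleSplice {
    // Abbreviated stand-in for latinASCII.toRules(true); an assumption, not the full rule set.
    // The ideograph in the filter is written as a Java escape to avoid source-encoding issues.
    static final String LATIN_ASCII_RULES =
        "::[[:Latin:][:Common:][:Inherited:][\u3007]];\n"
        + "::NFD();\n"
        + "[:Latin:]{[:Mn:]+} > ;\n"
        + "::NFC();\n"
        + "\\u00df > ss;\n";

    // Two of the German rules, enough to show where they end up.
    static final String GERMAN_RULES =
        "\\u00e4 > ae;\n"
        + "\\u00c4 > AE;";

    // What the original code did: the filter rule is no longer first.
    static String prepend(String german, String latin) {
        return german + "\n" + latin;
    }

    // The suggested fix: insert the German rules just before "::NFD();".
    static String spliceBeforeNfd(String german, String latin) {
        return latin.replace("::NFD();", german + "\n::NFD();");
    }

    public static void main(String[] args) {
        System.out.println(spliceBeforeNfd(GERMAN_RULES, LATIN_ASCII_RULES));
    }
}
```

With the splice, the combined string still begins with the "::[" filter rule, and the German rules run before normalization, so they can still match the precomposed characters.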
However, a somewhat better approach would be to recast the German rules as
applying to the NFD form, and put them *after* the ::NFD().
String german_DIN_5007_2Rules =
"$beforeLower = [[:Mn:][:Me:]]* [:Lowercase:];\n"
+ "[aou] { \u0308 > e ;\n"
+ "[AOU] { \u0308 } $beforeLower > e ;\n"
// Fallback when no lowercase letter follows: Ä > AE, as in the original rules.
+ "[AOU] { \u0308 > E ;";
String germanASCIIRules = latinASCIIRules.replace("::NFD();",
"::NFD();\n" + german_DIN_5007_2Rules);
(There's API for programmatically reading the translit rules, but that's
overkill in this case.)
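To see what the three NFD-based rules do to a string, here is a rough emulation with java.text.Normalizer and java.util.regex instead of ICU4J. This is not how Transliterator evaluates rules (it tries the rules in order at each position, while this sketch runs one pass per rule); it only mirrors the German step. The class and method names are hypothetical. The last pass yields E so that an umlauted capital with no lowercase letter after it becomes AE, OE or UE, matching "\u00c4 > AE" in the precomposed rules. Other accents are left untouched here.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class NfdUmlautSketch {
    // Mirrors "[aou] { U+0308 > e": a diaeresis directly after a, o or u.
    private static final Pattern LOWER =
        Pattern.compile("(?<=[aou])\\u0308");

    // Mirrors "[AOU] { U+0308 } $beforeLower > e":
    // optional combining marks, then a lowercase letter, must follow.
    private static final Pattern UPPER_BEFORE_LOWER =
        Pattern.compile("(?<=[AOU])\\u0308(?=[\\p{Mn}\\p{Me}]*\\p{IsLowercase})");

    // Fallback for any remaining diaeresis after A, O or U.
    private static final Pattern UPPER =
        Pattern.compile("(?<=[AOU])\\u0308");

    static String expandUmlauts(String input) {
        // Decompose first, so that an a-umlaut becomes 'a' followed by U+0308.
        String s = Normalizer.normalize(input, Normalizer.Form.NFD);
        s = LOWER.matcher(s).replaceAll("e");
        s = UPPER_BEFORE_LOWER.matcher(s).replaceAll("e");
        s = UPPER.matcher(s).replaceAll("E");
        // Recompose whatever marks are left.
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(expandUmlauts("H\u00e4user \u00dcber \u00c4RGER"));
    }
}
```

Working on the NFD form means one rule covers ä, ö and ü at once, since each decomposes into its base letter followed by U+0308.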
Mark
On Tue, Jan 12, 2016 at 9:25 AM, Krystian Marek <kry...@kr...>
wrote:
> public static void main(String[] args)
> {
> Transliterator latinASCII =
> Transliterator.getInstance("Latin-ASCII");
> String german_DIN_5007_2Rules =
> "$beforeLower = [[:Mn:][:Me:]]* [:Lowercase:];\n" +
> "\\u00e4 > ae;\n" +
> "\\u00f6 > oe;\n" +
> "\\u00fc > ue;\n" +
> "\\u00c4 } $beforeLower > Ae;\n" +
> "\\u00d6 } $beforeLower > Oe;\n" +
> "\\u00dc } $beforeLower > Ue;\n" +
> "\\u00c4 > AE;\n" +
> "\\u00d6 > OE;\n" +
> "\\u00dc > UE;\n";
> //"\\u00df > ss;\n";
>
> String latinASCIIRules = latinASCII.toRules(true);
>
> String germanASCIIRules = german_DIN_5007_2Rules + latinASCIIRules;
>
> Transliterator germanASCII =
> Transliterator.createFromRules("german_DIN_5007_2", germanASCIIRules,
> Transliterator.FORWARD);
>
> String result1 = germanASCII.transliterate(
> "Häuser Bäume Höfe Gärten daß Ü ü ö ä Ä Ö ß");
> String result2 = germanASCII.transliterate(
> "Ç,ü,é,â,ä,à,ç,ê,ë,è,ï,î,ì,Ä,Å,É,æ,Æ,ô,ö,ò,û,ù,Ô,Û,Ã,ã,Ñ,Õ,õ,Ä,Ë,Ï,Ö,Ü,Ÿ,Ç,Œ,œ,ū,Ð,ð,Ċ,ċ,Ġ,ġ,ů,Ů,š,Š,Ě,ť,ž,Ć,Ł,Ó,Ź,ą,ę,ń,ś,ż,ÿ,Ö,Ü,á,í,ó,ú,ñ,Ñ,À,È,Ì,Ò,Ù,Á,É,Í,Ó,Ú,Ý,Â,Ê,Î,ß,Ø,ø,Å,å,Þ,þ,Ā,Ē,Ī,Ō,Ū,ā,ē,ī,ō,ě,Ů,ů,Č,č,Ď,ď,Ľ,ľ,Ň,ň,Ř,ř,Š,š,Ť,Ž,Ą,Ę,Ń,Ś,Ż,ć,ł,ó,ź, ,/");
>
> System.out.println(result1);
> System.out.println(result2);
> }
>