Mondrian with umlauts

Alexandra Davidoiu (Iancu)
Attachments
broken_schoen.png (2333 bytes)
nice_schoen.png (2016 bytes)

Because of the issue tracked in http://jira.pentaho.com/browse/MONDRIAN-1751, Mondrian currently cannot correctly interpret UTF-8 characters in schema files. We see the following problem:

Let's say we have a dimension named Schön in our schema. When we discover the dimensions, the response contains something like this:

    <DIMENSION_NAME>Sch￶n</DIMENSION_NAME>

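Those three broken bytes are EF BF B6. Checking http://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280 shows that they encode U+FFF6, a code point in the Unicode Specials block, a few positions before U+FFFD REPLACEMENT CHARACTER (EF BF BD). It is not ö. The chain downstream does its job and renders it accordingly as a broken character (see the attachment broken_schoen.png).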
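The byte sequence from the response can also be decoded programmatically instead of being looked up in a table. The following Python sketch is only illustrative, and its variable names are made up for this example. It shows which code point EF BF B6 encodes and how that compares with U+FFFD (REPLACEMENT CHARACTER). The last lines record an observation, not a confirmed diagnosis: sign-extending the Latin-1 byte F6 for ö to 16 bits gives the same value, which may hint at how the character gets mangled.

```python
# Bytes observed in the response in place of "ö".
observed = bytes.fromhex("EF BF B6")

# Decode them as UTF-8 and look at the resulting code point.
decoded = observed.decode("utf-8")
code_point = ord(decoded)
print(f"U+{code_point:04X}")          # U+FFF6

# For comparison: the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
replacement = "\ufffd".encode("utf-8")
print(replacement.hex().upper())      # EFBFBD

# Observation only: treat the Latin-1 byte for "ö" as a signed byte
# and widen it to 16 bits, as a signed byte-to-char cast would do.
latin1_byte = "ö".encode("latin-1")   # b"\xf6"
signed = int.from_bytes(latin1_byte, "big", signed=True)
sign_extended = signed & 0xFFFF
print(f"U+{sign_extended:04X}")       # U+FFF6
```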

We checked the schema file with a hex editor and found the character correctly written as C3 B6 (see U+00F6 at http://www.utf8-chartable.de/unicode-utf8-table.pl). One observation: if the schema file is marked with the UTF-8 preamble (the byte order mark, EF BB BF), it won't be loaded. Without the preamble it loads, and the behavior described above occurs.
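The hex-editor check can be reproduced in a few lines. This Python sketch is an illustration and not part of Mondrian; `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names, and the schema content is a stand-in built in memory. It confirms the expected UTF-8 bytes for ö and shows one way to detect and drop the preamble.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 preamble EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 preamble, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Expected UTF-8 encoding of "ö".
print("ö".encode("utf-8").hex().upper())   # C3B6

# Stand-in for the raw bytes of a schema file saved with a preamble.
schema = codecs.BOM_UTF8 + '<Dimension name="Schön">'.encode("utf-8")
cleaned = strip_utf8_bom(schema)
print(has_utf8_bom(schema), has_utf8_bom(cleaned))   # True False
```

For a real file, the same helpers can be applied to the bytes read with `open(path, "rb")`.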

There is a working solution for this, as follows.

Instead of writing the actual character in the schema, use its numeric code value, written as an XML character reference. In the case of ö this is &#246; so our definition will look like this:

    <Dimension foreignKey="CUSTOMERNUMBER" name="Sch&#246;n">

This time we see the following response:

    <DIMENSION_NAME>Schön</DIMENSION_NAME>

And those two bytes are what we were expecting in the first place: C3 B6. Everything works nicely downstream (see the attachment nice_schoen.png).
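To see why the escaped form survives, it helps to look at what a standard XML parser makes of the character reference. The sketch below uses Python's `xml.etree.ElementTree`, not Mondrian's own schema loader, so it only illustrates the XML side of the workaround. The element is self-closed here purely so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Schema fragment with the umlaut written as a decimal character reference.
fragment = '<Dimension foreignKey="CUSTOMERNUMBER" name="Sch&#246;n"/>'

# The parser resolves the reference to the real character.
name = ET.fromstring(fragment).get("name")
print(name)                                  # Schön

# Encoded as UTF-8, the ö becomes the expected bytes C3 B6.
print(name.encode("utf-8").hex().upper())    # 536368C3B66E
```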

In conclusion:

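Don't use special characters directly in your schema files until the ticket is fixed. Use their numeric code values instead, and don't forget to escape them properly. Everything else works.

To escape a special character, look up its hexadecimal value in the code page, convert it to decimal and write it as a character reference. For the characters below, the code page value is the same as the Unicode code point. Some examples: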

    ö -> F6 -> 246 -> &#246;
    ä -> E4 -> 228 -> &#228;
    ü -> FC -> 252 -> &#252;
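The lookup and conversion above can be automated. The following Python sketch is one possible way to do it; `to_char_refs` is a hypothetical helper name, not a Mondrian or Pentaho tool. It replaces every non-ASCII character with its decimal character reference and leaves everything else untouched, so characters such as & and < still need their usual XML escaping.

```python
def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    parts = []
    for ch in text:
        code = ord(ch)
        parts.append(ch if code < 128 else f"&#{code};")
    return "".join(parts)

# Reproduce the table: character -> hex -> decimal -> escaped form.
for ch in "öäü":
    print(ch, "->", f"{ord(ch):X}", "->", ord(ch), "->", to_char_refs(ch))

print(to_char_refs("Schön"))   # Sch&#246;n

# Python's built-in error handler produces the same result.
builtin = "Schön".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(builtin)                 # Sch&#246;n
```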

With many thanks to Roth, the originator of this solution (read his post here: http://forums.pentaho.com/showthread.php?56661-Problems-with-special-characters-(umlauts)-in-JPivot-Mondrian).

