
With Chinese locale, FreeMarker is altering static text in Template.process()

DxRx
2014-06-13
2014-06-14
  • DxRx

    DxRx - 2014-06-13

    I am running FreeMarker on Windows 7 in the Chinese locale.
    A template I was using included the character Æ (byte 0xC6). After running Template.process() on the data, the single byte of the Æ character was unexpectedly replaced with a series of three bytes.

    I reduced my template to 6 bytes (with no text substitution occurring).

    Here's the hexdump of the original template file:
    00000000 c6 26 23 31 30 3b |Æ&#10;|

    After running Template.process() on it while in the Chinese locale:
    00000000 ef bf bd 26 23 31 30 3b |...&#10;|

    The single c6 byte was altered into the three-byte sequence ef bf bd.

    This alteration of the c6 byte did not occur when running Windows in US/English or in any other non-Chinese locale that I tried.

    No doubt there's some unexpected encoding conversion that FreeMarker is doing here. The original full template is an XML file where the c6 byte is used as a special marker character that needed to stay as it was.

    For my purposes, I was able to work around the problem by hex-encoding the c6 byte as a character reference ("&#xc6;") so FreeMarker wouldn't alter the text.

    But it seems to me that FreeMarker shouldn't be altering any original static text in templates, regardless of the locale.

     
  • Dániel Dékány

    Charset conversions are technically unavoidable. First FreeMarker loads the template from somewhere, for which it must be decoded from its binary form to Unicode. Later you have to write the result back to a file (for example), where the Unicode text must be encoded as binary data again. The two charsets may differ, but that won't corrupt the text; it only translates between the charsets. What corrupts the text is reading the template file with a different charset than the one it was actually saved with. Maybe the application you are using has forgotten to set the template charset and falls back to the system default for reading templates. The application also has to choose what charset it uses for storing the template output (that's not even in the hands of FreeMarker).
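
    The decode/encode mismatch described above can be reproduced without FreeMarker. The following is a minimal sketch using only the JDK; the class and method names are made up for illustration, and the use of GBK as the default charset of a Chinese-locale Windows JVM is an assumption, not something confirmed in this thread. It decodes the six template bytes with one charset and encodes the resulting text with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of FreeMarker.
public class CharsetMismatchDemo {

    // Decode the raw template bytes with readAs, then encode the text with writeAs.
    // Malformed input is replaced with U+FFFD by the String constructor.
    static byte[] roundTrip(byte[] templateBytes, Charset readAs, Charset writeAs) {
        String text = new String(templateBytes, readAs);
        return text.getBytes(writeAs);
    }

    // Format bytes as space-separated lowercase hex, like a hexdump row.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six bytes from the original template: c6 followed by "&#10;".
        byte[] template = {(byte) 0xc6, 0x26, 0x23, 0x31, 0x30, 0x3b};

        // Read with the charset the file was saved in: c6 survives.
        System.out.println(hex(roundTrip(template,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));

        // Read as UTF-8: a lone c6 is malformed, so it becomes U+FFFD (ef bf bd).
        System.out.println(hex(roundTrip(template,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Assumption: GBK stands in for the Chinese-locale platform default.
        if (Charset.isSupported("GBK")) {
            System.out.println(hex(roundTrip(template,
                    Charset.forName("GBK"), StandardCharsets.UTF_8)));
        }
    }
}
```

    Reading the bytes as ISO-8859-1 leaves c6 intact, while reading them as UTF-8 yields ef bf bd, because a lone c6 byte is not valid UTF-8. The setting that matters is therefore the charset the template file was actually saved in.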

    It seems your template is in UTF-8, so start by setting the default_encoding property to UTF-8 (or call configuration.setDefaultEncoding("utf-8")).
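
    As a rough sketch of that suggestion, the snippet below sets the template encoding explicitly and also picks the output charset itself instead of relying on the platform default. The directory name "templates", the template name "marker.ftl", and the output file "out.xml" are hypothetical, and it assumes FreeMarker 2.3.21 or later on the classpath. The charset value is an assumption too: it should be whatever the template file was really saved in. The hexdump in this thread shows a lone c6 byte, which is Æ in ISO-8859-1 but is not valid UTF-8, so ISO-8859-1 is used here.

```java
import freemarker.template.Configuration;
import freemarker.template.Template;
import freemarker.template.TemplateException;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; file and directory names are placeholders.
public class ExplicitEncodingExample {
    public static void main(String[] args) throws IOException, TemplateException {
        // Assumption: the charset the template file was actually saved in.
        String templateCharset = "ISO-8859-1";

        Configuration cfg = new Configuration(Configuration.VERSION_2_3_21);
        cfg.setDirectoryForTemplateLoading(new File("templates"));
        // Decode template files with this charset instead of the system default.
        cfg.setDefaultEncoding(templateCharset);

        Template template = cfg.getTemplate("marker.ftl");
        Map<String, Object> dataModel = new HashMap<>();

        // The application, not FreeMarker, chooses the output charset.
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("out.xml"), templateCharset)) {
            template.process(dataModel, out);
        }
    }
}
```

    Using the same charset for reading and writing keeps the c6 byte unchanged in the output. Writing with a different charset is also fine as long as the template is read with the right one; the character is then translated rather than corrupted.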

     

    Last edit: Dániel Dékány 2014-06-14
