#37 Wrong encoding of string constants in Java code blocks

Status: closed-fixed
Owner: Ian Robertson
Labels: Core (8)
Priority: 5
Updated: 2011-07-08
Created: 2011-05-25
Creator: Aron Ujvari
Private: No

Jamon replaces non-ASCII characters with a question mark (?) in string constants written inside Java code blocks.

Jamon template example:

<%encoding UTF-8>
<%escape #h>
<%args>
String title = "Cég adatai";
</%args>
<%java>
String x = "céég";
</%java>
<% title %> <% x %> <% "cééég" %>

Output:
C?g adatai c??g c???g

The cause of the problem is in org.jamon.codegen.CodeWriter:

static final String JAVA_SOURCE_ENCODING = "US-ASCII";
...
m_writer = new PrintWriter(new OutputStreamWriter(p_stream, JAVA_SOURCE_ENCODING));

The constructor hard-codes US-ASCII as the encoding of the writer used for code generation. Outside code blocks, non-ASCII characters in strings are converted to the "\uXXXX" format; inside code blocks this conversion does not happen, so the writer replaces each such character with "?".
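
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not Jamon's code: the class name and both method names are hypothetical, and the escaping helper only illustrates the kind of conversion the report says is applied outside code blocks.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class AsciiWriterDemo {
    // Writes text through a US-ASCII writer, mirroring the writer setup
    // quoted from CodeWriter, and returns what ended up in the stream.
    static String writeAsAscii(String text) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
            new OutputStreamWriter(stream, StandardCharsets.US_ASCII));
        writer.print(text);
        writer.flush();
        return new String(stream.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Hypothetical helper: replaces every non-ASCII char with a Java
    // Unicode escape (backslash, 'u', four hex digits).
    static String escapeNonAscii(String text) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                escaped.append(c);
            } else {
                escaped.append(String.format("\\u%04X", (int) c));
            }
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // The literal is spelled with Unicode escapes so that this file
        // compiles the same way under any source encoding.
        String literal = "c\u00e9\u00e9g";
        System.out.println(writeAsAscii(literal));                 // prints c??g
        System.out.println(writeAsAscii(escapeNonAscii(literal))); // escapes survive intact
    }
}
```

An unescaped string loses its accented characters as soon as it passes through the US-ASCII writer, while an escaped one passes through unchanged because it contains only ASCII.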

The best output encoding for the generated code files would probably be the encoding declared in the template source file. That way no conversion would be needed.
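
A minimal sketch of that suggestion, assuming the template's declared encoding is available as a string. The class and method names are hypothetical and are not Jamon's API. Falling back to the platform default when no encoding is declared is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class GeneratedSourceEncoding {
    // Hypothetical helper: use the encoding declared in the template,
    // or the platform default when the template declares none.
    static Charset resolve(String templateEncoding) {
        return templateEncoding == null
            ? Charset.defaultCharset()
            : Charset.forName(templateEncoding);
    }

    // Encodes generated source text with the resolved charset.
    static byte[] encodeSource(String source, String templateEncoding) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
            new OutputStreamWriter(stream, resolve(templateEncoding)));
        writer.print(source);
        writer.flush();
        return stream.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of its own encoding.
        String source = "String x = \"c\u00e9\u00e9g\";";
        byte[] bytes = encodeSource(source, "UTF-8");
        // Decoding with the same charset recovers the original text.
        String decoded = new String(bytes, Charset.forName("UTF-8"));
        System.out.println(decoded.equals(source)); // prints true
    }
}
```

Whatever charset is chosen, the compiler has to read the generated file with the same one; otherwise the string literals are decoded incorrectly.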

Discussion

  • Ian Robertson
    2011-06-27

    According to http://download.oracle.com/javase/6/docs/technotes/tools/solaris/javac.html, the compiler defaults to "the platform default converter" unless an -encoding parameter is passed to it. If we allow different generated Java files to be written in different encodings, the compilation stage becomes challenging.

    That said, while users could burden themselves by using inconsistent charsets, it still makes sense to write Java files in the encoding specified by the template, or in the platform default if none is specified.

     
  • Ian Robertson
    2011-06-28

    • assigned_to: nobody --> iroberts
     
  • Ian Robertson
    2011-06-28

    This is fixed on trunk; I hope to be cutting a new release sometime this week.

     
  • Ian Robertson
    2011-07-08

    As of 2.4.0, generated Java files are written using the encoding specified in the template.

     
  • Ian Robertson
    2011-07-08

    • status: open --> closed-fixed
     