[Jamon-hackers] [ jamon-Bugs-3307564 ] Wrong encoding of string constants in Java code blocks


Bugs item #3307564, was opened at 2011-05-25 09:18
Message generated for change (Settings changed) made by iroberts
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=741043&aid=3307564&group_id=138569

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Aron Ujvari (ujvari)
Assigned to: Ian Robertson (iroberts)
Summary: Wrong encoding of string constants in Java code blocks

Initial Comment:
Jamon replaces non-ASCII characters with a question mark (?) in string constants written inside Java code blocks.

Jamon template example:

<%encoding UTF-8>
<%escape #h>
<%args>
    String title = "Cég adatai";
</%args>
<%java>
String x = "céég";
</%java>
<% title %> <% x %> <% "cééég" %>

Output:
C?g adatai c??g c???g

The cause of the problem is in org.jamon.codegen.CodeWriter:

static final String JAVA_SOURCE_ENCODING = "US-ASCII";
...
m_writer = new PrintWriter(new OutputStreamWriter(p_stream, JAVA_SOURCE_ENCODING));

In the constructor, the output writer uses a fixed US-ASCII encoding for the generated code. Outside code blocks, non-ASCII characters in strings are converted into the "\\uXXXX" format; inside code blocks this conversion does not happen, so the writer replaces each such character with "?".
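
The effect described above can be reproduced outside Jamon. The following is a minimal sketch, not Jamon's actual code: the class name AsciiWriterDemo and the helper escapeNonAscii are hypothetical, and the escaping shown is only an assumption about what a Unicode-escape conversion looks like. It writes a string through a PrintWriter/OutputStreamWriter pair, as in the snippet quoted from CodeWriter, and shows that a US-ASCII writer substitutes "?" for each unmappable character unless the text is escaped first.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AsciiWriterDemo {
    // Writes text through a PrintWriter/OutputStreamWriter pair using the
    // given charset and returns the bytes that reached the stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(stream, charset));
        writer.print(text);
        writer.flush();
        return stream.toByteArray();
    }

    // Hypothetical helper: replaces every char above 127 with a
    // backslash-u escape so the result is safe to write as US-ASCII.
    static String escapeNonAscii(String text) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                escaped.append(String.format("\\u%04x", (int) c));
            } else {
                escaped.append(c);
            }
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // The non-ASCII letters are spelled as escapes here so that this
        // file compiles the same way under any source encoding.
        String text = "c\u00e9\u00e9g";

        // Unescaped: each unmappable character becomes '?'.
        byte[] raw = encode(text, StandardCharsets.US_ASCII);
        System.out.println(new String(raw, StandardCharsets.US_ASCII));

        // Escaped first: the characters survive as backslash-u sequences.
        byte[] safe = encode(escapeNonAscii(text), StandardCharsets.US_ASCII);
        System.out.println(new String(safe, StandardCharsets.US_ASCII));
    }
}
```

The first line printed shows the question marks from the bug report; the second shows the escaped form that the Java compiler would read back as the original characters.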

The best output encoding for the generated code files would probably be the encoding declared in the source template. That way no conversion would be needed.

----------------------------------------------------------------------

>Comment By: Ian Robertson (iroberts)
Date: 2011-07-08 13:28

Message:
As of 2.4.0, generated java files are written using the same encoding as
specified in the template.

----------------------------------------------------------------------

Comment By: Ian Robertson (iroberts)
Date: 2011-06-27 18:11

Message:
This is fixed on trunk; I hope to be cutting a new release sometime this
week.

----------------------------------------------------------------------

Comment By: Ian Robertson (iroberts)
Date: 2011-06-27 13:06

Message:
According to
http://download.oracle.com/javase/6/docs/technotes/tools/solaris/javac.html,
the encoding the compiler uses defaults to "the platform default converter"
unless a -encoding parameter is passed to the compiler.  If we allow
different generated Java files to be written in different encodings, this
makes the compilation stage challenging.
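
To illustrate the point about -encoding, here is a small sketch that builds a javac command line declaring the source encoding explicitly. The class name JavacCommandBuilder, the method javacCommand, and the file path are hypothetical; only the -encoding option itself comes from the javac documentation cited above. Because one option applies to every file in the invocation, files written in different encodings would have to be compiled in separate invocations.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JavacCommandBuilder {
    // Builds a javac command line that tells the compiler which encoding
    // the listed source files use. One -encoding value applies to every
    // file in the invocation.
    static List<String> javacCommand(Charset sourceCharset, List<String> sourceFiles) {
        List<String> command = new ArrayList<>();
        command.add("javac");
        command.add("-encoding");
        command.add(sourceCharset.name());
        command.addAll(sourceFiles);
        return command;
    }

    public static void main(String[] args) {
        // Hypothetical path of a generated file, for illustration only.
        List<String> command = javacCommand(
            StandardCharsets.UTF_8, Arrays.asList("gen/MyTemplate.java"));
        // The list could be handed to ProcessBuilder; here it is just printed.
        System.out.println(String.join(" ", command));
    }
}
```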

That said, it seems that while a user could place a burden on themselves
by using inconsistent charsets, it still makes sense to write java files in
the specified encoding, or in the platform default if none is specified.
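
A rough sketch of that policy follows, under the assumption that the template's declared encoding is available as a string (or null when none is declared). The class name GeneratedSourceEncoding and its methods are hypothetical and are not taken from the Jamon code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GeneratedSourceEncoding {
    // Uses the encoding declared by the template, or the platform
    // default when the template declares none (null).
    static Charset chooseCharset(String templateEncoding) {
        return templateEncoding == null
            ? Charset.defaultCharset()
            : Charset.forName(templateEncoding);
    }

    // Writes generated Java source in the chosen charset and returns
    // the resulting bytes.
    static byte[] writeSource(String javaSource, String templateEncoding) {
        Charset charset = chooseCharset(templateEncoding);
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(stream, charset));
        writer.print(javaSource);
        writer.flush();
        return stream.toByteArray();
    }

    public static void main(String[] args) {
        // Non-ASCII letters are spelled as escapes so this file compiles
        // the same way under any source encoding.
        String source = "String x = \"c\u00e9\u00e9g\";";
        byte[] bytes = writeSource(source, "UTF-8");
        // Decoding with the same charset gives back the original text.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(source));
    }
}
```

With this approach the string constant reaches the generated file unchanged, provided the compiler is later told to read the file in the same encoding.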

----------------------------------------------------------------------
