"include" output a BOM


#108 "include" output a BOM

Status: open-accepted

Owner: nobody

Labels: None

Priority: 5

Updated: 2005-11-14

Created: 2005-11-14

Creator:

Private: No

If the included file is encoded in Unicode with a BOM,
the BOM is written to the HTML page, which causes
problems. For example:

...
<body style="margin:0px;padding:0px;">
<#include "navigation.html">
...
</body>
...

The file "navigation.html" is a Unicode file with a BOM:
<table><tr><td>test</td></tr></table>

Microsoft Internet Explorer then shows a line break,
but Firefox does not.

But if "navigation.html" is saved without a BOM, there
is no line break in either IE or Firefox.

Discussion

Dániel Dékány - 2005-11-14

status: open --> open-accepted

Dániel Dékány - 2005-11-14

This is a tricky problem. The BOM character should appear
only as the very first character of a text file, but you
can't know whether the output of a template will be the whole
file or just part of it. In the case of <#include ...> this could
be determined, but even the output of the top-level template
can end up in the middle of a text file (it just writes to a
Writer). Maybe the best solution would be for the template
parser to always ignore a BOM at the beginning of a template?
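A minimal sketch of that proposal, assuming the parser sees the whole template source as a String: drop a single U+FEFF only when it is the very first character, and leave a U+FEFF anywhere else untouched. The names TemplateSourceBom and ignoreLeadingBom are hypothetical and not part of FreeMarker.

```java
// Hypothetical helper illustrating the proposal: ignore one U+FEFF (BOM)
// if it is the very first character of a template's source text.
// This is not FreeMarker code.
final class TemplateSourceBom {
    private static final char BOM = '\uFEFF';

    private TemplateSourceBom() {}

    static String ignoreLeadingBom(String source) {
        if (!source.isEmpty() && source.charAt(0) == BOM) {
            return source.substring(1); // drop only the leading BOM
        }
        return source; // a U+FEFF anywhere else is treated as content
    }
}
```

Only position 0 is checked on purpose: the thread's point is that a BOM belongs at the start of a file, so a U+FEFF later in the text is left for the template author to deal with.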

BTW, another related bug: you can't use the <#ftl> directive in
files that start with a BOM, since the <#ftl> directive allows
nothing before it except space, tab, CR, or LF. And if it allowed
the BOM too, then using <#ftl> would cut the BOM out of the
output (since anything before <#ftl> is ignored).
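To make the <#ftl> rule concrete, here is a hypothetical model of it, not FreeMarker's actual parser code: a template passes only if everything before "<#ftl" is space, tab, CR, or LF, so a file whose first character is a BOM fails the check.

```java
// Hypothetical model of the rule described above: only space, tab, CR and LF
// may precede <#ftl>. Illustration only; FreeMarker's parser works differently.
final class FtlHeaderCheck {
    private FtlHeaderCheck() {}

    static boolean onlyWhitespaceBeforeFtl(String template) {
        int idx = template.indexOf("<#ftl");
        if (idx < 0) {
            return false; // no <#ftl> directive at all
        }
        for (int i = 0; i < idx; i++) {
            char c = template.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false; // e.g. a leading BOM (U+FEFF) ends up here
            }
        }
        return true;
    }
}
```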

For those of you who don't know what this is about: certain
UTF-8 editors (like Windows XP Notepad) start files with the
BOM character U+FEFF (0xEF 0xBB 0xBF in UTF-8 encoding).
This character is used to detect the charset of the file
(incorrectly, because only UTF-16 BE vs. UTF-16 LE can be
detected this way).
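The byte sequences above can be checked directly in Java. This small sketch (the class and method names are illustrative) encodes U+FEFF with the standard charsets. String.getBytes with UTF_16BE or UTF_16LE does not prepend a BOM of its own, so the result is just the encoded character.

```java
import java.nio.charset.Charset;

// Illustrative helper: shows how the single character U+FEFF is encoded
// in a given charset, as space-separated uppercase hex bytes.
final class BomBytes {
    private BomBytes() {}

    static String bomHex(Charset charset) {
        byte[] bytes = "\uFEFF".getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // unsigned byte as hex
        }
        return sb.toString();
    }
}
```

bomHex(StandardCharsets.UTF_8) returns "EF BB BF", StandardCharsets.UTF_16BE gives "FE FF", and StandardCharsets.UTF_16LE gives "FF FE".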

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2005-11-15

I think java.io.Reader and Writer may not be the cause of
this problem. I tried to find out how FreeMarker reads the
template file and writes out the processed output, but I
couldn't. If it uses its own reader and writer, you could do
this: read the file from a stream (not a Reader) and decode it
according to its BOM or the specified encoding, then write the
processed output in the specified encoding or the system's
default encoding. That way the BOM only helps to encode or
decode the file and is not treated as part of its content.
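A minimal sketch of that suggestion, assuming the whole template fits in memory: read raw bytes from an InputStream, let a UTF-8, UTF-16BE, or UTF-16LE BOM choose the charset (falling back to a caller-supplied one), and return only the text after the BOM. BomAwareDecoder is a hypothetical name; UTF-32 BOMs are deliberately not handled, and the output side (writing in the target encoding) is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder: the BOM selects the charset and is then discarded,
// so it never becomes part of the template text.
final class BomAwareDecoder {
    private BomAwareDecoder() {}

    static String decode(InputStream in, Charset fallback) throws IOException {
        byte[] data = readAllBytes(in);
        Charset charset = fallback;
        int skip = 0;
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            charset = StandardCharsets.UTF_8;
            skip = 3;
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            // Note: a UTF-32LE BOM (FF FE 00 00) would also match here;
            // UTF-32 is out of scope for this sketch.
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }
        return new String(data, skip, data.length - skip, charset);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```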


Dániel Dékány - 2005-11-15

As far as I know, java.io.Reader/Writer don't remove the BOM.
You can fix this issue right now by writing a
freemarker.cache.TemplateLoader implementation and then
plugging your TemplateLoader object into the configuration with
configuration.setTemplateLoader(...). (But I think this problem
should be addressed in the FreeMarker core later.)
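A sketch of that workaround as a decorator: wrap whatever loader you already use and strip a leading U+FEFF from every Reader it returns. So that the example compiles without the FreeMarker jar, it declares a local interface, TemplateLoaderLike, that mirrors the four methods of freemarker.cache.TemplateLoader as found in FreeMarker 2.3. In real code you would implement freemarker.cache.TemplateLoader itself (check the signatures against your version), and MapLoader, a toy in-memory loader used only for demonstration, would not be needed.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring freemarker.cache.TemplateLoader (FreeMarker 2.3),
// declared here only so the sketch compiles without FreeMarker.
interface TemplateLoaderLike {
    Object findTemplateSource(String name) throws IOException;
    long getLastModified(Object templateSource);
    Reader getReader(Object templateSource, String encoding) throws IOException;
    void closeTemplateSource(Object templateSource) throws IOException;
}

// Decorator: delegates every call, but removes a leading U+FEFF from each Reader.
final class BomStrippingLoader implements TemplateLoaderLike {
    private final TemplateLoaderLike delegate;

    BomStrippingLoader(TemplateLoaderLike delegate) {
        this.delegate = delegate;
    }

    public Object findTemplateSource(String name) throws IOException {
        return delegate.findTemplateSource(name);
    }

    public long getLastModified(Object templateSource) {
        return delegate.getLastModified(templateSource);
    }

    public Reader getReader(Object templateSource, String encoding) throws IOException {
        PushbackReader r = new PushbackReader(delegate.getReader(templateSource, encoding), 1);
        int first = r.read();
        if (first != -1 && first != '\uFEFF') {
            r.unread(first); // not a BOM: push it back so no content is lost
        }
        return r;
    }

    public void closeTemplateSource(Object templateSource) throws IOException {
        delegate.closeTemplateSource(templateSource);
    }
}

// Toy in-memory loader (hypothetical) used only to exercise the decorator.
final class MapLoader implements TemplateLoaderLike {
    private final Map<String, String> templates = new HashMap<>();

    MapLoader put(String name, String text) {
        templates.put(name, text);
        return this;
    }

    public Object findTemplateSource(String name) {
        return templates.get(name); // null means "not found"
    }

    public long getLastModified(Object templateSource) {
        return -1; // modification time unknown
    }

    public Reader getReader(Object templateSource, String encoding) {
        return new StringReader((String) templateSource);
    }

    public void closeTemplateSource(Object templateSource) {
        // nothing to release for in-memory templates
    }
}
```

With the real interface, the wrapped loader (for example a freemarker.cache.FileTemplateLoader) would then be passed to configuration.setTemplateLoader(...) as described in the comment above.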
