Support &nbsp; as the NBSP encoding option
com.vladium.util.Strings.HTMLEscapeNB() converts spaces to \u00A0.
This is the proper encoding of an NBSP under many popular encodings
such as 8859-1, but not for other encodings, like UTF-8.
As a result the reports may be slightly garbled when EMMA is run in an
environment using a different encoding or when the HTML report is
served from a web-server with a different default encoding (web
browsers favor the encoding specified in the Content-Type header over
the one specified in the document's meta-data).
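The garbling described above can be reproduced outside EMMA. The following is only a sketch: the class and method names are made up for illustration, and it assumes the mismatch is between ISO-8859-1 and UTF-8. It writes a single \u00A0 in one encoding and reads it back in the other.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: not part of EMMA.
final class NbspMismatchDemo {
    static final String NBSP = "\u00A0";

    // Encode text with the charset the report was written in, then decode it
    // with the charset the browser was told to assume.
    static String reinterpret(String text, Charset written, Charset assumed) {
        return new String(text.getBytes(written), assumed);
    }

    // Render each char as U+XXXX so invisible characters show up in output.
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) out.append(' ');
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Report written as ISO-8859-1, served as UTF-8: the lone 0xA0 byte
        // is not valid UTF-8 and decodes to the replacement character.
        System.out.println(codePoints(
            reinterpret(NBSP, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));

        // Report written as UTF-8, served as ISO-8859-1: the two bytes
        // 0xC2 0xA0 decode to a stray "Â" followed by an NBSP.
        System.out.println(codePoints(
            reinterpret(NBSP, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

Either direction leaves a visible stray character wherever the report had a space, which would match the "slightly garbled" symptom.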
This bug could be resolved by changing HTMLEscapeNB to replace
spaces with the string "&nbsp;".
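A rough sketch of what such a change might look like is below. It is not EMMA's actual HTMLEscapeNB code: the class name, method name, and the set of other characters escaped are assumptions made for illustration. The point is that the output for a space is the pure-ASCII entity "&nbsp;", which reads the same under any ASCII-compatible encoding.

```java
// Hypothetical stand-in for an entity-based variant of HTMLEscapeNB;
// EMMA's real implementation may differ.
final class NbspEscapeSketch {
    static String escapeWithEntities(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case ' ':  out.append("&nbsp;"); break; // instead of \u00A0
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeWithEntities("if (a < b)"));
    }
}
```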
Logged In: YES
user_id=1013207
I was under the impression that \u00A0 was the Unicode
codepoint for a non-breakable space and how it was
translated into encodings such as UTF-8 was up to them.
Changing to &nbsp; means emitting 6 bytes instead of 1 for
every space and because there is so much white space in
typical Java sources there is a *substantial* file I/O hit.
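For reference, the per-space cost can be checked with a few lines of Java. This is just an illustrative measurement (the class and method names are invented here), comparing \u00A0 against the "&nbsp;" entity in ISO-8859-1 and UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: counts the bytes one escaped space occupies on disk.
final class NbspByteCost {
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String nbspChar = "\u00A0";
        String nbspEntity = "&nbsp;";
        Charset[] charsets = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8 };
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                + ": U+00A0 = " + encodedLength(nbspChar, cs) + " byte(s), "
                + "entity = " + encodedLength(nbspEntity, cs) + " byte(s)");
        }
    }
}
```

The character takes 1 byte in ISO-8859-1 and 2 in UTF-8, while the entity takes 6 in both, so the overhead depends on which output encoding is in use.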
Furthermore, if a web server ignores the document encoding
it is clearly buggy (the document obviously knows its own
encoding and is the authoritative source of that information).
Could you be suffering from this issue:
http://emma.sourceforge.net/faq.html#q.report.apache
Logged In: YES
user_id=1289404
Yes, I think it could be the same problem. Looking at EMMA a bit more, I
see that it is possible to set the encoding used when generating reports.
This gives an option to users who don't want to change their web server's
default encoding. Perhaps you want to add this to the faq?
I'm not sure that I believe that this is really a bug with the web server. Can
it really be expected to look into the resources it serves? In any event,
given the huge numbers of Apache users (and Tomcat users, that's what
I'm using), it seems like it might be worth doing something to make it work
for them.
It does seem like it might be a bug with the browsers, which should
probably trust the document over the headers. But again, given that most
browsers work this way, perhaps this is a moot point.
Anyway, for my uses I would happily use a setting which used
&nbsp; escape sequences rather than \u00A0. The extra bytes aren't that much
of a concern for me.
Logged In: YES
user_id=1013207
Maybe this could be put in a FAQ, but the fact is EMMA has a
bunch of props that could be changed by end users and they
are all documented already (see section 3 of the reference manual).
Once again, this is not an EMMA bug. EMMA writes 100%
Unicode content to a Java output stream of (ultimately) your
chosen encoding. ISO-8859-1 is merely a default. For
example, I can switch to UTF-8 encoding
(-Dreport.html.out.encoding=UTF-8). Then the Unicode NBSP
symbol is encoded as a two-byte UTF-8 sequence and it still
works fine in all browsers I can get my hands on. This was
tested way before 2.0 was released, of course.
It may not be a bug with a web server per se, but as you
say with the server+browser combination. But the salient
points of this discussion would be:
- you have a well-supported EMMA workaround (set the
report.html.out.encoding property to UTF-8 to match your
server's default)
- you have a server workaround (change default encoding to
match EMMA's default)
- I don't consider this to be an EMMA bug but I can
re-classify this issue as an RFE to support people who don't
want to do either of the above.