EMMA code coverage / Feature Requests / #86 Support &nbsp; as the NBSP encoding option

#86 Support &nbsp; as the NBSP encoding option

Milestone: next minor version

Status: open-rejected

Owner: Vlad Roubtsov

Labels: None

Priority: 1

Updated: 2005-06-02

Created: 2005-06-01

Creator: Adam Messinger

Private: No

com.vladium.util.Strings.HTMLEscapeNB() converts spaces to \u00A0.
This is the proper encoding of an NBSP under many popular encodings
such as 8859-1, but not for other encodings, like UTF-8.

As a result, the reports may be slightly garbled when EMMA is run in an
environment using a different encoding, or when the HTML report is
served from a web server with a different default encoding (web
browsers favor the encoding specified in the Content-Type header over
the one specified in the document's meta-data).
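
To make the byte-level difference concrete, here is a minimal sketch of how the single character \u00A0 is encoded under the two charsets mentioned, and what happens when ISO-8859-1 bytes are decoded as UTF-8. The class and method names are illustrative and are not part of EMMA.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the one character U+00A0 maps to bytes.
final class NbspBytes {
    static final String NBSP = "\u00A0";

    // ISO-8859-1 encodes NBSP as the single byte 0xA0.
    static byte[] latin1() {
        return NBSP.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 encodes NBSP as the two bytes 0xC2 0xA0.
    static byte[] utf8() {
        return NBSP.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the ISO-8859-1 bytes as if they were UTF-8, which is the
    // mismatch described above. A lone 0xA0 is malformed UTF-8, so the
    // decoder substitutes the replacement character U+FFFD.
    static String latin1ReadAsUtf8() {
        return new String(latin1(), StandardCharsets.UTF_8);
    }
}
```

The replacement character is what a browser would typically show in place of each space when the served encoding and the written encoding disagree in this way.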

This bug could be resolved by changing HTMLEscapeNB to replace
spaces with the string "&nbsp;".
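
A rough sketch of the proposed behavior follows. It is a hypothetical helper, not the actual source of com.vladium.util.Strings.HTMLEscapeNB(), and it handles only the space replacement; the useEntity flag is an assumption introduced here to show both variants side by side.

```java
// Hypothetical helper, NOT EMMA's real HTMLEscapeNB() implementation.
// It only replaces spaces; any other HTML escaping is out of scope here.
final class NbspEscapeSketch {
    // useEntity == true  -> emit the ASCII-only entity "&nbsp;"
    // useEntity == false -> emit the raw character U+00A0
    static String escapeSpaces(String s, boolean useEntity) {
        String replacement = useEntity ? "&nbsp;" : "\u00A0";
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the entity form uses only ASCII characters, its bytes are the same in ISO-8859-1 and UTF-8, which is why it would survive a mismatch between the written and the served encoding.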

Discussion

Vlad Roubtsov - 2005-06-01

status: open --> pending-rejected

Vlad Roubtsov - 2005-06-01

I was under the impression that \u00A0 was the Unicode
codepoint for a non-breakable space and how it was
translated into encodings such as UTF-8 was up to them.

Changing to &nbsp; means emitting 6 bytes instead of 1 for
every space, and because there is so much white space in
typical Java sources there is a *substantial* file I/O hit.
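
The size argument can be checked with a small sketch like the one below. The class and method names are made up for illustration; the method simply replaces each space and counts the encoded bytes.

```java
import java.nio.charset.Charset;

// Illustrative only: measures how many bytes a line occupies once every
// space has been replaced by the given string and encoded with cs.
final class NbspCost {
    static int encodedSize(String line, String spaceReplacement, Charset cs) {
        return line.replace(" ", spaceReplacement).getBytes(cs).length;
    }
}
```

For the three-character line "a b", this gives 3 bytes with \u00A0 in ISO-8859-1, 4 bytes with \u00A0 in UTF-8, and 8 bytes with &nbsp; in either encoding.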

Furthermore, if a web server ignores the document encoding
it is clearly buggy (the document obviously knows its own
encoding and is the authoritative source of that information).

Could you be suffering from this issue:
http://emma.sourceforge.net/faq.html#q.report.apache

Adam Messinger - 2005-06-01

status: pending-rejected --> open-rejected

Adam Messinger - 2005-06-01

Yes, I think it could be the same problem. Looking at EMMA a bit more, I
see that it is possible to set the encoding used when generating reports.
This gives an option to users who don't want to change their web server's
default encoding. Perhaps you want to add this to the faq?

I'm not sure that I believe that this is really a bug with the web server. Can
it really be expected to look into the resources it serves? In any event,
given the huge numbers of Apache users (and Tomcat users, that's what
I'm using), it seems like it might be worth doing something to make it work
for them.

It does seem like it might be a bug with the browsers, which should
probably trust the document over the headers. But again, given that most
browsers work this way, perhaps this is a moot point.

Anyway, for my uses I would happily use a setting which used &nbsp;
escape sequences rather than \u00A0. The extra bytes aren't that much
of a concern for me.

Vlad Roubtsov - 2005-06-02

labels: 634112 -->

milestone: 410874 --> next minor version

priority: 5 --> 1

summary: Problems with NBSPs and some encodings --> Support &nbsp; as the NBSP encoding option

Vlad Roubtsov - 2005-06-02

Maybe this could be put in a FAQ, but the fact is EMMA has a
bunch of props that could be changed by end users and they
are all documented already (see section 3 of the reference manual).

Once again, this is not an EMMA bug. EMMA writes 100%
Unicode content to a Java output stream of (ultimately) your
chosen encoding. ISO-8859-1 is merely a default. For
example, I can switch to UTF-8 encoding
(-Dreport.html.out.encoding=UTF-8). Then the Unicode NBSP
symbol is encoded as a two-byte UTF-8 sequence and it still
works fine in all browsers I can get my hands on. This was
tested well before 2.0 was released, of course.
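
As a sketch of the general mechanism described here (Unicode text written through a Java output stream of a chosen encoding), the following uses the standard OutputStreamWriter. It is not EMMA's report generator; the class name and the way the encoding name is passed in are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Illustrative only: writes Unicode content through a Writer whose
// encoding is chosen by name, e.g. the value given for
// report.html.out.encoding.
final class ReportEncodingSketch {
    static byte[] write(String content, String encodingName) {
        Charset cs = Charset.forName(encodingName);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(content);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With "UTF-8" as the encoding name, a \u00A0 in the content comes out as two bytes, and decoding those bytes as UTF-8 yields the original character again; the garbling only appears when the reader assumes a different encoding than the writer used.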

It may not be a bug with a web server per se, but as you
say with the server+browser combination. But the salient
points of this discussion would be:

- you have a well-supported EMMA workaround (set the
report.html.out.encoding property to UTF-8 to match your
server's default)

- you have a server workaround (change default encoding to
match EMMA's default)

- I don't consider this to be an EMMA bug but I can
re-classify this issue as an RFE to support people who don't
want to do either of the above.
