#86 Support &nbsp; as the NBSP encoding option

next minor version
open-rejected
None
1
2005-06-02
2005-06-01
No

com.vladium.util.Strings.HTMLEscapeNB() converts spaces to \u00A0.
This is the proper encoding of an NBSP under many popular encodings
such as 8859-1, but not for other encodings, like UTF-8.

As a result, the reports may be slightly garbled when EMMA is run in an
environment using a different encoding, or when the HTML report is
served from a web server with a different default encoding (web
browsers favor the encoding specified in the Content-Type header over
the one specified in the document's meta tag).
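
The garbling described above can be reproduced outside EMMA. The following is a minimal sketch, not EMMA code: the class and method names are made up for illustration. It encodes a non-breaking space with one charset and decodes the bytes with another, which is roughly what a browser does when the header and the file disagree.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspMismatchDemo {
    // Encodes text with one charset, then decodes the same bytes with another,
    // as happens when the declared encoding differs from the one used to write.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0";
        // ISO-8859-1 writes the single byte 0xA0, which is not valid UTF-8 on its own.
        String latin1AsUtf8 = reinterpret(nbsp, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // UTF-8 writes 0xC2 0xA0, which ISO-8859-1 reads as two separate characters.
        String utf8AsLatin1 = reinterpret(nbsp, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin1AsUtf8.equals("\uFFFD"));       // true
        System.out.println(utf8AsLatin1.equals("\u00C2\u00A0")); // true
    }
}
```

In the first case the browser shows a replacement character where each space should be; in the second it shows a stray accented capital A before each space.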

This bug could be resolved by changing HTMLEscapeNB to replace
spaces with the string "&nbsp;".
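
A rough sketch of what such a change might look like follows. It is not the actual com.vladium.util.Strings code, which is not shown in this ticket; the class name, method name, and the `useEntity` switch are all hypothetical.

```java
public class NbspEscapeSketch {
    // Hypothetical escaper: replaces each space either with the &nbsp; entity
    // (pure ASCII, so independent of the output encoding) or with U+00A0.
    static String escape(String text, boolean useEntity) {
        StringBuilder out = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case ' ': out.append(useEntity ? "&nbsp;" : "\u00A0"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("if (a < b)", true));
    }
}
```

Because the entity form uses only ASCII characters, it produces the same bytes under ISO-8859-1 and UTF-8, so it survives a mismatch between the file's encoding and the one the server announces.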

Discussion

  • Vlad Roubtsov - 2005-06-01
    • status: open --> pending-rejected
     
  • Vlad Roubtsov - 2005-06-01

    Logged In: YES
    user_id=1013207

    I was under the impression that \u00A0 was the Unicode
    codepoint for a non-breakable space and how it was
    translated into encodings such as UTF-8 was up to them.

    Changing to &nbsp; means emitting 6 bytes instead of 1 for
    every space, and because there is so much white space in
    typical Java sources there is a *substantial* file I/O hit.
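
To put rough numbers on the size concern above, here is a small sketch (hypothetical class and method names, not part of EMMA) that measures the encoded size of a line under each approach. Note that the 1-byte figure applies to ISO-8859-1; under UTF-8 a U+00A0 takes 2 bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspSizeEstimate {
    // Returns the number of bytes written when every space in the text is
    // replaced by the given string and the result is encoded with the charset.
    static int encodedSize(String text, String spaceReplacement, Charset charset) {
        return text.replace(" ", spaceReplacement).getBytes(charset).length;
    }

    public static void main(String[] args) {
        String line = "        return value;"; // an indented source line
        System.out.println("U+00A0, ISO-8859-1: " + encodedSize(line, "\u00A0", StandardCharsets.ISO_8859_1));
        System.out.println("U+00A0, UTF-8:      " + encodedSize(line, "\u00A0", StandardCharsets.UTF_8));
        System.out.println("&nbsp;, either:     " + encodedSize(line, "&nbsp;", StandardCharsets.US_ASCII));
    }
}
```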

    Furthermore, if a web server ignores the document encoding
    it is clearly buggy (the document obviously knows its own
    encoding and is the authoritative source of that information).

    Could you be suffering from this issue:
    http://emma.sourceforge.net/faq.html#q.report.apache

     
  • Adam Messinger - 2005-06-01
    • status: pending-rejected --> open-rejected
     
  • Adam Messinger - 2005-06-01

    Logged In: YES
    user_id=1289404

    Yes, I think it could be the same problem. Looking at EMMA a bit more, I
    see that it is possible to set the encoding used when generating reports.
    This gives an option to users who don't want to change their web server's
    default encoding. Perhaps you want to add this to the FAQ?

    I'm not sure that I believe that this is really a bug with the web server. Can
    it really be expected to look into the resources it serves? In any event,
    given the huge numbers of Apache users (and Tomcat users, that's what
    I'm using), it seems like it might be worth doing something to make it work
    for them.

    It does seem like it might be a bug with the browsers, which should
    probably trust the document over the headers. But again, given that most
    browsers work this way, perhaps this is a moot point.

    Anyway, for my uses I would happily use a setting which used &nbsp;
    escape sequences rather than \u00A0. The extra bytes aren't that much
    of a concern for me.

     
  • Vlad Roubtsov - 2005-06-02
    • labels: 634112 -->
    • milestone: 410874 --> next minor version
    • priority: 5 --> 1
    • summary: Problems with NBSPs and some encodings --> Support &nbsp; as the NBSP encoding option
     
  • Vlad Roubtsov - 2005-06-02

    Logged In: YES
    user_id=1013207

    Maybe this could be put in a FAQ, but the fact is EMMA has a
    bunch of props that could be changed by end users and they
    are all documented already (see section 3 of the reference manual).

    Once again, this is not an EMMA bug. EMMA writes 100%
    Unicode content to a Java output stream of (ultimately) your
    chosen encoding. ISO-8859-1 is merely a default. For
    example, I can switch to UTF-8 encoding
    (-Dreport.html.out.encoding=UTF-8). Then the Unicode NBSP
    symbol is encoded as a two-byte UTF-8 sequence and it still
    works fine in all browsers I can get my hands on. This was
    tested well before 2.0 was released, of course.
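
The idea of writing Unicode content through a stream with a configurable encoding can be sketched as follows. This only illustrates the mechanism and is not EMMA's report generator: the class, the `render` method, and reading the property through `System.getProperty` are assumptions; only the property name and the ISO-8859-1 default come from the discussion above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class ReportEncodingSketch {
    // Writes a meta tag naming the encoding, then the body, through a writer
    // that encodes with that same encoding, so file and declaration agree.
    static byte[] render(String body, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, Charset.forName(encoding))) {
            out.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=" + encoding + "\">");
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the property is looked up like this, defaulting to ISO-8859-1.
        String encoding = System.getProperty("report.html.out.encoding", "ISO-8859-1");
        byte[] page = render("a\u00A0b", encoding);
        System.out.println(encoding + ": " + page.length + " bytes");
    }
}
```

Run with `-Dreport.html.out.encoding=UTF-8`, the same U+00A0 in the body comes out as the two bytes 0xC2 0xA0 rather than the single byte 0xA0.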

    It may not be a bug with a web server per se, but as you
    say with the server+browser combination. But the salient
    points of this discussion would be:

    - you have a well-supported EMMA workaround (set the
    report.html.out.encoding property to UTF-8 to match your
    server's default)

    - you have a server workaround (change default encoding to
    match EMMA's default)

    - I don't consider this to be an EMMA bug but I can
    re-classify this issue as an RFE to support people who don't
    want to do either of the above.

     
