
#70 Missing Encoding Declaration

Status: open
Owner: nobody
Priority: 5
Updated: 2007-11-10
Created: 2007-11-10
Creator: Archilles
Private: No

Hello,

there seems to be a problem with Unicode characters in the generated HTML files. Subversion stores its log output in UTF-8 (at least in my case), while browsers usually fall back to ISO-8859-1 when a page declares no charset. As a result, all multibyte characters (e.g. German umlauts, and in general everything outside ASCII) are displayed incorrectly, one character per byte.
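To illustrate the symptom: "ü" is stored in UTF-8 as the two bytes 0xC3 0xBC, and an ISO-8859-1 decoder maps each byte to its own character ("Ã" and "¼"). A minimal Java sketch of that round trip (the class and method names are hypothetical, chosen only for illustration):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8 (as in the Subversion log), then decode the same
    // bytes as ISO-8859-1 (the charset the browser falls back to).
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "für Köln", written with escapes so the source file's encoding does not matter
        String original = "f\u00FCr K\u00F6ln";
        // misdecode("für Köln") returns "fÃ¼r KÃ¶ln": each umlaut becomes two characters
        System.out.println(misdecode(original));
    }
}
```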

The following line in the HTML header should help:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
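The declaration only fixes the display if the file's bytes really are UTF-8, so the meta tag and the writer's encoding have to agree. A minimal Java sketch under that assumption, using the `http-equiv` form of the tag (the class, `PAGE_CHARSET`, and `renderPage` are hypothetical names, not StatCVS/StatSVN code), drives both from one constant:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlPageWriter {
    // One constant drives both the <meta> declaration and the byte encoding,
    // so the two cannot drift apart.
    static final Charset PAGE_CHARSET = StandardCharsets.UTF_8;

    // Builds a minimal page and encodes it with PAGE_CHARSET.
    // (No HTML escaping of title/body, for brevity.)
    static byte[] renderPage(String title, String body) {
        String html = "<html>\n<head>\n"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + PAGE_CHARSET.name() + "\"/>\n"
                + "<title>" + title + "</title>\n</head>\n<body>\n"
                + body + "\n</body>\n</html>\n";
        return html.getBytes(PAGE_CHARSET);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example-page", ".html");
        Files.write(file, renderPage("Log", "Commit message: f\u00FCr K\u00F6ln"));
        System.out.println("Wrote " + file);
    }
}
```

Switching `PAGE_CHARSET` to another charset would change the declaration and the bytes together, which is the point of keeping a single source of truth.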

I use version 0.3.1.

Discussion

  • Jason Kealey

    Jason Kealey - 2008-06-15


    Sorry for the late reply.

    Looking at the current output in v0.4, I see:
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>

    This code is in StatCVS. Given my lack of experience with charsets, I'm not the best person to decide how StatCVS/StatSVN should handle the charset in general.

    All I can say is that, for SVN, the log output appears to always be in UTF-8, but we've gotten errors from parsers that don't recognize some characters (see the comments at http://blog.lavablast.com/post/2008/03/Upcoming-StatCVSStatSVN-release.aspx).

    Do you think this is a limitation of the xml parsers on some platforms (which don't support multi-byte UTF-8) or something else?
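One possibility worth ruling out before blaming the parsers (an assumption; the thread does not establish the cause): Java XML parsers handle multi-byte UTF-8 fine when given the raw bytes, because they read the encoding from the XML declaration. Trouble tends to appear when the bytes are first turned into characters with the wrong charset, for example through a `Reader`, since a parser fed a character stream ignores the declared encoding. A minimal sketch with the standard JAXP `DocumentBuilder` (the `<msg>` document and the helper names are illustrative, not the real `svn log` format):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XmlCharsetDemo {
    // Hypothetical log snippet, encoded as UTF-8 bytes.
    static final byte[] LOG_XML = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<msg>f\u00FCr K\u00F6ln</msg>").getBytes(StandardCharsets.UTF_8);

    // Safer: hand the parser the raw bytes; it honours encoding="UTF-8".
    static String parseFromBytes(byte[] xml) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Error-prone: a Reader decodes the bytes first, so the XML declaration
    // is ignored and the Reader's charset decides how the bytes are read.
    static String parseThroughReader(byte[] xml, Charset readerCharset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), readerCharset);
        return parse(new InputSource(reader));
    }

    // Parses the document and returns the text of its root element.
    private static String parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source)
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseFromBytes(LOG_XML));
        System.out.println(parseThroughReader(LOG_XML, StandardCharsets.ISO_8859_1));
    }
}
```

If log parsing goes through something like `FileReader` (which uses the platform default charset), the same mismatch can occur on systems whose default is not UTF-8.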

  • Archilles

    Archilles - 2008-06-18


    I changed "iso-8859-1" to "utf-8", reloaded the page, and the characters displayed correctly. Maybe some (Java) XML parsers do have broken Unicode implementations, but I'm not a cross-platform expert. Newer parsers should handle it; at least on Linux, my last XML trouble was years ago. macOS should be fine too. I don't know about Windows, as I only use it for gaming :)

    Maybe you could just pass the raw byte stream through unchanged for (known) broken parsers. Decoding is ultimately just an interpretation of the bytes, and in the end the browser does it. Well, okay, this is dirty and may have security implications. Perhaps someone experienced with Unicode knows a better approach...
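A possible middle ground to passing raw bytes through (a suggestion, not something settled in this thread) is to decode strictly and decide explicitly what to do with malformed input, so invalid byte sequences are never forwarded silently. Java's `CharsetDecoder` can be set to report malformed UTF-8 instead of substituting replacement characters. A minimal sketch; `decodeUtf8OrNull` is a hypothetical helper:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Returns the decoded text, or null if the bytes are not valid UTF-8.
    static String decodeUtf8OrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // caller chooses: reject, log, or fall back to another charset
        }
    }

    public static void main(String[] args) {
        byte[] valid = {0x66, (byte) 0xC3, (byte) 0xBC, 0x72}; // "für" in UTF-8
        byte[] invalid = {0x66, (byte) 0xFC, 0x72};            // "für" in ISO-8859-1
        System.out.println(decodeUtf8OrNull(valid));
        System.out.println(decodeUtf8OrNull(invalid));
    }
}
```

What to do on `null` (reject the entry, or re-decode it as ISO-8859-1 as a best guess) is a policy choice; the strict decode just makes that choice visible instead of leaving it to the browser.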

