From: Christian R. R. <ki...@as...> - 2001-06-08 17:11:18
(I hate not being subscribed to a list. Fixing that..)

Jeff writes:
> Yes, but you can type them in as ISO-8859-1 eight-bit characters just
> fine. (What you have to do to type these in depends on your OS &
> browser &c.) Do we want to support entities in the page text? If we
> do then '&' suddenly becomes a magic character. (I vote against it.)

Hey Jeff, thanks for the answer. Now that's a good question.

By using non-entitized characters that are supported in Latin-1 (aka ISO-8859-1) and other character sets, you do create a backwards-compatibility problem with browsers that have no support for selecting character sets (or for discovering them, if the standard indication Content-Type: text/html; charset=foobar is used). This may or may not be an issue depending on the specifics of your application, but the truth is that very few sites today issue Content-Type: lines that specify the correct charset, and international browsers do break (I've seen this over in Asia, and we see the same when visiting Japanese sites in Netscape). Using entities makes this very simple (which is why I've done the conversion here, myself) and avoids any sort of encoding problem.

I don't understand the point that & suddenly becomes a magic character -- why? (Perhaps more importantly: what is a magic character?) If this means it has to be allowed inside the regexp, for example, I disagree, though perhaps I'm viewing this incompletely: I only do entitizing on HTML output. Internally, the characters are stored as Latin-1 (actually, whatever you set your browser and database to handle, AFAICS) and they are converted inside transform.php, so all internal handling deals with Latin-1. I can't see at this point how this affects other character sets, but perhaps this is an alternative view from yours? Does & still have to be a magic character if it is only generated on output?
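
To make the "entitize only on output" idea concrete, here is a minimal sketch. It is written in Python rather than the PHP of transform.php, the function name entitize_for_output is hypothetical, and it is not the actual conversion code discussed in this thread. It assumes the page text is stored as Latin-1 bytes and turned into entities only when the HTML is generated.

```python
from html.entities import codepoint2name


def entitize_for_output(text: str) -> str:
    """Return `text` as pure-ASCII HTML, using entities for everything else.

    Hypothetical helper: markup characters are escaped, and every non-ASCII
    character becomes a named entity (or a numeric one if no name exists).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in '&<>"':
            # A literal '&' typed by the user is escaped like any other
            # character, so it needs no special meaning in the stored text.
            out.append(f"&{codepoint2name[cp]};")
        elif cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


# Stored form: raw Latin-1 bytes, exactly as typed into the edit form.
stored = "São Paulo & café".encode("latin-1")

# Output form: decode, then entitize just before the HTML is sent.
html = entitize_for_output(stored.decode("latin-1"))
print(html)  # S&atilde;o Paulo &amp; caf&eacute;
```

Under these assumptions the output is plain ASCII, so it renders the same whatever charset the browser guesses, and the stored text never contains entities.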

I am not advocating writing (entering data in the form) entitized -- this is broken and counter-productive; only displaying entitized, to allow a wider audience with less breakage. This is _almost_ a non-issue with Latin-1, since most browsers render that by default, but on non-Latin-1 charsets (and thus, locales and languages), this _will_ be an issue. Is this beside the point?

Take care,
--
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311