From: Christian Robottom Reis <kiko@as...> - 2001-06-08 17:11:18
(I hate not being subscribed to a list. Fixing that..)
> Yes, but you can type them in as ISO-8859-1 eight-bit characters
> just fine. (What you have to do to type these in depends on your
> OS & browser &c.)
> Do we want to support entities in the page text? If we do then
> '&' suddenly becomes a magic character. (I vote against it.)
Hey Jeff, thanks for the answer.
Now that's a good question. Using raw (non-entitized) characters from
Latin-1 (aka ISO-8859-1) or any other character set does create a
backwards-compatibility problem with browsers that cannot select a
character set, or cannot discover it from the standard indication
Content-Type: text/html; charset=foobar.
This may or may not be an issue depending on the specifics of your
application, but the truth is that very few sites today issue
Content-Type: lines that specify the correct charset, and international
browsers do break (I've seen this over in Asia, and we see the same thing
when visiting Japanese sites in Netscape).
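
To make the discovery step concrete, here is a minimal sketch of how a
client could read the charset out of a Content-Type value. It is written
in Python purely for illustration; declared_charset is a hypothetical
helper name, not something taken from any browser or from the wiki code.

```python
from email.message import Message


def declared_charset(content_type, default=None):
    """Return the charset parameter of a Content-Type value, lowercased.

    Hypothetical helper for illustration only. Returns `default` when
    the header carries no charset parameter at all.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html"))                      # None
```

The second call is the common case described above: with no charset in
the header, the client has nothing to go on and must guess.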
Now using entities makes this very simple (which is why I've done the
conversion here, myself) and avoids any sort of encoding problem. I don't
understand the point that & suddenly becomes a magic character -- why?
(Perhaps more importantly: what is a magic character?) If this means it
has to be allowed inside the regexp, for example, I disagree, though
perhaps I'm not seeing the whole picture:
I only entitize on HTML output; internally, the characters are stored as
Latin-1 (actually, as whatever you set your browser and database up to
handle, AFAICS) and they are converted inside transform.php, so all
internal handling deals with plain Latin-1. I can't see at this point
how this affects other character sets, but perhaps this is a different
view from yours?
Does & still have to be a magic character if it is only generated on
output? I am not advocating entering entitized text in the form -- that
is broken and counter-productive -- only displaying it entitized, to
allow a wider audience with less breakage. This is _almost_ a non-issue
with Latin-1, since most browsers render that by default, but on
non-Latin-1 charsets (and thus, locales and languages), this _will_ be an
issue. Is this beside the point?
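
To illustrate what I mean by entitizing only on output, here is a rough
sketch. It is Python rather than PHP and is not the actual code in
transform.php; the function name entitize and its charset parameter are
made up for the example. A stored '&' is escaped to &amp; before any
entities are generated, so the entities only ever appear in the output
and a '&' typed by a user never needs special treatment on input.

```python
import html
from html.entities import codepoint2name


def entitize(stored, charset="iso-8859-1"):
    """Turn stored page bytes into pure-ASCII HTML text.

    Illustrative sketch only: `stored` is the raw text as kept in the
    database, `charset` is the encoding it was stored in.
    """
    text = stored.decode(charset)
    # Escape markup characters first, so a literal '&' in the page
    # becomes '&amp;' and cannot be mistaken for a generated entity.
    escaped = html.escape(text, quote=False)
    out = []
    for ch in escaped:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")  # named entity
        else:
            out.append("&#%d;" % code)  # numeric character reference
    return "".join(out)


print(entitize(b"caf\xe9 & cr\xe8me"))  # caf&eacute; &amp; cr&egrave;me
```

Since the result is plain ASCII, it displays the same no matter which
charset the browser assumes.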
/\/\ Christian Reis, Senior Engineer, Async Open Source, Brazil
~\/~ http://async.com.br/~kiko/ | [+55 16] 274 4311