My thoughts on this issue...
First some background information.
ASCII is a 7-bit character set, stored one character per byte. ASCII
characters are often strung together to form strings, and usually the
last character is set to the NUL character, which is 0. These are
usually called C strings, and I've also heard the term ASCIIZ string.
There are other character sets that only use 7 bits.
UTF-8 is a way of representing Unicode using sequences of 8-bit bytes
(as opposed to UCS-2, which uses 16-bit units), and it was carefully
designed for backward compatibility. A UTF-8 character without the 8th
bit set is the same character in ASCII. So a UTF-8 string that doesn't
use any bytes with the 8th bit set is also an ASCII string. Also, all
non-ASCII characters are encoded using only bytes with the 8th bit set,
so things like NULs or "/"s don't end up inside them. Because it uses
sequences of such bytes, it can represent all of Unicode, and not just
an extra 128 characters.
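
To make the "8th bit" property concrete, here is a rough sketch of a UTF-8 encoder in plain C. The names utf8_encode and all_high_bit are made up for this email and are not part of GLib or Gaim; the bit layout itself is the standard UTF-8 one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is
 * a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;                 /* plain ASCII, 8th bit clear */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if every byte in buf has the 8th bit set. */
int all_high_bit(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!(buf[i] & 0x80))
            return 0;
    return 1;
}
```

Encoding 'A' gives the single byte 0x41, same as ASCII, while U+00E9 (e with an acute accent) comes out as the two bytes 0xC3 0xA9, both with the 8th bit set. That is why a byte like 0x00 or 0x2F can never show up inside a multi-byte character.
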
In GTK, all strings passed to its functions have to be in UTF-8.
You can validate UTF-8 by using g_utf8_validate(). However this only
ensures that the string is valid UTF-8. The string might (in this case
will) also be valid ISO-8859-1, or something else. So there's no way to
tell for sure what character set a string is in, given only the string.
Just because a string is valid ASCII, doesn't mean it's not really
supposed to be some other 7 bit character set.
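
A small illustration of that ambiguity, in plain C with no GLib. The two function names are invented for this example, and utf8_char_count assumes its input has already been validated as UTF-8. The same bytes give a different number of characters depending on which character set you believe they are in.

```c
#include <stddef.h>

/* Hypothetical helper: number of characters if buf is read as
 * ISO-8859-1, which is always one character per byte. */
size_t latin1_char_count(const unsigned char *buf, size_t n)
{
    (void)buf;  /* the contents don't matter, only the length */
    return n;
}

/* Hypothetical helper: number of characters if buf is read as UTF-8.
 * Assumes buf is already valid UTF-8, and counts every byte that is
 * not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_char_count(const unsigned char *buf, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if ((buf[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

For the five bytes 0x63 0x61 0x66 0xC3 0xA9, the UTF-8 reading is the four-character word "café", while the ISO-8859-1 reading is five characters ending in "Ã©". Both readings are legal, and nothing in the bytes says which one the sender meant.
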
In order to pass a string from OSCAR to the rest of Gaim, we need to put
it in UTF-8. We have to either convert it from something to UTF-8, or
else validate it. If we ever pass a string on that would fail
g_utf8_validate(), we risk crashing, and all sorts of nasty things.
As I recall, we convert from UTF-8 to ASCII when the string is supposed
to be ASCII. Or do I have that backwards? Either way this sounds kind of
expensive just to validate the string as ASCII. Why not write a
gaim_ascii_validate()? Something that iterates through the string, and
returns FALSE if it runs into a 0 or a character with the 8th bit set.
I personally hate the error messages that complain about character set
problems. Specifically, I hate the one for IRC, but this ICQ one seems the
same, except I've never run into it. What I hate about them is I lose
the message, and it's usually because someone's message has one
ISO-8859-1 8 bit character, and the rest of it was fine.
That being said, what I think makes the most sense for ICQ is to run
gaim_ascii_validate on messages labeled as ASCII, and if it returns
false, treat the message as CUSTOM, since obviously it isn't ASCII like
it said it was.
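
Roughly, the fallback I have in mind looks like the sketch below. Everything in it is made up for illustration: the enum, bytes_are_ascii and effective_charset are not real OSCAR or Gaim names, and the real code would use whatever charset constants it already has.

```c
#include <stddef.h>

/* Hypothetical charset labels, standing in for the real OSCAR ones. */
enum msg_charset { MSG_CHARSET_ASCII, MSG_CHARSET_CUSTOM };

/* Stand-in ASCII check: fails on a NUL byte or a byte with the 8th bit set. */
static int bytes_are_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == 0 || (buf[i] & 0x80))
            return 0;
    return 1;
}

/* Decide which charset to decode with. A message labeled ASCII that
 * isn't really ASCII is downgraded to CUSTOM instead of being dropped. */
enum msg_charset effective_charset(enum msg_charset labeled,
                                   const unsigned char *buf, size_t len)
{
    if (labeled == MSG_CHARSET_ASCII && !bytes_are_ascii(buf, len))
        return MSG_CHARSET_CUSTOM;
    return labeled;
}
```

The point of doing it this way is that a message with one stray 8-bit character still gets displayed, decoded as CUSTOM, rather than being replaced by an error.
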
Since I don't think we have a gaim_ascii_validate or anything like it,
I'll write one here. Besides, I want to try to do something constructive
with this email.
gboolean gaim_ascii_validate(const gchar *str, gint len)
{
    gint i;
    if (len == -1)
        len = strlen(str);    /* needs <string.h> */
    for (i = 0; i < len; i++, str++)
        if (!*str || (*str & 0x80))
            return FALSE;     /* embedded NUL or 8th bit set */
    return TRUE;
}