Evan Martin spake unto us the following wisdom:
> On Sat, 2003-09-27 at 10:39, Ethan Blanton wrote:
> > The "appropriate" thing to do would be for gaim to bust up long
> > messages into several shorter messages. I will eventually
> > reimplement this, but doing it correctly with respect to
> > international characters is somewhat difficult.
> Does IRC even support international characters?
Yes, although not very intelligently. Assuming you know what
character set the other user is sending, you may use a non-ASCII
character set. All but the oldest IRC servers are now 8-bit clean
(and, in fact, while the specification is somewhat ambiguous if I
recall correctly, it seems to indicate that they should be), so
provided you follow the _other_ rules correctly (limitations on the
characters allowed in various tokens such as hostnames) you may use
whatever character set is convenient.
> If the limit is number of characters (as opposed to bytes), it would
> seem g_utf8_strncpy() could do the trick. Otherwise, it gets pretty
Why would it be the number of logical characters? This requires the
server to know every character set which might be used, and
furthermore seems to miss the point of fixed limits ... the point is
that the server may allocate a 512-byte buffer and be certain that any
message it is intended to pass will fit in it -- if it doesn't, the
server is free to truncate it.
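
To make the byte-versus-character distinction concrete, here is a small sketch. It is in Python rather than gaim's C, purely for brevity; the sample text and the `fits` helper are made up for illustration, and 512 is simply the buffer size mentioned above.

```python
# Assumption: LIMIT is the 512-byte buffer discussed above; the sample
# text and the helper name are hypothetical.
LIMIT = 512

text = "é" * 300                   # 300 logical characters

utf8 = text.encode("utf-8")        # 2 bytes per character here
latin1 = text.encode("latin-1")    # 1 byte per character here


def fits(data: bytes, limit: int = LIMIT) -> bool:
    """Byte-oriented check: needs no knowledge of the character set."""
    return len(data) <= limit


print(len(text), len(utf8), len(latin1))
print(fits(latin1), fits(utf8))
```

The same 300 characters fit or do not fit depending only on how they were encoded, which is why a server that cares about its buffer can only sensibly count bytes.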
Yes, it does get ugly. The "best" reliable method I can come up with
is to convert to the desired character set, truncate, convert back to
UTF-8 with slop truncation, determine what part of the string was
successfully converted and pinned into the space available, send that,
and repeat on the remaining string. It's just not worth it. I will
probably enforce the length of the UTF-8 string, and assume that
localized strings are <= the length of the UTF-8 string ... which may
be invalid, but should work in many (if not most) encodings. (note
that certain situations (e.g. ASCII text encoded into UTF-16) are
guaranteed to fail)
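
The assumption above (the localized encoding is no longer than the UTF-8 encoding) is easy to probe. This is only a rough check in Python, not anything gaim does; the function name and the sample strings are hypothetical, and the charsets are ordinary Python codec names.

```python
def assumption_holds(text: str, charset: str) -> bool:
    """True if `text` encoded in `charset` is no longer than its UTF-8 form."""
    return len(text.encode(charset)) <= len(text.encode("utf-8"))


# Hypothetical samples: two single-byte charsets and one wide encoding.
samples = [
    ("naïve café", "latin-1"),    # 10 bytes vs 12 in UTF-8
    ("привет", "koi8-r"),         # 6 bytes vs 12 in UTF-8
    ("hello", "utf-16-le"),       # 10 bytes vs 5 in UTF-8
]

for text, charset in samples:
    print(charset, assumption_holds(text, charset))
```

Single-byte legacy charsets pass because every non-ASCII character costs at least two bytes in UTF-8; ASCII text in UTF-16 is the failure case already noted.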
> And I guess if you want to split on word boundaries it does become a bit
> more complicated... hm. If you take it to its extreme conclusion it's
> unsolvable because of languages like Thai, which lack spaces but also
> don't allow breaks in the middle of words. But somehow I think this is
> thinking about the problem too hard. :)
Many languages fall into that category... Any breaking done will be
on a (hopefully logical) character level, not word boundaries.
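
Character-level breaking under a byte limit might look roughly like the sketch below. It is Python for illustration only, not the eventual gaim code; `split_utf8` and its `limit` parameter are hypothetical names. It relies on one fact about UTF-8: continuation bytes always have the bit pattern 10xxxxxx, so a cut point can be backed up until it no longer lands on one.

```python
def split_utf8(text: str, limit: int) -> list:
    """Split `text` into pieces whose UTF-8 encoding is at most `limit` bytes,
    never cutting through the middle of a multi-byte character."""
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Back up while the cut would land on a continuation byte (10xxxxxx).
        while start < end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("limit is smaller than a single character")
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


pieces = split_utf8("héllo wörld", 4)
print(pieces)
```

Note that this breaks between code points, not between user-perceived characters, so a base letter can still be separated from a following combining mark. In real use the limit would also have to leave room for the command prefix and the trailing CR-LF, which count against the 512 bytes.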
To surrender one's personal weapon is to invite disaster. This has
been obvious for so long and so often that there is probably a Greek
word for the practice.
-- Jeff Cooper