[q-lang-users] Unicode
From: John C. <co...@cc...> - 2005-04-16 16:33:24
Albert Graef scripsit:

> Interested in retrofitting unicode support to some strange obscure
> functional programming language? ;-)

I'd be interested in helping you do it, and maybe peeking at some source
code here and there. Overall, I'd say you're 90% of the way there, thanks
to two decisions about typing:

1) You don't have a character type in Q;
2) You already distinguish firmly between strings and byte vectors.

Not having a character type in Q means that you don't have to break any
assumptions about how big a character can be: in Unicode there are
0x110000 different potential characters (most of them unassigned), not
128 or 256.

I recommend that Q strings use the UTF-8 encoding internally. The UTF-8
encoding uses 1, 2, 3, or 4 bytes to encode each character, depending on
the character's numeric value (its code point). In particular, the ASCII
subset uses a 1-byte representation, the same as ASCII itself, and the
bytes 0x00 through 0x7F are never used for anything else. The rest of
Latin-1 (0x80 through 0xFF), however, requires a 2-byte representation.

There will be five places where Unicode has to be addressed: in pulling
substrings out of strings, in reading, in writing, in converting from
strings to byte strings, and in converting from byte strings to strings.
In the last two cases, it is desirable (but not necessary) to provide a
way of overriding the system's standard external encoding (Latin-1, for
example), which is the encoding generated or interpreted respectively.

The iconv_open(), iconv(), and iconv_close() functions do the donkey work
of conversion. If they are not available on a system, the GNU iconv
library provides a good implementation. It's distributed under the Lesser
GPL, so it will not affect the licensing of Q.

> I have this unicode stuff on my TODO list for a _very_ long time, but
> somehow I can't wrap my head around it.

I understand. I hope the above is somewhat helpful; I'll be happy to
answer questions either on this list or privately.
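
To make the substring case above concrete, here is a minimal C sketch,
assuming strings are held as NUL-terminated UTF-8. The function names
(utf8_seq_len, utf8_offset, utf8_substr) are invented for this
illustration and are not taken from the Q sources. The idea is only that
a character index has to be turned into a byte offset by walking the
lead bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
   with lead byte b, or 0 if b cannot start a sequence (a continuation
   byte, or a byte that never appears in UTF-8). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                 /* ASCII, 1 byte */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* 2-byte lead */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* 3-byte lead */
    if (b >= 0xF0 && b <= 0xF4) return 4;   /* 4-byte lead */
    return 0;
}

/* Hypothetical helper: byte offset just past the first n characters of s.
   Returns (size_t)-1 if s has fewer than n characters or is malformed.
   Only lead and continuation bytes are checked; a full validator would
   also reject overlong forms and surrogates. */
size_t utf8_offset(const char *s, size_t n)
{
    size_t off = 0;
    while (n > 0) {
        size_t len = utf8_seq_len((unsigned char)s[off]);
        size_t i;
        if (s[off] == '\0' || len == 0) return (size_t)-1;
        for (i = 1; i < len; i++) {
            /* Every following byte must look like 10xxxxxx. */
            if (((unsigned char)s[off + i] & 0xC0) != 0x80)
                return (size_t)-1;
        }
        off += len;
        n--;
    }
    return off;
}

/* Hypothetical helper: copy `count` characters starting at character
   index `start` into out (capacity outsz bytes, including the NUL).
   Returns 0 on success, -1 on a bad index, bad input, or small buffer. */
int utf8_substr(const char *s, size_t start, size_t count,
                char *out, size_t outsz)
{
    size_t from, span;
    from = utf8_offset(s, start);
    if (from == (size_t)-1) return -1;
    span = utf8_offset(s + from, count);
    if (span == (size_t)-1 || span + 1 > outsz) return -1;
    memcpy(out, s + from, span);
    out[span] = '\0';
    return 0;
}
```

For example, in the UTF-8 string for "naïve €" the character at index 3
("v") starts at byte offset 4, because "ï" occupies two bytes. Walking
from the start is linear in the index; whether that cost matters for Q
is a design question this sketch does not try to answer.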
--
There is / One art             John Cowan <co...@cc...>
No more / No less              http://www.reutershealth.com
To do / All things             http://www.ccil.org/~cowan
With art- / Lessness           -- Piet Hein