From: Yamagata Y. <yor...@mb...> - 2004-05-29 10:54:26
From: Richard Jones <ri...@an...>
Subject: Re: [Ocaml-lib-devel] Some (simple) functions I'd like to see in ExtLib ...
Date: Fri, 28 May 2004 11:12:38 +0100

> I was thinking more of multibyte characters. It seems to me that
> UTF-8 is going to be the only thing that matters in the future (aside
> from the Microsoft world, of course, where they have as usual gone off
> in their own strange direction).

I have two points to make.

1. About Unicode domination

I'm still trapped in EUC-JP. There are many glitches, fears, and political issues preventing migration to UTF-8 (for example, the full-width/half-width issue). I predict that EUC-JP, Big-5, KOI-8, and so on will continue to exist for the foreseeable future.

2. About UTF-8 domination

UTF-8 is not a very safe way to handle Unicode strings. For example, UTF-8 has a malformed-character problem, which can lead to cross-site scripting attacks, malfunctioning IDSs, and so on. The problem is that the same character can have several byte representations. The Unicode standard requires the shortest representation, but many implementations accept longer ones and are then confused by them.

I hope that OCaml will wrap UTF-8 strings in an abstract type distinct from string, and treat access to the bytes as an unsafe operation. The type system is not, in my opinion, there only to prevent crashes.

By the way, I18N in free software is often not very advanced (*1), so I guess we have a lot to learn from Microsoft.

(*1) For example, the input methods of the X Window System are crap. They keep opening popup windows in the most creative places (*2) and, of course, crash at the end of their action.

(*2) This seems to be a problem with XIM, the protocol for input methods in X. It seems that XIM does not notify the input method when the desktop changes, so the input method opens its popup window on the old desktop. (*3)

(*3) Disclaimer: This is based solely on my experience, not code analysis.

--
Yamagata Yoriyuki
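To make point 2 concrete, here is a minimal sketch of what such an abstract type might look like. The module name `Utf8` and the functions `of_string` and `unsafe_to_bytes` are hypothetical, chosen only for illustration; they are not the API of ExtLib or of any existing library. The validator rejects non-shortest ("overlong") forms, surrogates, and code points above U+10FFFF.

```ocaml
(* Hypothetical module: all names here are illustrative assumptions. *)
module Utf8 : sig
  type t                              (* abstract: always well-formed UTF-8 *)
  val of_string : string -> t option  (* validates; None if malformed *)
  val unsafe_to_bytes : t -> string   (* raw byte access, named as unsafe *)
end = struct
  type t = string

  (* Smallest code point that legitimately needs n bytes. *)
  let min_code_point n =
    match n with
    | 1 -> 0x0
    | 2 -> 0x80
    | 3 -> 0x800
    | _ -> 0x10000

  let is_continuation b = b land 0xC0 = 0x80

  let valid s =
    let len = String.length s in
    let rec go i =
      if i >= len then true
      else
        let b0 = Char.code s.[i] in
        (* Sequence length and payload bits taken from the lead byte. *)
        let n, init =
          if b0 < 0x80 then (1, b0)
          else if b0 land 0xE0 = 0xC0 then (2, b0 land 0x1F)
          else if b0 land 0xF0 = 0xE0 then (3, b0 land 0x0F)
          else if b0 land 0xF8 = 0xF0 then (4, b0 land 0x07)
          else (0, 0)
        in
        if n = 0 || i + n > len then false
        else
          (* Fold the continuation bytes into the code point. *)
          let rec decode k acc =
            if k = n then Some acc
            else
              let b = Char.code s.[i + k] in
              if is_continuation b then
                decode (k + 1) ((acc lsl 6) lor (b land 0x3F))
              else None
          in
          match decode 1 init with
          | None -> false
          | Some cp ->
            cp >= min_code_point n                 (* reject overlong forms *)
            && cp <= 0x10FFFF                      (* stay inside Unicode *)
            && not (cp >= 0xD800 && cp <= 0xDFFF)  (* reject surrogates *)
            && go (i + n)
    in
    go 0

  let of_string s = if valid s then Some s else None
  let unsafe_to_bytes s = s
end
```

With this sketch, `Utf8.of_string "\xC0\xAF"` returns `None`, because those two bytes are an overlong encoding of `/`, while `Utf8.of_string "/"` succeeds. Code outside the module cannot construct a `Utf8.t` without going through the check, which is the sense in which the type system does more than prevent crashes.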