From: YAMAGATA y. <yor...@ms...> - 2001-11-20 23:44:25
From: Gerd Stolpmann <in...@ge...>
Subject: Re: [Ocamlnet-devel] an experimental impelmentation of Unicode support.
Date: Sun, 18 Nov 2001 23:01:42 +0100

> Okay, but we need some basic data type for the interface between the
> protocol layer and the higher layers.

Some protocols, such as domain names or URLs, will require Unicode for
internationalization. But for most protocols, encoding * string, as you
proposed, is sufficient. Or plain string * string (the first string being
the encoding name) would be better, to avoid the case where a message is
rejected just because Ocamlnet doesn't know its encoding.

As for the idea of using Unicode for everything, I think it carries some
risk, because some character sets are not currently supported by Unicode.
For example, JIS X 0213, the recent Japanese standard, contains many
characters not encoded in Unicode (registration is now in progress). I
have heard something similar about the Chinese Big-5 encoding. I don't
think such extensions are widely used, though.

In addition, in some rare cases, translation to Unicode loses information
present in the original encoded string. For example, iso-2022-jp-2
distinguishes between Chinese, Japanese, and Korean ideographs, while in
Unicode they are unified. This is undesirable behaviour for a protocol
layer. Then again, it usually doesn't cause a problem, because
iso-2022-jp-2 is used mainly for Japanese texts.

As Patrick said, for many applications we don't need to decode encoded
strings at all. So, unless there is a particular reason to prefer
Unicode, just leave the decoding task to the higher layers. This leaves
more choice to the user.

> What about the idea to have a basic ustring type that supports both
> encodings? This could be modeled with phantom types like in the
> Bigarray module.

I reached a similar conclusion, but I was thinking of using OOP. Say we
provide the virtual classes

  ustorage -- for all UCS string-like data types; only allows access
              through cursors.
  uindexed -- virtual class in which indexing is possible.
  umutable -- virtual class allowing in-place update.

and make the hierarchy

  ustorage  ->  uindexed  ->  umutable
     |             |             |
     -> utf8,      -> utext      -> ustring
        utf16

and a similar hierarchy for cursors. (I'm not sure this is possible,
though; I don't know the OCaml object system well.)

What is the advantage of a phantom type?

--
yori
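The class hierarchy proposed above can be expressed with OCaml virtual classes. What follows is a minimal sketch, not an Ocamlnet API. The cursor methods `get`, `next` and `at_end`, the helper `decode_utf8` and the function `to_list` are hypothetical names. Code points are plain `int`s, the UTF-8 input is assumed to be well-formed, and utf16 and a separate cursor class hierarchy are left out.

```ocaml
(* Hypothetical sketch of the proposed hierarchy; none of these names
   come from Ocamlnet. Code points are represented as plain ints. *)

(* A cursor walks forward over the code points of some storage. *)
class type cursor = object
  method get : int        (* code point under the cursor *)
  method next : unit      (* advance by one code point *)
  method at_end : bool
end

(* ustorage: any UCS string-like data; access only through cursors. *)
class virtual ustorage = object
  method virtual first : cursor
end

(* uindexed: adds random access by code-point index. A cursor can be
   derived generically from get and length. *)
class virtual uindexed = object (self)
  inherit ustorage
  method virtual length : int
  method virtual get : int -> int
  method first : cursor =
    let pos = ref 0 in
    object
      method get = self#get !pos
      method next = incr pos
      method at_end = !pos >= self#length
    end
end

(* umutable: adds in-place update. *)
class virtual umutable = object
  inherit uindexed
  method virtual set : int -> int -> unit
end

(* Decode the UTF-8 sequence starting at byte i; returns (code point,
   byte length). Assumes well-formed input and does no validation. *)
let decode_utf8 s i =
  let b k = Char.code s.[i + k] in
  let c = b 0 in
  if c < 0x80 then (c, 1)
  else if c < 0xE0 then (((c land 0x1F) lsl 6) lor (b 1 land 0x3F), 2)
  else if c < 0xF0 then
    (((c land 0x0F) lsl 12) lor ((b 1 land 0x3F) lsl 6)
     lor (b 2 land 0x3F), 3)
  else
    (((c land 0x07) lsl 18) lor ((b 1 land 0x3F) lsl 12)
     lor ((b 2 land 0x3F) lsl 6) lor (b 3 land 0x3F), 4)

(* utf8: cursor-only access, since indexing would need a linear scan. *)
class utf8 (s : string) = object
  inherit ustorage
  method first : cursor =
    let pos = ref 0 in
    object
      method get = fst (decode_utf8 s !pos)
      method next = pos := !pos + snd (decode_utf8 s !pos)
      method at_end = !pos >= String.length s
    end
end

(* utext: immutable and indexable; copies its input array. *)
class utext (init : int array) =
  let a = Array.copy init in
  object
    inherit uindexed
    method length = Array.length a
    method get i = a.(i)
  end

(* ustring: indexable and mutable. *)
class ustring (n : int) = object
  inherit umutable
  val a = Array.make n 0
  method length = Array.length a
  method get i = a.(i)
  method set i c = a.(i) <- c
end

(* Works on any member of the hierarchy, using only cursors. *)
let to_list (u : #ustorage) =
  let c = u#first in
  let rec loop acc =
    if c#at_end then List.rev acc
    else (let x = c#get in c#next; loop (x :: acc))
  in
  loop []

let () =
  (* "a", U+00E9, U+20AC encoded in UTF-8 *)
  let s = new utf8 "a\xC3\xA9\xE2\x82\xAC" in
  List.iter (Printf.printf "U+%04X\n") (to_list s)
```

Because `to_list` is typed against `#ustorage`, it accepts a `utf8`, a `utext` or a `ustring` alike. Only the indexed classes expose `get` and `length`, and only `ustring` exposes `set`, so the type checker enforces the capability levels of the hierarchy.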