From: Tom E. <tre...@gm...> - 2014-05-12 13:47:17
On Sun, May 11, 2014 at 7:23 PM, Krzysztof Drewniak <krz...@gm...> wrote:

> You're basically right on what SB-UNICODE (the package) is meant to be.
> I think that many of the Unicode algorithms have reasonable behavior on
> U+0000 to U+00FF.

*All* Unicode algorithms will give reasonable behavior for U+0000 through U+00FF. However, can you guarantee that a build without Unicode support turned on will be using a character set that can safely be interpreted as Latin-1? Or that the user's encoding isn't CP1252, which puts printable characters in the C1 range (0x80-0x9F)? Or that it isn't TIS-620/ISO-8859-11/CP874 (three nearly, but not entirely, identical Thai encodings)?

My point is that the algorithms will work for [U+0000, U+00FF] but will always interpret the characters as they are defined in Latin-1. There is nothing we can do about that, but it will need to be noted in the documentation.

> I'll probably end up keeping the database and other low-level stuff in
> SB-IMPL, and have SB-UNICODE export nice wrapper functions around the DB
> (say SB-UNICODE:GENERAL-CATEGORY and things like that).

+1

> Also, if I need more Unicode properties, like the full case mappings or
> the Quick_Check properties, should I try to modify the existing database
> or add another one?

Personally, I think there should be one database. I agree with Christoph's views on this in his reply.

Peace,

    -tree

--
Tom Emerson
tre...@gm...
http://www.dreamersrealm.net/tree
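To make the Latin-1 point in this message concrete, here is a small illustration. It is written in Python rather than Lisp and uses only Python's standard codecs and `unicodedata` module. It has nothing to do with SBCL's own code: the helper `interpret` is hypothetical, and "tis-620" is simply Python's codec name. It decodes the same raw byte under Latin-1, CP1252, and TIS-620, then reports the resulting code point and its Unicode general category. An algorithm that only sees U+0000..U+00FF is effectively always taking the Latin-1 column.

```python
import unicodedata

# Hypothetical illustration: the character a raw byte denotes depends on the
# encoding used to decode it. Latin-1 maps byte 0xNN straight to U+00NN,
# which is what an algorithm limited to U+0000..U+00FF implicitly assumes.
ENCODINGS = ("latin-1", "cp1252", "tis-620")


def interpret(byte: int, encoding: str) -> tuple[str, str]:
    """Decode a single byte; return (code point, Unicode general category).

    Bytes the encoding leaves undefined decode to U+FFFD instead of raising.
    """
    ch = bytes([byte]).decode(encoding, errors="replace")
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


if __name__ == "__main__":
    for b in (0x80, 0xA1, 0xE9):
        row = {enc: interpret(b, enc) for enc in ENCODINGS}
        print(f"0x{b:02X}", row)
```

For example, byte 0x80 is a C1 control (U+0080, category Cc) under Latin-1 but the euro sign (U+20AC, category Sc) under CP1252. Byte 0xA1 is an inverted exclamation mark under Latin-1 but a Thai letter (U+0E01) under TIS-620.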