Re: [Afpfs-ng-devel] precompose and decompose
From: Michael U. <mu...@re...> - 2008-03-30 21:35:07
Alex deVries wrote:
> Of course, I don't understand enough about internationalization to
> follow all of this.
>
> Do the tables need to be AFP version specific?

Alex, HAT,

well, from my understanding the decomposition table does not depend on
the version of AFP but on the _filesystem_ used by the server OS.
According to document tn1150 the decomposition table (tn1150table) is
specified for HFS Plus, which was introduced with Mac OS 8.1.

tn1150 further states under "Unicode subtleties" that:

--------------------------
IMPORTANT: An implementation must not use the Unicode utilities
implemented by its native platform (for decomposition and comparison),
unless those algorithms are equivalent to the HFS Plus algorithms
defined here, and are guaranteed to be so forever. This is rarely the
case. Platform algorithms tend to evolve with the Unicode standard. The
HFS Plus algorithms cannot evolve because such evolution would
invalidate existing HFS Plus volumes.
--------------------------

Given that HFS Plus is still the preferred filesystem for OSX (even for
the latest version - please correct me if I'm wrong here) the
decomposition table _must_ not be changed, for the reasons given above.
If the encoding of HFS Plus names changed with every release of OSX,
the volumes would no longer be compatible.

> On 30-Mar-08, at 12:59 PM, HAT wrote:
>
>> Hi.
>>
>>> The resulting table was further checked against:
>>> http://developer.apple.com/technotes/tn/tn1150table.html
>>>
>>> and decompositions not marked "illegal" in tn1150table were removed from
>>> the result.
>>
>> The tn1150table is based on Unicode 2.x (Mac OS X 10.1 and earlier).
>> Unicode 2.x has no concept of "ordering".
>> Mac OS X 10.2 and later are based on Unicode 3.2.
>> Recent Leopard is based on Unicode 4.
>> The character has increased.
>> Maybe, Mac OS X 10.6 will be based on Unicode 5.x.
>> I think that afpfs-ng should use newest Unicode.
>> At present, latest version is 5.0.0.

HAT, please take a look at the arguments given above. IMHO the
conversion table is (and _must_ stay) independent of any changes in
Unicode support for the different OSX releases.

>>> Please let me know, if there is a newer version of the unicode
>>> decomposition table for HFS Plus.
>>
>> Yes, newest version is 5.0.0.
>> However, it's not necessary to compare it with the tn1150table.
>>
>>> Your suggestion (from your first message) to change the internal
>>> character representation in lib/unicode.c from UCS2 to UCS4 absolutely
>>> makes sense
>>
>> Ok.
>>
>>> although there are currently no decompositions in the range
>>> > U+010000 used by HFS Plus that I'm aware of.
>>
>> Mac OS 10.5.2 Leopard support the following decomposition.
>>
>> 0x1D15E, 0x1D157 0x1D165 MUSICAL SYMBOL HALF NOTE
>> 0x1D15F, 0x1D158 0x1D165 MUSICAL SYMBOL QUARTER NOTE
>> 0x1D160, 0x1D15F 0x1D16E MUSICAL SYMBOL EIGHTH NOTE
>> 0x1D161, 0x1D15F 0x1D16F MUSICAL SYMBOL SIXTEENTH NOTE
>> 0x1D162, 0x1D15F 0x1D170 MUSICAL SYMBOL THIRTY-SECOND NOTE
>> 0x1D163, 0x1D15F 0x1D171 MUSICAL SYMBOL SIXTY-FOURTH NOTE
>> 0x1D164, 0x1D15F 0x1D172 MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
>> 0x1D1BB, 0x1D1B9 0x1D165 MUSICAL SYMBOL MINIMA
>> 0x1D1BC, 0x1D1BA 0x1D165 MUSICAL SYMBOL MINIMA BLACK
>> 0x1D1BD, 0x1D1BB 0x1D16E MUSICAL SYMBOL SEMIMINIMA WHITE
>> 0x1D1BF, 0x1D1BB 0x1D16F MUSICAL SYMBOL FUSA WHITE
>> 0x1D1BE, 0x1D1BC 0x1D16E MUSICAL SYMBOL SEMIMINIMA BLACK
>> 0x1D1C0, 0x1D1BC 0x1D16F MUSICAL SYMBOL FUSA BLACK

Where did you get this information?

>> At present, the glyphs of these characters do not exist.
>> However, I think that it is necessary to support for the future.
>>
>>> Will you be able (and willing ... ;-) ) to do the rewrite? I might do
>>> it as well, but am not sure when I will find the time to actually do so.
>>
>> It's possible.
>> But I do not understand all of source.
>> Is it the following files that use UCS2 as internal code?
>>
>> codepage.c
>> unicode.c
>> unicode.h
>>
>>> If you have a patch, I will help with the testing, though.
>>
>> First of all, I wrote a sample header file.
>> This is based on Unicode 5.0.0.

Yeah, it looks good, but it again contains all the decompositions,
which were deleted from our current table because they were not in
tn1150table. Let's try to find a consensus on what should be in the
table before going on with the code here.

---------

Another question to HAT:
Reading tn1150 page 34 I found the following sentence:

    In addition, the Korean Hangul characters with codes in the range
    u+AC00 through u+D7A3 are illegal and must be replaced with the
    equivalent sequence of conjoining jamos, as described in the
    Unicode 2.0 book, section 3.10.

Probably we should add these conversions to make Korean Hangul work -
what do you think?

>> --HAT

Thanks + Best regards ...
Michael
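
P.S. To make the Hangul question concrete, here is a minimal sketch of
how the conversion could look. It assumes the arithmetic syllable
decomposition from the Unicode standard, which as far as I understand
is what the tn1150 sentence refers to. The function name
hangul_decompose and the HANGUL_* constants are hypothetical and not
part of afpfs-ng. It works on single 16-bit code units, which should be
enough here because all precomposed Hangul syllables are below U+10000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants of the Unicode Hangul syllable decomposition algorithm. */
enum {
    HANGUL_SBASE  = 0xAC00, /* first precomposed syllable */
    HANGUL_LBASE  = 0x1100, /* first leading consonant jamo */
    HANGUL_VBASE  = 0x1161, /* first vowel jamo */
    HANGUL_TBASE  = 0x11A7, /* one before the first trailing consonant jamo */
    HANGUL_LCOUNT = 19,
    HANGUL_VCOUNT = 21,
    HANGUL_TCOUNT = 28,     /* includes the "no trailing consonant" case */
    HANGUL_NCOUNT = HANGUL_VCOUNT * HANGUL_TCOUNT, /* 588 */
    HANGUL_SCOUNT = HANGUL_LCOUNT * HANGUL_NCOUNT  /* 11172 */
};

/*
 * Hypothetical helper (not an afpfs-ng function): decompose one
 * precomposed Hangul syllable (U+AC00..U+D7A3) into conjoining jamos.
 * Writes 2 or 3 code units to out and returns that count.
 * Returns 0 and leaves out untouched if ch is not such a syllable.
 */
size_t hangul_decompose(uint16_t ch, uint16_t out[3])
{
    unsigned s_index, l, v, t;

    assert(out != NULL);

    if (ch < HANGUL_SBASE || ch >= HANGUL_SBASE + HANGUL_SCOUNT)
        return 0;

    s_index = (unsigned)ch - HANGUL_SBASE;
    l = s_index / HANGUL_NCOUNT;                   /* leading consonant index */
    v = (s_index % HANGUL_NCOUNT) / HANGUL_TCOUNT; /* vowel index */
    t = s_index % HANGUL_TCOUNT;                   /* trailing index, 0 = none */

    out[0] = (uint16_t)(HANGUL_LBASE + l);
    out[1] = (uint16_t)(HANGUL_VBASE + v);
    if (t == 0)
        return 2;

    out[2] = (uint16_t)(HANGUL_TBASE + t);
    return 3;
}
```

Because this is pure arithmetic, no table entries would be needed for
the 11172 syllables; the decomposition routine would only have to call
such a helper before falling back to the table lookup. Whether that
fits into lib/unicode.c as it stands is something I have not checked.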