Re: [Afpfs-ng-devel] precompose and decompose
From: HAT <ha...@fa...> - 2008-03-31 15:13:02
Hmm... It is difficult for me to explain this problem because I am not
good at English.

Mac OS 8                 Unicode 1.x
Mac OS 9 - X 10.1        Unicode 2.x
Mac OS X 10.2 - 10.4     Unicode 3.2
Mac OS X 10.5            Unicode 4 (compose table is the same as 3.2)

When Mac OS is upgraded from an old version to a newer version, the
installer checks the filesystem and rewrites filenames.

Change Unicode 1.x -> 2.x
-----------------------------------------
Hangul is redefined (incompatible).
When upgrading Mac OS 8 to Mac OS 9/10.0/10.1, the Hangul code is changed.

Change Unicode 2.x -> 3.x
-----------------------------------------
Composition in Unicode 2.x is buggy.
Unicode 3.x defines canonical composition, canonical ordering and
singletons.
Unicode 3.x is upward compatible with Unicode 2.x through canonical
normalization.
When upgrading Mac OS X 10.0/10.1 to 10.2, filenames containing the
following characters are decomposed.

{0x0001D15E, 0x0001D1570001D165}, /* MUSICAL SYMBOL HALF NOTE */
{0x0001D15F, 0x0001D1580001D165}, /* MUSICAL SYMBOL QUARTER NOTE */
{0x0001D160, 0x0001D15F0001D16E}, /* MUSICAL SYMBOL EIGHTH NOTE */
{0x0001D161, 0x0001D15F0001D16F}, /* MUSICAL SYMBOL SIXTEENTH NOTE */
{0x0001D162, 0x0001D15F0001D170}, /* MUSICAL SYMBOL THIRTY-SECOND NOTE */
{0x0001D163, 0x0001D15F0001D171}, /* MUSICAL SYMBOL SIXTY-FOURTH NOTE */
{0x0001D164, 0x0001D15F0001D172}, /* MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE */
{0x0001D1BB, 0x0001D1B90001D165}, /* MUSICAL SYMBOL MINIMA */
{0x0001D1BC, 0x0001D1BA0001D165}, /* MUSICAL SYMBOL MINIMA BLACK */
{0x0001D1BD, 0x0001D1BB0001D16E}, /* MUSICAL SYMBOL SEMIMINIMA WHITE */
{0x0001D1BF, 0x0001D1BB0001D16F}, /* MUSICAL SYMBOL FUSA WHITE */
{0x0001D1BE, 0x0001D1BC0001D16E}, /* MUSICAL SYMBOL SEMIMINIMA BLACK */
{0x0001D1C0, 0x0001D1BC0001D16F}, /* MUSICAL SYMBOL FUSA BLACK */

There are some other changes, too.

Change Unicode 3.x -> 4.x
-----------------------------------------
Composition is the same.

Change Unicode 4.x -> 5.0
-----------------------------------------
The following table was added.
{0x00001B06, 0x00001B0500001B35}, /* BALINESE LETTER AKARA TEDUNG */
{0x00001B08, 0x00001B0700001B35}, /* BALINESE LETTER IKARA TEDUNG */
{0x00001B0A, 0x00001B0900001B35}, /* BALINESE LETTER UKARA TEDUNG */
{0x00001B0C, 0x00001B0B00001B35}, /* BALINESE LETTER RA REPA TEDUNG */
{0x00001B0E, 0x00001B0D00001B35}, /* BALINESE LETTER LA LENGA TEDUNG */
{0x00001B12, 0x00001B1100001B35}, /* BALINESE LETTER OKARA TEDUNG */
{0x00001B3B, 0x00001B3A00001B35}, /* BALINESE VOWEL SIGN RA REPA TEDUNG */
{0x00001B3D, 0x00001B3C00001B35}, /* BALINESE VOWEL SIGN LA LENGA TEDUNG */
{0x00001B40, 0x00001B3E00001B35}, /* BALINESE VOWEL SIGN TALING TEDUNG */
{0x00001B41, 0x00001B3F00001B35}, /* BALINESE VOWEL SIGN TALING REPA TEDUNG */
{0x00001B43, 0x00001B4200001B35}, /* BALINESE VOWEL SIGN PEPET TEDUNG */

--------------------------------------------------------------------------

I tested all characters on Mac OS X 10.1/10.2/10.4 before.
These versions strictly observe the Unicode Standard.

>> Do the tables need to be AFP version specific?

Unicode 1.x and 2.x are not compatible with each other.
Mac OS 8 is based on Unicode 1.x.
That is no problem, because AFP2 doesn't use Unicode.
Unicode 2.x and later are upward compatible.
Therefore, the newest Unicode should be used.

>well, from my understanding the decomposition table does not depend on
>the version of AFP but on the _filesystem_ used by the server OS.
>According to document tn1150 the decomposition table (tn1150table) is
>specified for HFS plus which was introduced with MAC OS 8.1.

Never trust Apple's documentation.
Try to check your machine.

>tn1150 further states under "Unicode subtleties" that:
>
>--------------------------
>
>IMPORTANT:
>An implementation must not use the Unicode utilities implemented by its
>native platform (for decomposition and comparison), unless those
>algorithms are equivalent to the HFS Plus algorithms defined here, and
>are guaranteed to be so forever. This is rarely the case.
>Platform algorithms tend to evolve with the Unicode standard. The HFS
>Plus algorithms cannot evolve because such evolution would invalidate
>existing HFS Plus volumes.
>
>--------------------------

Do not believe this.
Apple is not lying. The documents have simply not been updated, so they
no longer match the current state.
Apple adopts the latest Unicode.
However, U+2000 to U+2FFF, U+FE30 to U+FE4F, and U+2F800 to U+2FA1F are
not decomposed.
This is for compatibility.

>Given that HFS Plus is still the preferred filesystem for OSX (even for
>the latest version - please correct me if I'm wrong here) the
>decomposition table _may_ not be changed for the reasons given above. If
>the encoding of HFS Plus names would change with every release of OSX
>the volumes would no longer be compatible.
>
>> On 30-Mar-08, at 12:59 PM, HAT wrote:
>>
>>> Hi.
>>>
>>>> The resulting table was further checked against:
>>>> http://developer.apple.com/technotes/tn/tn1150table.html
>>>>
>>>> and decompositions not marked "illegal" in tn1150table were removed
>>>> from the result.
>>>
>>> The tn1150table is based on Unicode 2.x (Mac OS X 10.1 and earlier).
>>> Unicode 2.x has no concept of "ordering".
>>> Mac OS X 10.2 and later are based on Unicode 3.2.
>>> Recent Leopard is based on Unicode 4.
>>> The number of characters has increased.
>>> Maybe Mac OS X 10.6 will be based on Unicode 5.x.
>>> I think that afpfs-ng should use the newest Unicode.
>>> At present, the latest version is 5.0.0.
>
>HAT, please take a look at the arguments given above. IMHO the
>conversion table is (and _must_ stay) independent of any changes in
>Unicode support for the different OSX releases.
>
>>>> Please let me know, if there is a newer version of the unicode
>>>> decomposition table for HFS Plus.
>>>
>>> Yes, the newest version is 5.0.0.
>>> However, it's not necessary to compare it with the tn1150table.
>>>
>>>> Your suggestion (from your first message) to change the internal
>>>> character representation in lib/unicode.c from UCS2 to UCS4 absolutely
>>>> makes sense
>>>
>>> Ok.
>>>
>>>> although there are currently no decompositions in the range
>>>> > U+010000 used by HFS Plus that I'm aware of.
>>>
>>> Mac OS X 10.5.2 Leopard supports the following decompositions.
>>>
>>> 0x1D15E, 0x1D157 0x1D165 MUSICAL SYMBOL HALF NOTE
>>> 0x1D15F, 0x1D158 0x1D165 MUSICAL SYMBOL QUARTER NOTE
>>> 0x1D160, 0x1D15F 0x1D16E MUSICAL SYMBOL EIGHTH NOTE
>>> 0x1D161, 0x1D15F 0x1D16F MUSICAL SYMBOL SIXTEENTH NOTE
>>> 0x1D162, 0x1D15F 0x1D170 MUSICAL SYMBOL THIRTY-SECOND NOTE
>>> 0x1D163, 0x1D15F 0x1D171 MUSICAL SYMBOL SIXTY-FOURTH NOTE
>>> 0x1D164, 0x1D15F 0x1D172 MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
>>> 0x1D1BB, 0x1D1B9 0x1D165 MUSICAL SYMBOL MINIMA
>>> 0x1D1BC, 0x1D1BA 0x1D165 MUSICAL SYMBOL MINIMA BLACK
>>> 0x1D1BD, 0x1D1BB 0x1D16E MUSICAL SYMBOL SEMIMINIMA WHITE
>>> 0x1D1BF, 0x1D1BB 0x1D16F MUSICAL SYMBOL FUSA WHITE
>>> 0x1D1BE, 0x1D1BC 0x1D16E MUSICAL SYMBOL SEMIMINIMA BLACK
>>> 0x1D1C0, 0x1D1BC 0x1D16F MUSICAL SYMBOL FUSA BLACK
>
>Where do you have this information from?

Try to check your HFS+.

>>> At present, the glyphs of these characters do not exist.
>>> However, I think that it is necessary to support them for the future.
>>>
>>>> Will you be able (and willing ... ;-) ) to do the rewrite? I might do
>>>> it as well, but am not sure when I will find the time to actually do
>>>> so.
>>>
>>> It's possible.
>>> But I do not understand all of the source.
>>> Are the following the files that use UCS2 as the internal code?
>>>
>>> codepage.c
>>> unicode.c
>>> unicode.h
>>>
>>>> If you have a patch, I will help with the testing, though.
>>>
>>> First of all, I wrote a sample header file.
>>> This is based on Unicode 5.0.0.
>
>Yeah, it looks good, but it again contains all the decompositions, which
>were deleted from our current table because they were not in tn1150table.
>
>Let's try to find a consensus on what should be in the table before
>going on with the code here.
>
>---------
>
>Another question to HAT:
>
>Reading tn1150 page 34 I found the following sentence:
>
>In addition, the Korean Hangul characters with codes in the range u+AC00
>through u+D7A3 are illegal and must be replaced with the
>equivalent sequence of conjoining jamos, as described in the Unicode 2.0
>book, section 3.10.
>
>Probably we should add these conversions to make Korean Hangul work -
>what do you think?

It's not necessary.
This change is from Unicode 1.x to 2.x.
Korean Mac OS 8/9 use MacKorean via AFP2, not Unicode 1.x.
Mac OS X is based on Unicode 2.x and later via AFP3.

Summary.
Unicode 3.2 (same as 4.0) is needed for Mac OS X 10.2 - 10.5.
Unicode 5 will maybe be needed for a future Mac OS X.
Canonical ordering is needed for Mac OS X 10.0 - 10.1.
Are singletons needed??? We should discuss it.

-- 
HAT
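
P.S. For reference, a minimal sketch of why UCS4 matters for a table in the
format listed earlier in this message: the musical symbols decompose in two
steps, and every code point involved is above U+FFFF. The struct layout
(precomposed code point, then base and combining mark packed into one 64-bit
value), the names decomp_entry, decompose_pair and decompose_full, and the
use of a binary search are assumptions for illustration only. This is not
the afpfs-ng code and not the sample header itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: precomposed code point, then the base and
 * the combining mark packed as (base << 32) | mark. */
struct decomp_entry {
    uint32_t composed;
    uint64_t pair;
};

/* A few entries copied from the lists in this message.
 * Must stay sorted by 'composed' for the binary search below. */
static const struct decomp_entry decomp_table[] = {
    {0x00001B06, 0x00001B0500001B35ULL}, /* BALINESE LETTER AKARA TEDUNG */
    {0x0001D15E, 0x0001D1570001D165ULL}, /* MUSICAL SYMBOL HALF NOTE */
    {0x0001D15F, 0x0001D1580001D165ULL}, /* MUSICAL SYMBOL QUARTER NOTE */
    {0x0001D160, 0x0001D15F0001D16EULL}, /* MUSICAL SYMBOL EIGHTH NOTE */
};

/* One decomposition step. Returns 1 and fills base/mark if c is in the
 * table, otherwise returns 0 and leaves them untouched. */
static int decompose_pair(uint32_t c, uint32_t *base, uint32_t *mark)
{
    size_t lo = 0;
    size_t hi = sizeof decomp_table / sizeof decomp_table[0];

    assert(base != NULL && mark != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (decomp_table[mid].composed == c) {
            *base = (uint32_t)(decomp_table[mid].pair >> 32);
            *mark = (uint32_t)(decomp_table[mid].pair & 0xFFFFFFFFu);
            return 1;
        }
        if (decomp_table[mid].composed < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Full decomposition of one code point: keep decomposing the base and
 * collect the marks, then write base followed by marks in original order.
 * Returns the number of code points written, or 0 if 'out' is too small.
 * The depth limit of 4 is an arbitrary choice for this sketch. */
size_t decompose_full(uint32_t c, uint32_t *out, size_t cap)
{
    uint32_t marks[4];
    size_t nmarks = 0, n = 0;
    uint32_t base, mark;

    while (nmarks < 4 && decompose_pair(c, &base, &mark)) {
        marks[nmarks++] = mark;
        c = base;
    }
    if (nmarks + 1 > cap)
        return 0;
    out[n++] = c;
    while (nmarks > 0)
        out[n++] = marks[--nmarks];
    return n;
}
```

With these entries, decompose_full(0x1D160, ...) goes through 0x1D15F first
and ends with three code points: 0x1D158 0x1D165 0x1D16E. None of them fits
in a 16-bit UCS2 unit, which is the reason for using UCS4 internally.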