Re: [Afpfs-ng-devel] precompose and decompose
From: Michael U. <mu...@re...> - 2008-04-01 09:45:53
HAT wrote:
> Hmm...
> It is difficult for me to explain this problem because I am not good
> at English.
>
> Mac OS 8                 Unicode 1.x
> Mac OS 9 - X 10.1        Unicode 2.x
> Mac OS X 10.2 - 10.4     Unicode 3.2
> Mac OS X 10.5            Unicode 4 (compose-table is same as 3.2)
>
> When MacOS is upgraded from old version to newer version,
> the installer check the filesystem and rewrite filenames.
>
> Change Unicode 1.x -> 2.x -----------------------------------------
> Hangul is re-defined. (incompatible)
> When upgrading MacOS8 to MacOS9/10.0/10.1, Hangul code is changed.
>
> Change Unicode 2.x -> 3.x -----------------------------------------
> Composition of Unicode 2.x is buggy.
> Unicode 3.x defines Canonical composition, Canonical ordering and Singleton.
> Unicode 3.x has "upper-compatibility" with Unicode 2.x by canonical
> normalization.
> When upgrading MacOSX10.0/10.1 to 10.2, the following filenames is decomposed.
>
> {0x0001D15E, 0x0001D1570001D165}, /* MUSICAL SYMBOL HALF NOTE */
> {0x0001D15F, 0x0001D1580001D165}, /* MUSICAL SYMBOL QUARTER NOTE */
> {0x0001D160, 0x0001D15F0001D16E}, /* MUSICAL SYMBOL EIGHTH NOTE */
> {0x0001D161, 0x0001D15F0001D16F}, /* MUSICAL SYMBOL SIXTEENTH NOTE */
> {0x0001D162, 0x0001D15F0001D170}, /* MUSICAL SYMBOL THIRTY-SECOND NOTE */
> {0x0001D163, 0x0001D15F0001D171}, /* MUSICAL SYMBOL SIXTY-FOURTH NOTE */
> {0x0001D164, 0x0001D15F0001D172}, /* MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE */
> {0x0001D1BB, 0x0001D1B90001D165}, /* MUSICAL SYMBOL MINIMA */
> {0x0001D1BC, 0x0001D1BA0001D165}, /* MUSICAL SYMBOL MINIMA BLACK */
> {0x0001D1BD, 0x0001D1BB0001D16E}, /* MUSICAL SYMBOL SEMIMINIMA WHITE */
> {0x0001D1BF, 0x0001D1BB0001D16F}, /* MUSICAL SYMBOL FUSA WHITE */
> {0x0001D1BE, 0x0001D1BC0001D16E}, /* MUSICAL SYMBOL SEMIMINIMA BLACK */
> {0x0001D1C0, 0x0001D1BC0001D16F}, /* MUSICAL SYMBOL FUSA BLACK */

Hi HAT,

yes, now I see the light ;-) I've tried this on my Mac, following the
example (1D15E) you gave yesterday, and it is indeed decomposed to
(1D157, 1D165) when written to the file system. So I agree that these
decompositions should be included in the table.

> There are some changes, too.
>
> Change Unicode 3.x -> 4.x -----------------------------------------
> Composition is same.
>
> Change Unicode 4.x -> 5.0 -----------------------------------------
> added the following table.
>
> {0x00001B06, 0x00001B0500001B35}, /* BALINESE LETTER AKARA TEDUNG */
> {0x00001B08, 0x00001B0700001B35}, /* BALINESE LETTER IKARA TEDUNG */
> {0x00001B0A, 0x00001B0900001B35}, /* BALINESE LETTER UKARA TEDUNG */
> {0x00001B0C, 0x00001B0B00001B35}, /* BALINESE LETTER RA REPA TEDUNG */
> {0x00001B0E, 0x00001B0D00001B35}, /* BALINESE LETTER LA LENGA TEDUNG */
> {0x00001B12, 0x00001B1100001B35}, /* BALINESE LETTER OKARA TEDUNG */
> {0x00001B3B, 0x00001B3A00001B35}, /* BALINESE VOWEL SIGN RA REPA TEDUNG */
> {0x00001B3D, 0x00001B3C00001B35}, /* BALINESE VOWEL SIGN LA LENGA TEDUNG */
> {0x00001B40, 0x00001B3E00001B35}, /* BALINESE VOWEL SIGN TALING TEDUNG */
> {0x00001B41, 0x00001B3F00001B35}, /* BALINESE VOWEL SIGN TALING REPA TEDUNG */
> {0x00001B43, 0x00001B4200001B35}, /* BALINESE VOWEL SIGN PEPET TEDUNG */

These decompositions are already included in the table.

> --------------------------------------------------------------------------
>
> I tested all of characters about MacOSX 10.1/10.2/10.4 before.
> These strictly observe the Unicode Standard.
>
>>> Do the tables need to be AFP version specific?
>
> It is not compatible between Unicode 1.x and 2.x.
> Mac OS 8 is based on Unicode 1.x. It's no problem because AFP2 don't
> use Unicode.
> Unicode 2.x and later have upper-compatibility.
> Therefore, newest Unicode should be used.
>
>> well, from my understanding the decomposition table does not depend on
>> the version of AFP but on the _filesystem_ used by the server OS.
>> According to document tn1150 the decomposition table (tn1150table) is
>> specified for HFS plus which was introduced with Mac OS 8.1.
>
> Never trust Apple's documentation.
> Try to check your machine.
>
>> tn1150 further states under "Unicode subtleties" that:
>>
>> --------------------------
>>
>> IMPORTANT:
>> An implementation must not use the Unicode utilities implemented by its
>> native platform (for decomposition and comparison), unless those
>> algorithms are equivalent to the HFS Plus algorithms defined here, and
>> are guaranteed to be so forever. This is rarely the case. Platform
>> algorithms tend to evolve with the Unicode standard. The HFS Plus
>> algorithms cannot evolve because such evolution would invalidate
>> existing HFS Plus volumes.
>>
>> --------------------------
>
> Do not believe this.
> Apple doesn't say the lie.
> Because the documents have not been renewed, it is not suitable for
> the current state.
>
> Apple adopts the latest Unicode.
> However, U2000 to U2FFF, UFE30 to UFE4F, and U2F800 to U2FA1F are not
> decomposed. It is for compatibility.

There are still some decompositions from the range 2000 - 2FFF listed in
our table. I've checked 2260, 226E, 226F and 219A, and they are _not_
decomposed in HFS+ on 10.4. From my POV these entries should be removed
from the list. What's your opinion? (I can't tell whether keeping them
in the list might cause any harm ...)

<SNIP>

>>>>> Will you be able (and willing ... ;-) ) to do the rewrite? I might do
>>>>> it as well, but am not sure when I will find the time to actually do so.
>>>> It's possible.
>>>> But I do not understand all of source.
>>>> Is it the following files that use UCS2 as internal code?
>>>>
>>>> codepage.c
>>>> unicode.c
>>>> unicode.h
>>>>
>>>>> If you have a patch, I will help with the testing, though.
>>>> First of all, I wrote a sample header file.
>>>> This is based on Unicode 5.0.0.
>> Yeah, it looks good, but it again contains all the decompositions, which
>> were deleted from our current table because they were not in tn1150table.
>>
>> Let's try to find a consensus on what should be in the table before
>> going on with the code here.

So, we have probably reached that agreement now ...

>> Another question to HAT:
>>
>> Reading tn1150 page 34 I found the following sentence:
>>
>> In addition, the Korean Hangul characters with codes in the range u+AC00
>> through u+D7A3 are illegal and must be replaced with the
>> equivalent sequence of conjoining jamos, as described in the Unicode 2.0
>> book, section 3.10.
>>
>> Probably we should add these conversions to make Korean Hangul work -
>> what do you think?
>
> It's not necessary
> This change is from Unicode 1.x to 2.x.
> Korean Mac OS 8/9 use not Unicode 1.x but MacKorean via AFP2.
> Mac OS X is based on 2.x and later via AFP3.

Ok.

> Summary.
> Unicode 3.2 (same as 4.0) is needed for Mac OS X 10.2 - 10.5.
> Unicode 5 will be needed for future Mac OS X maybe.
> Canonical ordering is needed for Mac OS X 10.0 - 10.1.
> Singleton is needed??? We should discuss it.

Before I start digging through the Unicode standard to understand what
'singleton' means, could you please give me a short intro?

I will take a look at the rest of the code and come back with a
UCS4-based solution.

Thanks + Best regards ...
Michael
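
For illustration, here is a minimal sketch of how a UCS4-based lookup
over a table in the quoted format might look. It is not code from
afpfs-ng: the names decomp_entry, decomp_table, is_excluded, find_entry
and decompose_ucs4 are hypothetical, the table is a four-entry excerpt
sorted by code point, and the excluded ranges are simply the ones HAT
reports above. It assumes each entry packs the base character into the
high 32 bits and the combining mark into the low 32 bits, and it does
not attempt canonical ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout mirroring the quoted table: one UCS4 code
 * point plus its two-code-point decomposition packed into 64 bits
 * (base in the high 32 bits, combining mark in the low 32 bits). */
struct decomp_entry {
    uint32_t composed;
    uint64_t pair;
};

/* Small excerpt of the quoted entries, sorted by `composed` so that it
 * can be binary-searched. */
static const struct decomp_entry decomp_table[] = {
    {0x00001B06, 0x00001B0500001B35ULL}, /* BALINESE LETTER AKARA TEDUNG */
    {0x0001D15E, 0x0001D1570001D165ULL}, /* MUSICAL SYMBOL HALF NOTE */
    {0x0001D15F, 0x0001D1580001D165ULL}, /* MUSICAL SYMBOL QUARTER NOTE */
    {0x0001D160, 0x0001D15F0001D16EULL}, /* MUSICAL SYMBOL EIGHTH NOTE */
};
#define DECOMP_COUNT (sizeof decomp_table / sizeof decomp_table[0])
#define MAX_MARKS 8

/* Ranges reported in this thread as never decomposed (assumption taken
 * from the discussion, not checked against a real volume here). */
static int is_excluded(uint32_t c)
{
    return (c >= 0x2000 && c <= 0x2FFF) ||
           (c >= 0xFE30 && c <= 0xFE4F) ||
           (c >= 0x2F800 && c <= 0x2FA1F);
}

/* Binary search for a table entry; returns NULL if c has none. */
static const struct decomp_entry *find_entry(uint32_t c)
{
    size_t lo = 0, hi = DECOMP_COUNT;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (decomp_table[mid].composed == c)
            return &decomp_table[mid];
        if (decomp_table[mid].composed < c)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Fully decompose c into out[]. The base may itself have an entry
 * (1D160 -> 1D15F 1D16E, and 1D15F -> 1D158 1D165), so the lookup is
 * repeated on the base while the combining marks are stacked.
 * Returns the number of code points written, or 0 if out[] (or the
 * internal mark stack) is too small. */
size_t decompose_ucs4(uint32_t c, uint32_t *out, size_t cap)
{
    uint32_t marks[MAX_MARKS];
    size_t nmarks = 0, n = 0;
    const struct decomp_entry *e;

    while (!is_excluded(c) && (e = find_entry(c)) != NULL) {
        if (nmarks == MAX_MARKS)
            return 0;
        marks[nmarks++] = (uint32_t)(e->pair & 0xFFFFFFFFu);
        c = (uint32_t)(e->pair >> 32);
    }
    if (cap < nmarks + 1)
        return 0;

    out[n++] = c;                 /* fully decomposed base first */
    while (nmarks > 0)
        out[n++] = marks[--nmarks]; /* marks in their original order */
    return n;
}
```

With this excerpt, decompose_ucs4(0x1D160, ...) yields the three code
points 1D158 1D165 1D16E, while 2260 passes through unchanged because it
falls into the excluded 2000 - 2FFF range. Whether the excluded ranges
should be enforced by a check like this or simply by leaving those
entries out of the table is exactly the open question above.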