From: Yves A. <yv...@re...> - 2001-07-31 08:58:13
OK, here I am again with this. I did a quick experiment:

1. Define a $(unidatadir) to use in the Makefiles to refer to icu/data/unidata. Define it in icudefs.mk.in.
2. Create an empty unidata-2.1 directory, use that as the value for unidatadir, and then put into it UnicodeData.txt and SpecialCasing.txt from ftp://ftp.unicode.org/Public/2.1-Update4.
3. Build, and add files as needed by cheating and getting them from ICU's unidata directory.

Here are some notes I took while doing that, adding one file at a time each time the build failed. Help on any of these items would be appreciated.

QuickCheck.txt
    It says "generated from NormalizationQuickCheck.txt", and the latter is not from the UNIDATA directory of ftp.unicode.org. How was that generated?

FCDCheck.txt
    From UCA ... What is the data source and the tool?

Mirror.txt
    Very similar to BidiMirroring.txt, but not really the same format. I haven't tried feeding one to the ICU tools in place of the other. Maybe that is possible? Can anyone shed light on that?

CaseFolding.txt
    In UNIDATA but not in any 2.0-Update directory.

DerivedNormalizedProperties.txt
    Generated from a given version of UnicodeData.txt. Then why don't we have it in the 2.1-Update directories? Where can I get one for 2.1?

At this point, the build fails with:

    Creating unorm.dat
    gennorm: error - length of NFD(U+01e0) = 3 >2 in UnicodeData - illegal

which is likely caused by an inconsistency among all these 2.1 and 3.0 data files. And I haven't even reached the point where I'd need to generate and then use the UCA rules yet!

I then made a new experimental directory with everything from unidata and a 2.1.9 UnicodeData.txt file overwriting the 3.0 one. Same error (as expected).

So it shows that retrofitting Unicode 2.1 in ICU 1.8.1 may not be that simple. Still, I believe doing so has value for interoperability with Microsoft Windows. If I can get help in generating the necessary data files, I'll give it a good shot in a couple of weeks.
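
To see which entries would trip the gennorm check quoted above, a minimal Python sketch follows. It assumes the error refers to a canonical decomposition mapping (the sixth semicolon-separated field of UnicodeData.txt) that lists more than two code points. That reading comes from the error message only, not from gennorm's source. The function name and the sample lines are hypothetical; in particular, the 01E0 sample line is made up to trigger the check and is not copied from a 2.1 data file.

```python
def long_canonical_decompositions(lines, max_len=2):
    """Yield (code point, mapping parts) for canonical decomposition
    mappings that list more than max_len code points."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        if len(fields) < 6:
            continue  # not a full UnicodeData.txt record
        mapping = fields[5].strip()
        # Compatibility mappings start with a <tag>; skip them and
        # look only at canonical mappings.
        if not mapping or mapping.startswith("<"):
            continue
        parts = mapping.split()
        if len(parts) > max_len:
            yield fields[0], parts


# Illustrative input in UnicodeData.txt layout (assumption: the second
# line is invented for this example).
SAMPLE = [
    "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;;;;00E0;",
    "01E0;SAMPLE THREE-PART ENTRY;Lu;0;L;0041 0307 0304;;;;N;;;;01E1;",
    "00A8;DIAERESIS;Sk;0;ON;<compat> 0020 0308;;;;N;SPACING DIAERESIS;;;;",
]

for code, parts in long_canonical_decompositions(SAMPLE):
    print("U+%s: %d code points (%s)" % (code, len(parts), " ".join(parts)))
```

To scan a real file, pass an open file object instead of SAMPLE, for example `long_canonical_decompositions(open("UnicodeData.txt"))`, with the path adjusted to wherever the 2.1 file lives.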
I could use my own Unicode library to manipulate 2.1 data algorithmically and generate some of the files (if that is limited to character data properties), or try to find an older version of ICU that wouldn't mind getting 2.1 data (even 1.3.1 has UnicodeData-3.0.0.txt with it!).

YA