From: Edward M. <em...@co...> - 2011-04-19 13:23:05
On 19 Apr 2011, at 9:16 AM, William Lachance wrote:

> On Tue, Apr 19, 2011 at 8:58 AM, Edward Mendelson <em...@co...> wrote:
> > Hello,
> >
> > Recently I was asked to help someone convert hundreds of WPMac files that include Japanese Kanji, files that were created on old Macs that had the Japanese Language Kits installed.
> >
> > It seems - I could be wrong - that libwpd doesn't convert the characters in those files. The method I found for converting them was a bit roundabout:
> > ...
> > Use a PowerPC Mac that runs OS 10.4 and "Classic" with the Japanese Language Kit installed. Open the WPMac files in WPMac in Classic. Copy the contents of the file to the clipboard. Paste the contents of the file from the Clipboard into OS X's TextEdit or any other unicode-aware Mac application. Save the resulting file as an RTF or DOC file. The resulting file opens correctly in LibreOffice, Pages, Word, etc.
> >
> > This method obviously requires obsolete hardware and software. I would guess that it would require an enormous amount of effort to support double-byte CJK and other WorldScript-based scripts in libwpd, and that the potential need for it is far too small to justify the effort. But is this something that might someday be possible in the future?
>
> Actually, it's not really that difficult. Unless Japanese is dramatically different from what we've seen so far, all we should need to make this conversion work is a table mapping from WordPerfect extended characters to their unicode equivalents. Over the years we've expanded support for languages from only plain latin to relatively obscure ones like Tibetan courtesy of mappings submitted by various people.
>
> If you don't have the expertise to create such a mapping yourself, we could probably derive one from (1) a WP document containing all the characters in a Japanese script and (2) one converted to RTF/DOC. If you're interested in producing something like this, let us know!
>
> --
> William Lachance
> wr...@gm...
Thanks for that quick reply. I don't have such a document, but I'll ask the person who asked me for help, in the hope that he might be able to create one.

Japanese/Chinese/Korean text in WPMac formats is, I *think*, completely different from extended characters in WPDOS/WPWin/WPUnix formats. CJK text doesn't use the CharacterSet/CharacterNumber system, because thousands of characters are supported in each language. So a document that included all the extended characters would be enormous.

My guess is that Tibetan fits fairly well into character set 12, but this would be different. Or am I completely wrong about this?

Edward Mendelson
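
For illustration, the approach suggested in the quoted reply (deriving a mapping table from a WP document and its converted counterpart) might look roughly like the Python sketch below. This is not libwpd code and makes no claim about libwpd's internals. The function names `derive_mapping` and `convert` are hypothetical, and so are the numeric codes in the usage example. The WordPerfect-side code is treated as an opaque key, because the thread leaves open whether WPMac CJK text uses CharacterSet/CharacterNumber pairs or some double-byte encoding. The sketch also assumes the two documents line up exactly one character to one character.

```python
from typing import Dict, Hashable, Iterable

# Hypothetical sketch, not libwpd code. A "WP code" is any hashable value
# identifying one character on the WordPerfect side, e.g. a
# (character set, character number) tuple or a raw double-byte value.


def derive_mapping(wp_codes: Iterable[Hashable],
                   converted_text: str) -> Dict[Hashable, str]:
    """Pair each WP code with the character at the same position in the
    converted (Unicode) text, assuming a strict one-to-one alignment."""
    codes = list(wp_codes)
    if len(codes) != len(converted_text):
        raise ValueError("documents do not align one character to one character")
    mapping: Dict[Hashable, str] = {}
    for code, char in zip(codes, converted_text):
        # Keep the first character seen for a code; reject contradictions.
        previous = mapping.setdefault(code, char)
        if previous != char:
            raise ValueError(f"code {code!r} maps to both {previous!r} and {char!r}")
    return mapping


def convert(wp_codes: Iterable[Hashable],
            mapping: Dict[Hashable, str],
            fallback: str = "\ufffd") -> str:
    """Translate WP codes to Unicode, substituting U+FFFD for unmapped codes."""
    return "".join(mapping.get(code, fallback) for code in wp_codes)


if __name__ == "__main__":
    # The codes (200, 1) and (200, 2) are invented for this example only.
    table = derive_mapping([(200, 1), (200, 2), (200, 1)], "日本日")
    print(convert([(200, 2), (200, 1)], table))
```

A table built this way is only as complete as the sample document. With thousands of characters per language, that is the size concern raised above.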