From: Dmitry V. <dv...@ro...> - 2004-04-12 18:17:26
Hi Joe,

Joe Piolunek wrote:
> Comparing the iso8859-5 chart I found (if it's correct) with Latin1 shows
> several of the "special" characters missing. By declaring
> characterTranslationMap and characterMap as type wchar_t instead of
> 'unsigned char', I had some success doing the character substitutions
> using the characters' unicode designation. It would only work, though, if
> the needed characters are available on the user's system.
>
> For example, a section something like this in XojPanel::buildCharacterMap
> could be added for each new region where a native user is willing to
> suggest substitute characters.
>
> // Due to differences in the encodings, some
> // characters need to be remapped for iso8859-5.
> if (qstrcmp (deviceEncoding, "ISO8859-5") == 0) {
>     characterMap[0x10] = 0x3c; // '<'
<skipped>
> // (test) remaps 'A' to one of the Katakana chars.
> // characterMap[0x41] = 0x30b7;
>
> All of the tests above worked for me. It could be possible to use unicode
> for all of the character substitutions, allowing special remaps for
> (hopefully just a few) different regions.
>
> What I've described sounds a little too easy. Do you know of any problems
> with doing the remapping using unicode designations?

Can you describe how exactly you implemented the translation of special
characters _and_ the encoding conversion?

    QChar(characterTranslationMap[ (unsigned char)string[i] ])

in the patch I sent you does that job pretty well, converting special chars
to unicode from latin1 rather than from e.g. iso8859-5. And having '>>' and
such in the resulting unicode string doesn't pose a problem at all. Mapping
special chars from a wchar_t characterTranslationMap, whose contents depend
on the selected deviceEncoding, seems a bit too much to me.

I'm reposting the patch to the mailing list for anyone else who may be
interested in it. It works without a hitch for iso8859-1 and iso8859-5,
showing all special characters correctly.
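[For readers of the archive: a minimal, self-contained sketch of the per-encoding translation-table approach discussed above. The names (characterMap, buildCharacterMap, translate) follow the thread, but this is illustrative code, not the actual xojpanel source, and it uses std::strcmp in place of Qt's qstrcmp.]

```cpp
#include <cstring>
#include <string>

// 256-entry table indexed by the raw byte from the device; its
// contents depend on the selected device encoding.
static wchar_t characterMap[256];

void buildCharacterMap(const char *deviceEncoding) {
    // Default: identity mapping (fine for plain ASCII/Latin-1 bytes).
    for (int i = 0; i < 256; ++i)
        characterMap[i] = static_cast<wchar_t>(i);

    // Due to differences in the encodings, some characters need to be
    // remapped for ISO 8859-5 (example values from the thread).
    if (std::strcmp(deviceEncoding, "ISO8859-5") == 0) {
        characterMap[0x10] = 0x3c;       // '<'
        // characterMap[0x41] = 0x30b7;  // (test) 'A' -> a Katakana char
    }
}

// Translate a raw byte string from the device into a wide string,
// one byte per character.
std::wstring translate(const std::string &raw) {
    std::wstring out;
    for (unsigned char c : raw)
        out += characterMap[c];
    return out;
}
```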
What bothers me is that the current conversion procedure will probably not
work for multibyte encodings if HP devices ever make use of them. I would
appreciate any information and ideas about that.

> I don't know which would be better - having the end-user specify the
> encoding ("xojpanel -devenc ISO8859-5"), or regional charset ("xojpanel
> -cyrillic"). What's your opinion on this?

The original idea was that -devenc could be useful not only with cyrillic
but with other non-latin charsets as well. We might make -cyrillic (or
-charset cyrillic) an alias for -devenc ISO8859-5 for convenience, assuming
that it's the only encoding used by HP for that purpose.

--
Dmitry Vukolov
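[Archive note: the multibyte concern above can be demonstrated with a short, hypothetical example. A byte-indexed table assumes one byte equals one character; in a multibyte encoding such as UTF-8, one character can span several bytes, so byte-wise mapping splits it into garbage.]

```cpp
#include <string>

// Byte-wise decoding, as a byte-indexed translation table would do it:
// each input byte becomes exactly one output character.
std::wstring byteWiseDecode(const std::string &raw) {
    std::wstring out;
    for (unsigned char c : raw)
        out += static_cast<wchar_t>(c);
    return out;
}
```

For example, the UTF-8 encoding of Cyrillic U+0416 is the two-byte sequence 0xD0 0x96; byte-wise decoding yields two separate characters (0x00D0 and 0x0096) instead of the single intended one, so a real multibyte conversion would need a stateful decoder rather than a lookup table.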