From: Dima V. <dv...@ro...> - 2004-04-07 19:40:30
|
Hi! There was a report that xojpanel and 'ptal-hp display' do not show Cyrillic text correctly. They output raw data from the device, in this case in iso8859-5, and no conversion to the system's locale is performed. What's the best way to add such recoding to xojpanel? Is it possible to retrieve the language the device is using for the LCD? And is there any information on whether all HP devices use iso8859-5 exclusively for Cyrillic, or not? Thanks in advance.

-- Dmitry Vukolov
From: Joe P. <joe...@sn...> - 2004-04-09 00:51:40
|
On Wednesday 07 April 2004 03:35, Dima Vukolov wrote:
> Hi!
>
> There was a report that xojpanel and 'ptal-hp display' do not show
> Cyrillic text correctly.

Can you supply more info about the issue? Which peripherals (that you know of) exhibit the problem? Do the Cyrillic characters display correctly on the device's built-in LCD screen (under Linux)? Are only a few characters displayed incorrectly? When characters are not displayed correctly, do other characters appear in their place, or do they simply not appear?

> They output raw data from the device, in this case in iso8859-5, and no
> conversion to the system's locale is performed.

By default, xojpanel does some simple ASCII character-for-character translations, but the feature is hard-coded. It can be turned off, however, using the '-notrans' option. Have you tried that? The feature was added because some peripherals display 'special' characters on their LCDs that are not part of any standard charset and cannot easily be displayed by xojpanel or 'ptal-hp display'.

In case the problem is font-related, you could try rebuilding xojpanel with a different default font specified. It's currently set to "Courier".

> What's the best way to add such recoding to xojpanel?

Unfortunately, I'm not very familiar with internationalization issues. A little reading turned up the lesstif/Motif/openMotif function "XmStringCreateLocalized(text)", which at first glance looks like it might be usable as a conversion wrapper for the strings 'Line1' and 'Line2' in xojpanel.cpp, but I'm really just guessing at this point. If I were to try making xojpanel locale-aware, I wouldn't be able to test for correct Cyrillic display, since I don't have a printer available that outputs that charset.

> Is it possible to retrieve the language the device is using for the LCD?
> And is there any information on whether all HP devices use iso8859-5
> exclusively for Cyrillic, or not?

I don't know. Is there any other encoding that could be used to display Cyrillic? If iso8859-5 contains all of the characters needed for Russian and related languages, it would probably be the one used. The HP people *should* be able to answer this for you, but they haven't been around much lately.

-- Joe Piolunek
From: Dmitry V. <dv...@ro...> - 2004-04-10 00:02:47
|
Hi Joe,

Joe Piolunek wrote:
> Can you supply more info about the issue? Which peripherals (that you
> know of) exhibit the problem? Do the Cyrillic characters display
> correctly on the device's built-in LCD screen (under Linux)? Are only a
> few characters displayed incorrectly? When characters are not displayed
> correctly, do other characters appear in their place, or do they simply
> not appear?

Well, this problem was reported on a Russian-speaking mailing list. The guy used xojpanel to view the LCD of a Color LaserJet 4550. All Latin characters are displayed properly; Cyrillic ones are not. I'm sure the device itself has no such problems and the problem lies elsewhere. It is also neither font-related nor caused by the built-in translation of special characters.

> Unfortunately, I'm not very familiar with internationalization issues. A
> little reading turned up the lesstif/Motif/openMotif function
> "XmStringCreateLocalized(text)", which at first glance looks like it
> might be usable as a conversion wrapper for the strings 'Line1' and
> 'Line2' in xojpanel.cpp, but I'm really just guessing at this point.

The Unicode QStrings Line1 and Line2 currently get their values from the raw string obtained from the device (with special symbols translated). From the output of 'ptal-hp display' we have found out that ISO8859-5 is used for Cyrillic text in the device. What we need now is to explicitly specify the encoding of the text received from the device so that it will be converted to Unicode correctly. In this regard xojpanel.cpp could be changed as follows:

    QTextCodec *codec = QTextCodec::codecForName(deviceEncoding);
    // in our case deviceEncoding == "iso8859-5"
    Line1 = codec->toUnicode(tmpString.remove( length - spaces, spaces ));

I suggest adding an extra command-line option to xojpanel, like -deviceenc <encoding>, to set the value of deviceEncoding manually. The default would then be iso8859-1. Of course it would be nice to be able to retrieve the encoding from the device itself, based on some kind of PML object set to the language it's using for the LCD. But for now that's just dreams :-)

Anyway, even in the proposed solution there is a catch related to those special characters that HP uses for arrows and such. The built-in translation maps them to e.g. '>>', which simply doesn't exist in iso8859-5. Therefore, the above-mentioned conversion from iso8859-5 to Unicode would corrupt the arrows and show something weird instead of them. I'll try to come up with something to overcome this.

> Is there any other encoding that could be used to display Cyrillic? If
> iso8859-5 contains all of the characters needed for Russian and related
> languages, it would probably be the one used.

KOI8-R and CP1251 are much more common. And though ISO8859-5 is called an official standard, it's almost never used :-)

Yes, and one more thing, not really connected with encodings. Messing with xojpanel's source I found out that my PSC 2110 uses a special character, 0x15, for the right arrow. Previously xojpanel showed no right arrows for this device at all, though they were present on the LCD. Adding 0x15 to the translation map fixed that.

-- Dmitry Vukolov
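For anyone who wants to try this before a patch is posted, here is a minimal sketch of how the pieces could fit together in xojpanel.cpp. The option-parsing loop and the codec fallback are my own illustration and assumptions; only the QTextCodec lines come from the message above:

    // Sketch only: default encoding, overridable via the proposed
    // (not yet existing) -deviceenc command-line option.
    const char *deviceEncoding = "ISO8859-1";
    for (int i = 1; i < argc; i++) {
        if (qstrcmp(argv[i], "-deviceenc") == 0 && i + 1 < argc) {
            deviceEncoding = argv[++i];
        }
    }

    QTextCodec *codec = QTextCodec::codecForName(deviceEncoding);
    if (!codec) {
        // Assumed fallback: avoid a null codec if the name is unknown.
        codec = QTextCodec::codecForLocale();
    }
    Line1 = codec->toUnicode(tmpString.remove(length - spaces, spaces));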
From: Joe P. <joe...@sn...> - 2004-04-11 16:38:27
|
On Friday 09 April 2004 08:02, Dmitry Vukolov wrote:

Dmitry: I apologize for misspelling your name.

-- Joe
From: Joe P. <joe...@sn...> - 2004-04-11 16:26:43
|
On Friday 09 April 2004 08:02, Dmitry Vukolov wrote:
<...>
> Anyway, even in the proposed solution there is a catch related to those
> special characters that HP uses for arrows and such. The built-in
> translation maps them to e.g. '>>', which simply doesn't exist in
> iso8859-5. Therefore, the above-mentioned conversion from iso8859-5 to
> Unicode would corrupt the arrows and show something weird instead of
> them. I'll try to come up with something to overcome this.

Dmitri: Thanks for sending the patch. I didn't notice any problems here after applying it, but I can't fully test it because I don't have a device that uses "special characters". At least I haven't seen my OfficeJet 600 display any of them.

Comparing the iso8859-5 chart I found (if it's correct) with Latin-1 shows several of the "special" characters missing. By declaring characterTranslationMap and characterMap as type wchar_t instead of 'unsigned char', I had some success doing the character substitutions using the characters' Unicode designations. It would only work, though, if the needed characters are available on the user's system.

For example, a section something like this in XojPanel::buildCharacterMap could be added for each new region where a native user is willing to suggest substitute characters:

    // Due to differences in the encodings, some
    // characters need to be remapped for iso8859-5.
    if (qstrcmp(deviceEncoding, "ISO8859-5") == 0) {
        characterMap[0x10] = 0x3c;   // '<'
        characterMap[0x80] = 0x3c;   // '<'
        characterMap[0xa0] = 0x3c;   // '<'
        characterMap[0x11] = 0x3e;   // '>'
        characterMap[0x13] = 0x3e;   // '>'
        characterMap[0x15] = 0x3e;   // '>'
        characterMap[0x81] = 0x3e;   // '>'
        characterMap[0x12] = 0x00a3; // unicode "British Pound" sign

        // Some tests:
        // remaps 'A' to British Pound sign.
        //characterMap[0x41] = 0x00a3;
        // remaps 'A' to one of the Cyrillic chars.
        //characterMap[0x41] = 0x04c1;
        // (test) remaps 'A' to one of the Katakana chars.
        //characterMap[0x41] = 0x30b7;
    }

All of the tests above worked for me. It could be possible to use Unicode for all of the character substitutions, allowing special remaps for (hopefully just a few) different regions.

What I've described sounds a little too easy. Do you know of any problems with doing the remapping using Unicode designations?

I don't know which would be better: having the end-user specify the encoding ("xojpanel -devenc ISO8859-5") or the regional charset ("xojpanel -cyrillic"). What's your opinion on this? Thanks.

-- Joe
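To show what the wchar_t declaration implies on the lookup side, here is a rough sketch of how the widened map might be applied when building a display line. The function name and loop are illustrative assumptions, not the actual xojpanel code:

    // Sketch only: a wide map lets each entry hold a full Unicode code
    // point, so the lookup result goes straight into a QChar.
    static wchar_t characterMap[256];

    QString translateLine(const char *raw, unsigned len)
    {
        QString out;
        for (unsigned i = 0; i < len; i++) {
            unsigned char c = (unsigned char)raw[i];
            out += QChar((unsigned short)characterMap[c]);
        }
        return out;
    }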
From: Dmitry V. <dv...@ro...> - 2004-04-12 18:17:26
Attachments:
hpoj-0.91-xojpanel-enc.patch
|
Hi Joe,

Joe Piolunek wrote:
> Comparing the iso8859-5 chart I found (if it's correct) with Latin-1
> shows several of the "special" characters missing. By declaring
> characterTranslationMap and characterMap as type wchar_t instead of
> 'unsigned char', I had some success doing the character substitutions
> using the characters' Unicode designations. It would only work, though,
> if the needed characters are available on the user's system.
>
> For example, a section something like this in XojPanel::buildCharacterMap
> could be added for each new region where a native user is willing to
> suggest substitute characters:
>
>     // Due to differences in the encodings, some
>     // characters need to be remapped for iso8859-5.
>     if (qstrcmp(deviceEncoding, "ISO8859-5") == 0) {
>         characterMap[0x10] = 0x3c;   // '<'
<skipped>
>         // (test) remaps 'A' to one of the Katakana chars.
>         //characterMap[0x41] = 0x30b7;
>
> All of the tests above worked for me. It could be possible to use Unicode
> for all of the character substitutions, allowing special remaps for
> (hopefully just a few) different regions.
>
> What I've described sounds a little too easy. Do you know of any problems
> with doing the remapping using Unicode designations?

Can you describe how exactly you implemented the translation of special characters _and_ the encoding conversion? QChar(characterTranslationMap[ (unsigned char)string[i] ]) in the patch I sent you does that job pretty well, converting special chars to Unicode from Latin-1 rather than from e.g. iso8859-5. And having '>>' and such in the resulting Unicode string doesn't pose a problem at all. Mapping special chars from a wchar_t characterTranslationMap whose contents depend on the selected deviceEncoding seems a bit too much to me.

I'm reposting the patch to the mailing list for anyone else who may be interested in it. It works without a hitch for iso8859-1 and iso8859-5, showing all special characters correctly.

What bothers me is that the current conversion procedure will probably not work for multibyte encodings if HP devices ever make use of them. I would appreciate any information and ideas about that.

> I don't know which would be better: having the end-user specify the
> encoding ("xojpanel -devenc ISO8859-5") or the regional charset
> ("xojpanel -cyrillic"). What's your opinion on this?

The original idea was that -devenc could be useful not only with Cyrillic but with other non-Latin charsets as well. We might make -cyrillic (or -charset cyrillic) an alias for -devenc ISO8859-5 for convenience, assuming that it's the only encoding HP uses for that purpose.

-- Dmitry Vukolov
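As a reader's sketch of the approach described above (illustrative only; the actual patch may interleave these steps differently, and the map declaration and helper name are assumptions):

    // Assumed: identity map except for the special HP glyphs.
    extern unsigned char characterTranslationMap[256];

    // Sketch: special glyphs (arrows, etc.) are turned into Unicode
    // directly from their Latin-1 replacements; all other bytes go
    // through the device codec.
    QString decodeLine(const char *raw, int len, QTextCodec *codec)
    {
        QString out;
        for (int i = 0; i < len; i++) {
            unsigned char c = (unsigned char)raw[i];
            if (characterTranslationMap[c] != c) {
                // Independent of deviceEncoding.
                out += QChar(characterTranslationMap[c]);
            } else {
                // Converting one byte at a time is exactly what would
                // break for multibyte encodings, as noted above.
                out += codec->toUnicode(&raw[i], 1);
            }
        }
        return out;
    }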
From: Joe P. <joe...@sn...> - 2004-04-13 14:57:56
|
On Monday 12 April 2004 02:16, Dmitry Vukolov wrote:
> Hi Joe,
<...>
> I'm reposting the patch to the mailing list for anyone else who may be
> interested in it. It works without a hitch for iso8859-1 and iso8859-5,
> showing all special characters correctly.

OK. I didn't understand that before. You didn't make it clear. If your patch fixes all of the problems you were seeing, then the changes I was suggesting would not be useful. I would like to see others try your patch and report their results. I've put up a web page containing your patch for download at http://pages.cthome.net/jsp/prj/xojpanel/xojpanel.html

I've been away from coding (and from xojpanel) for a while, so it will take me some time to relearn a bit and to try to understand what your patch will do, since I cannot see its results on my hardware.

> What bothers me is that the current conversion procedure will probably
> not work for multibyte encodings if HP devices ever make use of them. I
> would appreciate any information and ideas about that.

It seems likely that xojpanel would need some changes. Using QChar for the character maps would probably be one of them. I don't know if libptal, which xojpanel uses to retrieve the peripherals' LCD display strings, can handle multibyte characters. If it cannot, it would need to be hacked. Doing that might be beyond my capability. HP is showing little evidence of any intention to continue hpoj development.

I can think of a couple of ways that HP could use Unicode in future LCD displays:

1. Unicode sent to the PC, with software doing character (or even string) translations based on a locale setting, then feeding the translations back to the LCD display unit. I think this would only be used when the device is not expected to be used in "stand-alone" fashion.

2. Unicode used (or just stored) internally, but with only a subset made visible to the user, and some way to configure the charset before or (maybe through software) after delivery. This would be more suitable for use in stand-alone devices. If HP wants to be generous, it could allow the user to configure the locale directly on the peripheral.

These are just guesses, though. It would really be helpful if HP provided answers.

> The original idea was that -devenc could be useful not only with
> Cyrillic but with other non-Latin charsets as well. We might make
> -cyrillic (or -charset cyrillic) an alias for -devenc ISO8859-5 for
> convenience, assuming that it's the only encoding HP uses for that
> purpose.

To keep things simple, it might be better to use only '-devenc', as you suggested at first.

-- Joe
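On the multibyte worry, one speculative direction (not from any patch in this thread) is Qt's stateful decoder, which copes with a multibyte sequence split across two reads from the device. Here readLcdChunk is a hypothetical stand-in for however xojpanel actually obtains the raw bytes:

    // Speculative sketch: QTextDecoder keeps conversion state between
    // calls, so split multibyte characters still decode correctly.
    QTextDecoder *decoder = codec->makeDecoder();
    QString line;
    char buf[64];
    int n;
    while ((n = readLcdChunk(buf, sizeof(buf))) > 0) {  // hypothetical helper
        line += decoder->toUnicode(buf, n);
    }
    delete decoder;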