I have an ooRexx script that receives text strings (JSON strings) encoded in UTF-8 (peppered with German diacritics), and I would like to convert the strings to ANSI.
I found the SysFromUnicode and SysToUnicode functions in the ooRexx 4.1.1 manual, but I could not get any meaningful results.
Here is what I tried:
1) Store the text "Tür" in a file named utf8.txt, saved in UTF-8 format.
2) Read it back with Rexx into the variable str:
fs = .Stream~new('utf8.txt')
str = fs~linein
say 'rc = 'sysFromUnicode(str, , , , 'outStem.')
loop ix over outStem.
  say 'outstem.'ix' = <'outstem.ix'>'
end
Output:
rc = 0
outstem.!TEXT = <??>
outstem.!USEDDEFAULTCHAR = <1>
Can someone help me out here?
Madou, I can't help much here because I'm not really knowledgeable in this area, but I have a few comments.
The interpreter is ANSI based (it treats strings as plain bytes), so first you need the string you hand to the conversion functions to hold the raw UTF-8 bytes. I would start off by not using linein(), but charin(), giving the complete file size as the length argument so the whole file is read in one call. However, I'm not positive that will work, because there may be some code page translation done.
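A minimal sketch of the charin() approach, assuming the file name utf8.txt from the question. To make it self-contained, the script first writes "Tür" as UTF-8 bytes (54 C3BC 72) into that file, then reads the whole file back in a single call and strips a UTF-8 byte order mark in case the editor that saved the file added one.

```rexx
/* Read a whole UTF-8 file into one string of raw bytes with charin(). */
fileName = 'utf8.txt'                  -- file name taken from the question

-- Demo only: create the file with "Tür" encoded as UTF-8 (ü = C3 BC)
out = .stream~new(fileName)
out~open('write replace')
out~charout('54C3BC72'x)
out~close

-- Read every byte in one call; chars() = number of bytes left to read
fs = .stream~new(fileName)
fs~open('read')
raw = fs~charin(1, fs~chars)
fs~close

-- Drop a UTF-8 byte order mark (EF BB BF) if one is present
if raw~left(3) == 'EFBBBF'x then raw = raw~substr(4)

say 'Read' raw~length 'bytes:' raw~c2x -- prints "Read 4 bytes: 54C3BC72"
```

Looking at the hex dump (c2x) is a quick way to confirm that the variable really holds the UTF-8 bytes before any conversion is attempted.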
Second, you need to specify UTF8 as the codepage argument somewhere. I looked at the code for SysFromUnicode and SysToUnicode, and at the documentation for the Windows APIs they use (MultiByteToWideChar and WideCharToMultiByte). I think that these APIs convert to and from UTF-16 *only*; the codepage argument names the other side of the translation.
To convert UTF-8 to ANSI, it looks to me like you would have to first convert UTF-8 to UTF-16 using SysToUnicode(), and then take the output of that conversion and use SysFromUnicode() to convert the UTF-16 string to the ANSI code page you are running in on your computer.
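The two-step conversion described above can be sketched as follows. This is Windows-only (SysToUnicode and SysFromUnicode are Windows RexxUtil functions). The target code page 1252 (Western European ANSI, which covers the German diacritics) is an assumption; passing 'ACP' instead selects whatever ANSI code page the system uses. The variable names are illustrative.

```rexx
/* Two-step conversion: UTF-8 -> UTF-16 -> ANSI code page (Windows only). */
utf8Text = '54C3BC72'x      -- "Tür" as UTF-8 bytes, e.g. read with charin()
targetCP = '1252'           -- assumption: Western European ANSI code page;
                            -- 'ACP' would pick the system's ANSI code page

-- Step 1: decode the UTF-8 bytes into a UTF-16 string
ret = SysToUnicode(utf8Text, 'UTF8', , 'wide.')
if ret \== 0 then do
  say 'SysToUnicode failed, rc:' ret
  exit ret
end

-- Step 2: encode the UTF-16 string in the target code page
ret = SysFromUnicode(wide.!TEXT, targetCP, , , 'ansi.')
if ret \== 0 then do
  say 'SysFromUnicode failed, rc:' ret
  exit ret
end

ansiText = ansi.!TEXT
say 'ANSI bytes:' ansiText~c2x                  -- 54FC72: ü is FC in 1252
say 'Default char used:' ansi.!USEDDEFAULTCHAR  -- 1 means a char was replaced
```

In the original attempt, the UTF-8 bytes went straight into SysFromUnicode(), which expects UTF-16 input; that mismatch would explain the "??" result with !USEDDEFAULTCHAR set to 1.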
The following simple example works for me:
/* Simple UTF8 to ANSI test */
-- UTF-8 bytes for the cent, pound and currency signs (¢ £ ¤)
inString = 'c2a2c2a3c2a4'x
say 'Using string:' inString
-- Step 1: UTF-8 -> UTF-16
ret = SysToUnicode(inString, 'UTF8', , out.)
if ret == 0 then say 'Convert UTF8 to UTF16 succeeded'
else say 'Convert UTF8 to UTF16 failed. rc:' ret
-- Step 2: UTF-16 -> code page 437 (the active console code page here)
ret = SysFromUnicode(out.!TEXT, '437', , , ansi.)
if ret == 0 then say 'Convert UTF16 to ANSI succeeded'
else say 'Convert UTF16 to ANSI failed. rc:' ret
say 'ANSI text:' ansi.!TEXT
say "Used conversion character:" boolean2str(ansi.!USEDDEFAULTCHAR)
say 'Code page in console:'
'chcp'   -- Windows command that displays the active console code page
exit

::routine boolean2str
  use strict arg val
  if val then return 'true'
  else return 'false'
Note that in the SysFromUnicode() call above I used the active code page of the console I am working in. Here is the output I get in my console; how it will look in this e-mail on your system, I have no idea:
Using string: ┬ó┬ú┬ñ
Convert UTF8 to UTF16 succeeded
Convert UTF16 to ANSI succeeded
ANSI text: ¢£☼
Used conversion character: false
Code page in console:
Active code page: 437
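In my console the 'Using string' line is gibberish, while the cent and pound signs in the ANSI text display correctly, so it looks to me like the conversion works fine. The currency sign does not make sense to me, but I don't know what it should be. (Code page 437 has no ¤ character, and since no default character was used, Windows most likely substituted a best-fit byte, which the console displays as ☼.)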
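Because the example above targets the console code page 437 while the question asks for ANSI, the choice of target code page matters. A hedged sketch (Windows only) that encodes the same UTF-16 text in three code pages; the list 437, 850, 1252 is chosen for illustration:

```rexx
/* Encode "Tür" in the console (OEM) and ANSI code pages for comparison. */
ret = SysToUnicode('54C3BC72'x, 'UTF8', , 'wide.')   -- "Tür" as UTF-8
if ret \== 0 then exit ret

results = .directory~new
do cp over .array~of('437', '850', '1252')
  ret = SysFromUnicode(wide.!TEXT, cp, , , 'conv.')
  if ret \== 0 then exit ret
  results[cp] = conv.!TEXT                 -- keep the bytes per code page
  say 'code page' cp':' conv.!TEXT~c2x
end
```

Expected bytes: 548172 for 437 and 850 (ü is 81 in both OEM code pages) and 54FC72 for 1252. A file meant for ANSI-based Windows programs would normally want the 1252 (or 'ACP') form, while a console using 437 or 850 displays the 81 form correctly.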