I have been analyzing how to determine whether data in a file is EBCDIC or ASCII.
There is an intersection set between ASCII and EBCDIC.
In trying to determine the appropriate character set, I added logic to ignore a character that is represented in the intersection set.
I keep scanning thru the record until I happen upon a character that is only in one of the sets (either ASCII or EBCDIC)
I then determine whether the scanned character is EBCDIC or ASCII
I apply the result to the remaining records.
It seems to work fine if there is adequate data available
I need the ASCII code points for languages other than English (European and Scandinavian in particular).
Umlaut, accent characters, etc. are currently not represented.
Neither should present an issue - I merely lack the specific code points.
I will optimize the above procedure - I kept it simple so that debugging was more transparent.
My efforts are a precursor to enabling the GnuCOBOL compiler to easily process an EBCDIC dataset containing COMP, COMP-3, Usage Display data fields.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have been analyzing how to determine whether data in a file is
EBCDIC or ASCII.
There is an intersection set between ASCII and EBCDIC.
In trying to determine the appropriate character set, I added logic to
ignore a character that is represented in the intersection set.
I keep scanning thru the record until I happen upon a character that
is only in one of the sets (either ASCII or EBCDIC)
I then determine whether the scanned character is EBCDIC or ASCII
I apply the result to the remaining records.
It seems to work fine if there is adequate data available
I will optimize the above procedure - I kept it simple so that
debugging was more transparent.
My efforts are a precursor to enabling the GnuCOBOL compiler to easily
process an EBCDIC dataset containing COMP, COMP-3, Usage Display data
fields.
Small thought :
Looking at text data is one thing but reading IBM EBCDIC std binary
fields another starting with what the IBM, big or little Indian, and
what about the target system what's that one and even if they are the
same is the storage of such the same ?
Vincent a common misunderstanding :-)
The issue is not any binary form or packed decimal form.
The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.
With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Vincent a common misunderstanding :-)
The issue is not any binary form or packed decimal form.
The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.
With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.
Note that GnuCOBOL 3.2 added both support for the CODE-SET clause and the option to specify either one of the distributed or a separate encoding table to translate (parts of) an FD automatically during read/write. But this needs to be coded in the program.
It should be possible to do this "live" using the external file handler, but I don't remember if we finished that part.
@sf-mensch ; pottmi
In EXTFH ?
Supported if in EXTFH ?
1. Do not translate binary and packed decimal ?
2. Read into ?
3. Write from ?
4. Context sensitive to the COBOL expression as a result of a redefines ?
Probably way out of scope.
1. SQL Host variables ?
2. CICS I-O ?
3. IMS segments ?
Is there a GnuCOBOL / C method to determine the keyboard layout / keyboard identifier being used ?
I looked at the * .ttbl files in config - default . ttbl
From what I can tell the description should not be ASCII to EBCDIC but rather
EBCDIC to ASCII.
Given a decimal value 249 the table indicates the translation to be an ASCII '9' (39)
That is correct. The description is not.
"default" 8-bit ASCII to EBCDIC 1047 conversion table (likely MF)"
Using the inverse - offset to 57 (ASCII "9") I don't see a F9
But perhaps I do not understand this statement at the end of the table.
"This translation being symmetric, the table is built from the previous one." ?
Thanks for the URL(s)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
CODE-SET is specified by the programmer which specifies what to translate, that would be similar with EXTFH.
Of course you don't get the right result when specifying it on a record which contains other USAGE than DISPLAY (and if all are DISPLAY, then REDEFINES doesn't matter)...
And CODE-SET only applies to FD, so none of the second group applies.
Note that you can manually do that with the new translation tables and INSPECT CONVERTING.
Reading the keyboard identifier would be highly non-portable, it is likely more reasonable to read one of the locale settings instead, and that should work with GnuCOBOL already.
For the format of the translation tables, see the manual... (In short: either they are symmetric = built by using the reverse, or both directions use a different translation).
Feel welcome to create more translation tables with citing the source. If you contribute those, we could distribute them with GnuCOBOL in the future.
I have been analyzing how to determine whether data in a file is EBCDIC or ASCII.
There is an intersection set between ASCII and EBCDIC.
In trying to determine the appropriate character set, I added logic to ignore a character that is represented in the intersection set.
I keep scanning thru the record until I happen upon a character that is only in one of the sets (either ASCII or EBCDIC)
I then determine whether the scanned character is EBCDIC or ASCII
I apply the result to the remaining records.
It seems to work fine if there is adequate data available
I need the ASCII code points for languages other than English (European and Scandinavian in particular).
Umlaut, accent characters, etc. are currently not represented.
Neither should present an issue - I merely lack the specific code points.
I will optimize the above procedure - I kept it simple so that debugging was more transparent.
My efforts are a precursor to enabling the GnuCOBOL compiler to easily process an EBCDIC dataset containing COMP, COMP-3, Usage Display data fields.
On 09/08/2023 23:26, Ralph Linkletter wrote:
Looking at text data is one thing but reading IBM EBCDIC std binary
fields another starting with what the IBM, big or little Indian, and
what about the target system what's that one and even if they are the
same is the storage of such the same ?
I would not bet the farm on it :)
Vincent a common misunderstanding :-)
The issue is not any binary form or packed decimal form.
The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.
With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.
I have a keen interest in getting ebcdic supported in GnuCOBOL like it is in MF.
If there is ever an initiative to get this done I would like to participate.
On Wed, Aug 9, 2023 at 6:44 PM Ralph Linkletter
zosralph@users.sourceforge.net wrote:
Note that GnuCOBOL 3.2 added both support for the CODE-SET clause and the option to specify either one of the distributed or a separate encoding table to translate (parts of) an FD automatically during read/write. But this needs to be coded in the program.
It should be possible to do this "live" using the external file handler, but I don't remember if we finished that part.
@sf-mensch ; pottmi
In EXTFH ?
Supported if in EXTFH ?
1. Do not translate binary and packed decimal ?
2. Read into ?
3. Write from ?
4. Context sensitive to the COBOL expression as a result of a redefines ?
Probably way out of scope.
1. SQL Host variables ?
2. CICS I-O ?
3. IMS segments ?
Is there a GnuCOBOL / C method to determine the keyboard layout / keyboard identifier being used ?
For instance the German keyboard is defined as keyboard idenifier 407
https://learn.microsoft.com/en-us/globalization/keyboards/kbdgr
The US keyboard is defined as keyboard identifier 409
https://learn.microsoft.com/en-us/globalization/keyboards/kbdus_7
I looked at the * .ttbl files in config - default . ttbl
From what I can tell the description should not be ASCII to EBCDIC but rather
EBCDIC to ASCII.
Given a decimal value 249 the table indicates the translation to be an ASCII '9' (39)
That is correct. The description is not.
"default" 8-bit ASCII to EBCDIC 1047 conversion table (likely MF)"
Using the inverse - offset to 57 (ASCII "9") I don't see a F9
But perhaps I do not understand this statement at the end of the table.
"This translation being symmetric, the table is built from the previous one." ?
Thanks for the URL(s)
CODE-SET is specified by the programmer which specifies what to translate, that would be similar with EXTFH.
Of course you don't get the right result when specifying it on a record which contains other USAGE than DISPLAY (and if all are DISPLAY, then REDEFINES doesn't matter)...
And CODE-SET only applies to FD, so none of the second group applies.
Note that you can manually do that with the new translation tables and INSPECT CONVERTING.
Reading the keyboard identifier would be highly non-portable, it is likely more reasonable to read one of the locale settings instead, and that should work with GnuCOBOL already.
For the format of the translation tables, see the manual... (In short: either they are symmetric = built by using the reverse, or both directions use a different translation).
Feel welcome to create more translation tables with citing the source. If you contribute those, we could distribute them with GnuCOBOL in the future.
You want to have a look at 8 bit encodings, mostly at ISO 8859. For the European/Scandinavian you've asked for, especially check out
https://en.m.wikipedia.org/wiki/ISO/IEC_8859-15#Codepage_layout
https://en.m.wikipedia.org/wiki/ISO/IEC_8859-10#Codepage_layout
https://en.m.wikipedia.org/wiki/ISO/IEC_8859-16#Codepage_layout