GnuCOBOL / Discussion / GnuCOBOL: EBCDIC datasets - needing some code points please

Ralph Linkletter - 2023-08-09

I have been analyzing how to determine whether data in a file is EBCDIC or ASCII.
There is an intersection set between ASCII and EBCDIC.
In trying to determine the appropriate character set, I added logic to ignore a character that is represented in the intersection set.
I keep scanning thru the record until I happen upon a character that is only in one of the sets (either ASCII or EBCDIC)
I then determine whether the scanned character is EBCDIC or ASCII
I apply the result to the remaining records.
It seems to work fine if there is adequate data available

I need the ASCII code points for languages other than English (European and Scandinavian in particular).
Umlaut, accent characters, etc. are currently not represented.
Neither should present an issue - I merely lack the specific code points.

05 CHAR-FLD PIC X(01). 88 ASCII-CHAR VALUE X'20' THRU X'3F', X'41' THRU X'7E', X'80' THRU X'9F', X'A0' THRU X'A5'. 88 EBCDIC-CHAR VALUE X'4B' THRU X'4F', X'5A' THRU X'5F', X'6B' THRU X'6F', X'7A' THRU X'7F', X'81' THRU X'89', X'91' THRU X'99', X'A1' THRU X'AB', X'C0' THRU X'FF', X'40', X'70', X'B0', X'B1', X'61', X'60'. MOVE 1 TO I MOVE '0' TO AMBIGUITY PERFORM UNTIL AMBIGUITY = '1' MOVE REC-AREA (I:1) TO CHAR-FLD ADD 1 TO I IF EBCDIC-CHAR AND ASCII-CHAR CONTINUE ELSE MOVE '1' TO AMBIGUITY IF EBCDIC-CHAR MOVE '2' TO EDIT-SWITCH END-IF END-IF IF I = TRTCH-LEN MOVE '1' TO AMBIGUITY END-IF END-PERFORM.

I will optimize the above procedure - I kept it simple so that debugging was more transparent.
My efforts are a precursor to enabling the GnuCOBOL compiler to easily process an EBCDIC dataset containing COMP, COMP-3, Usage Display data fields.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Vincent (Bryan) Coen - 2023-08-09
  
  On 09/08/2023 23:26, Ralph Linkletter wrote:
  
  I have been analyzing how to determine whether data in a file is
  EBCDIC or ASCII.
  There is an intersection set between ASCII and EBCDIC.
  In trying to determine the appropriate character set, I added logic to
  ignore a character that is represented in the intersection set.
  I keep scanning thru the record until I happen upon a character that
  is only in one of the sets (either ASCII or EBCDIC)
  I then determine whether the scanned character is EBCDIC or ASCII
  I apply the result to the remaining records.
  It seems to work fine if there is adequate data available
  
  I will optimize the above procedure - I kept it simple so that
  debugging was more transparent.
  
  My efforts are a precursor to enabling the GnuCOBOL compiler to easily
  process an EBCDIC dataset containing COMP, COMP-3, Usage Display data
  fields.
  
  Small thought :
  
  Looking at text data is one thing but reading IBM EBCDIC std binary
  fields another starting with what the IBM, big or little Indian, and
  what about the target system what's that one and even if they are the
  same is the storage of such the same ?
  
  I would not bet the farm on it :)
  
  alternate
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Ralph Linkletter - 2023-08-09
    
    Vincent a common misunderstanding :-)
    The issue is not any binary form or packed decimal form.
    The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
    Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.
    
    With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - pottmi - 2023-08-09
      
      I have a keen interest in getting ebcdic supported in GnuCOBOL like it is in MF.
      
      If there is ever an initiative to get this done I would like to participate.
      
      On Wed, Aug 9, 2023 at 6:44 PM Ralph Linkletter
      zosralph@users.sourceforge.net wrote:
      
      Vincent a common misunderstanding :-)
      The issue is not any binary form or packed decimal form.
      The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
      Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.
      
      With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.
      
      EBCDIC datasets - needing some code points please
      
      Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/gnucobol/discussion/cobol/
      
      To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Simon Sobisch - 2023-08-09
        
        Note that GnuCOBOL 3.2 added both support for the CODE-SET clause and the option to specify either one of the distributed or a separate encoding table to translate (parts of) an FD automatically during read/write. But this needs to be coded in the program.
        It should be possible to do this "live" using the external file handler, but I don't remember if we finished that part.
        
        alternate
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Ralph Linkletter - 2023-08-10
        
        @sf-mensch ; pottmi
        In EXTFH ?
        Supported if in EXTFH ?
        1. Do not translate binary and packed decimal ?
        2. Read into ?
        3. Write from ?
        4. Context sensitive to the COBOL expression as a result of a redefines ?
        
        Probably way out of scope.
        1. SQL Host variables ?
        2. CICS I-O ?
        3. IMS segments ?
        
        Is there a GnuCOBOL / C method to determine the keyboard layout / keyboard identifier being used ?
        
        For instance the German keyboard is defined as keyboard idenifier 407
        https://learn.microsoft.com/en-us/globalization/keyboards/kbdgr
        
        The US keyboard is defined as keyboard identifier 409
        https://learn.microsoft.com/en-us/globalization/keyboards/kbdus_7
        
        I looked at the * .ttbl files in config - default . ttbl
        From what I can tell the description should not be ASCII to EBCDIC but rather
        EBCDIC to ASCII.
        Given a decimal value 249 the table indicates the translation to be an ASCII '9' (39)
        That is correct. The description is not.
        "default" 8-bit ASCII to EBCDIC 1047 conversion table (likely MF)"
        Using the inverse - offset to 57 (ASCII "9") I don't see a F9
        But perhaps I do not understand this statement at the end of the table.
        "This translation being symmetric, the table is built from the previous one." ?
        
        Thanks for the URL(s)
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Simon Sobisch - 2023-08-11
        
        CODE-SET is specified by the programmer which specifies what to translate, that would be similar with EXTFH.
        Of course you don't get the right result when specifying it on a record which contains other USAGE than DISPLAY (and if all are DISPLAY, then REDEFINES doesn't matter)...
        
        And CODE-SET only applies to FD, so none of the second group applies.
        
        Note that you can manually do that with the new translation tables and INSPECT CONVERTING.
        
        Reading the keyboard identifier would be highly non-portable, it is likely more reasonable to read one of the locale settings instead, and that should work with GnuCOBOL already.
        
        For the format of the translation tables, see the manual... (In short: either they are symmetric = built by using the reverse, or both directions use a different translation).
        Feel welcome to create more translation tables with citing the source. If you contribute those, we could distribute them with GnuCOBOL in the future.
        
        alternate
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Simon Sobisch - 2023-08-09
  
  You want to have a look at 8 bit encodings, mostly at ISO 8859. For the European/Scandinavian you've asked for, especially check out
  
  https://en.m.wikipedia.org/wiki/ISO/IEC_8859-15#Codepage_layout
  https://en.m.wikipedia.org/wiki/ISO/IEC_8859-10#Codepage_layout
  https://en.m.wikipedia.org/wiki/ISO/IEC_8859-16#Codepage_layout
  
  alternate
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

EBCDIC datasets - needing some code points please

A free COBOL compiler

Forums

Help

EBCDIC datasets - needing some code points please

EBCDIC datasets - needing some code points please

A free COBOL compiler

Forums

Help

EBCDIC datasets - needing some code points please document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

EBCDIC datasets - needing some code points please