Menu

EBCDIC datasets - needing some code points please

GnuCOBOL
2023-08-09
2023-08-11
  • Ralph Linkletter

    I have been analyzing how to determine whether data in a file is EBCDIC or ASCII.
    There is an intersection set between ASCII and EBCDIC.
    In trying to determine the appropriate character set, I added logic to ignore a character that is represented in the intersection set.
    I keep scanning thru the record until I happen upon a character that is only in one of the sets (either ASCII or EBCDIC)
    I then determine whether the scanned character is EBCDIC or ASCII
    I apply the result to the remaining records.
    It seems to work fine if there is adequate data available

    I need the ASCII code points for languages other than English (European and Scandinavian in particular).
    Umlaut, accent characters, etc. are currently not represented.
    Neither should present an issue - I merely lack the specific code points.

     05  CHAR-FLD                    PIC X(01).
                   88  ASCII-CHAR VALUE X'20' THRU X'3F',
                                        X'41' THRU X'7E',
                                        X'80' THRU X'9F',
                                        X'A0' THRU X'A5'.
                   88  EBCDIC-CHAR VALUE
                      X'4B' THRU X'4F',
                      X'5A' THRU X'5F',
                      X'6B' THRU X'6F',
                      X'7A' THRU X'7F',
                      X'81' THRU X'89',
                      X'91' THRU X'99',
                      X'A1' THRU X'AB',
                      X'C0' THRU X'FF',
                      X'40',
                      X'70',
                      X'B0',
                      X'B1',
                      X'61',
                      X'60'.
    
               MOVE 1 TO I
               MOVE '0' TO AMBIGUITY
               PERFORM UNTIL AMBIGUITY = '1'
                   MOVE REC-AREA (I:1) TO CHAR-FLD
                   ADD 1 TO I
                   IF EBCDIC-CHAR AND ASCII-CHAR
                       CONTINUE
                   ELSE
                       MOVE '1' TO AMBIGUITY
                       IF EBCDIC-CHAR
                           MOVE '2' TO EDIT-SWITCH
                       END-IF
                   END-IF
                   IF I = TRTCH-LEN
                       MOVE '1' TO AMBIGUITY
                   END-IF
               END-PERFORM.
    

    I will optimize the above procedure - I kept it simple so that debugging was more transparent.
    My efforts are a precursor to enabling the GnuCOBOL compiler to easily process an EBCDIC dataset containing COMP, COMP-3, Usage Display data fields.

     
    • Vincent (Bryan) Coen

      On 09/08/2023 23:26, Ralph Linkletter wrote:

      I have been analyzing how to determine whether data in a file is
      EBCDIC or ASCII.
      There is an intersection set between ASCII and EBCDIC.
      In trying to determine the appropriate character set, I added logic to
      ignore a character that is represented in the intersection set.
      I keep scanning thru the record until I happen upon a character that
      is only in one of the sets (either ASCII or EBCDIC)
      I then determine whether the scanned character is EBCDIC or ASCII
      I apply the result to the remaining records.
      It seems to work fine if there is adequate data available

      I will optimize the above procedure - I kept it simple so that
      debugging was more transparent.

      My efforts are a precursor to enabling the GnuCOBOL compiler to easily
      process an EBCDIC dataset containing COMP, COMP-3, Usage Display data
      fields.

      Small thought :

      Looking at text data is one thing but reading IBM EBCDIC std binary
      fields another starting with what the IBM, big or little Indian, and
      what about the target system what's that one and even if they are the
      same is the storage of such the same ?

      I would not bet the farm on it :)

       
      • Ralph Linkletter

        Vincent a common misunderstanding :-)
        The issue is not any binary form or packed decimal form.
        The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
        Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.

        With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.

         
        • pottmi

          pottmi - 2023-08-09

          I have a keen interest in getting ebcdic supported in GnuCOBOL like it is in MF.

          If there is ever an initiative to get this done I would like to participate.

          On Wed, Aug 9, 2023 at 6:44 PM Ralph Linkletter
          zosralph@users.sourceforge.net wrote:

          Vincent a common misunderstanding :-)
          The issue is not any binary form or packed decimal form.
          The issue is usage display data fields - the only data type that must participate in the conversion of ASCII<->EBCDIC.
          Long ago in a galaxy far away I personally directed the enabling of EBCDIC in the Micro Focus COBOL compiler and RTS (also brought that concept to the IBM Visual Age PL/I compiler.) The sole focus being usage display data fields.

          With GnuCOBOL I must approach the problem via a preprocessor, whereas with MF and IBM it was enabled in the compiler and RTS.


          EBCDIC datasets - needing some code points please


          Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/gnucobol/discussion/cobol/

          To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/

           
          • Simon Sobisch

            Simon Sobisch - 2023-08-09

            Note that GnuCOBOL 3.2 added both support for the CODE-SET clause and the option to specify either one of the distributed or a separate encoding table to translate (parts of) an FD automatically during read/write. But this needs to be coded in the program.
            It should be possible to do this "live" using the external file handler, but I don't remember if we finished that part.

             
            • Ralph Linkletter

              @sf-mensch ; pottmi
              In EXTFH ?
              Supported if in EXTFH ?
              1. Do not translate binary and packed decimal ?
              2. Read into ?
              3. Write from ?
              4. Context sensitive to the COBOL expression as a result of a redefines ?

              Probably way out of scope.
              1. SQL Host variables ?
              2. CICS I-O ?
              3. IMS segments ?

              Is there a GnuCOBOL / C method to determine the keyboard layout / keyboard identifier being used ?

              For instance the German keyboard is defined as keyboard idenifier 407
              https://learn.microsoft.com/en-us/globalization/keyboards/kbdgr

              The US keyboard is defined as keyboard identifier 409
              https://learn.microsoft.com/en-us/globalization/keyboards/kbdus_7

              I looked at the * .ttbl files in config - default . ttbl
              From what I can tell the description should not be ASCII to EBCDIC but rather
              EBCDIC to ASCII.
              Given a decimal value 249 the table indicates the translation to be an ASCII '9' (39)
              That is correct. The description is not.
              "default" 8-bit ASCII to EBCDIC 1047 conversion table (likely MF)"
              Using the inverse - offset to 57 (ASCII "9") I don't see a F9
              But perhaps I do not understand this statement at the end of the table.
              "This translation being symmetric, the table is built from the previous one." ?

              Thanks for the URL(s)

               
              • Simon Sobisch

                Simon Sobisch - 2023-08-11

                CODE-SET is specified by the programmer which specifies what to translate, that would be similar with EXTFH.
                Of course you don't get the right result when specifying it on a record which contains other USAGE than DISPLAY (and if all are DISPLAY, then REDEFINES doesn't matter)...

                And CODE-SET only applies to FD, so none of the second group applies.

                Note that you can manually do that with the new translation tables and INSPECT CONVERTING.

                Reading the keyboard identifier would be highly non-portable, it is likely more reasonable to read one of the locale settings instead, and that should work with GnuCOBOL already.

                For the format of the translation tables, see the manual... (In short: either they are symmetric = built by using the reverse, or both directions use a different translation).
                Feel welcome to create more translation tables with citing the source. If you contribute those, we could distribute them with GnuCOBOL in the future.

                 
    • Simon Sobisch

      Simon Sobisch - 2023-08-09
       

Log in to post a comment.