GnuCOBOL / Discussion / Help getting started: Multibyte as a *standard*

Mário Matos - 2024-03-28

Are there reasons not to use it?

Why?

I can do it in Windows easily with UTF-8, UTF-16, UTF-32, etc. even with PIC X (with math and a good guess)

PIC N is not, even... mandatory!

Using a Database without multibyte is like using «something» with «nothing»

What can we do?

Better, yet...

...what should we do?

Just saying (asking?)

MM

- Simon Sobisch - 2024-03-28
  
  Support for multibyte in the compiler involves calculating "by character", not "by byte" (which is what we currently do), in several places, and national/utf8 literals may need an internal conversion of the byte buffer.
  There is a GSOC proposal on handling this (see the contributions discussion board). We will see if this student work can be sponsored by Google.
  
  There is a GSOC project idea on handling this in the runtime, which is much more complicated especially for NATIONAL items, as these include NUL bytes and therefore prevent the use of any standard C string function. Note that this is a large project ~350h and no student wrote a proposal for this yet.
  
  "Someday" this will all be available with GnuCOBOL, but the amount of time people have is always less than what would be needed to do everything "soon". And many COBOL projects work fine without it and benefit more from tweaking what we already have or from adding missing extensions.
  
  - Anonymous - 2024-03-28
    
    When we store a string with UTF-8 or UTF-16, depending on the language we need a... guess!
    
    PIC N stores two bytes per character, which should suffice.
    
    UTF-8 needs 1-4 bytes per character
    UTF-16 needs 2 or 4 bytes per character (4 when a surrogate pair is needed)
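    
    As a rough illustration of those byte counts, here is a minimal C sketch that returns the encoded size of a single Unicode code point. The function names `utf8_encoded_len` and `utf16_encoded_len` are made up for this example and are not part of GnuCOBOL or any library.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>
    
    /* Bytes needed to encode one code point in UTF-8; 0 if it is not encodable. */
    size_t utf8_encoded_len(uint32_t cp)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
        if (cp <= 0x7F)     return 1;
        if (cp <= 0x7FF)    return 2;
        if (cp <= 0xFFFF)   return 3;
        if (cp <= 0x10FFFF) return 4;
        return 0;                                   /* beyond the Unicode range */
    }
    
    /* Bytes needed in UTF-16: 2 inside the BMP, 4 (a surrogate pair) above it. */
    size_t utf16_encoded_len(uint32_t cp)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        if (cp <= 0xFFFF)   return 2;
        if (cp <= 0x10FFFF) return 4;
        return 0;
    }
    ```
    
    For U+4E2D (中) this gives 3 bytes in UTF-8 and 2 in UTF-16; a code point above U+FFFF needs 4 bytes in both, so a fixed two bytes per character only covers the Basic Multilingual Plane.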
    
    Comparison of Unicode encodings - Wikipedia
    
    One can use UTF-8 for display while using other internally! (and vice-versa)
    
    Windows command prompt (even XP) is smart enough already! (with a tweak)
    
    Are you ready to go to an adventure... Simon?
    
    Don't be a chicken please!
    
    By the way, where's Brian?
    
    I'm fed up with his lame jokes... lately!
    
    Is he Ok?
    
    :-)
    
  - Mário Matos - 2024-03-29
    
    I've already read Ahmed Maher's proposal!
    
    It is fascinating!
    
    Nevertheless, LIBICU is... complex!
    
    It's... big!
    
    Too much... complex!
    
    I have "nothing" against it!
    
    But...
    
    ...GnuCOBOL has... also... "simple" alternatives to... LIBICU
    
    One of them is... "libiconv"...
    
    ...the most basic and... simple I have already... found!
    
    And works! (even in Windows)
    
    Fascinating!
    
    Note:
    
    Windows does not need this! :-)
    
    :-)
    
    - Simon Sobisch - 2024-03-29
      
      So how do you convert from iso-8859-15 to utf8 using libiconv? How do you do the same with ebcdic source (also 8bit encodings)…?
      
      - Mário Matos - 2024-03-30
        
        «So how do you convert from iso-8859-15 to utf8 using libiconv? How do you
        do the same with ebcdic source (also 8bit encodings)…?»
        
        https://www.lemoda.net/c/iconv-example/iconv-example.html
        
        Examples:
        
        iconv_t iconvDesc;
        
        iconvDesc = iconv_open("UTF-8//TRANSLIT//IGNORE", "ISO-8859-15");
        
        https://man7.org/linux/man-pages/man3/iconv_open.3.html
        
        ...
        
        size = iconv(iconvDesc, inbuf, inbytesleft, outbuf, outbytesleft);
        
        https://man7.org/linux/man-pages/man3/iconv.3.html
        
        ...
        
        status = iconv_close(iconvDesc);
        
        https://man7.org/linux/man-pages/man3/iconv_close.3.html
        
        ...
        
        There are some forks of libiconv with «EBCDIC» support
        
        (must configure --enable-extra-encodings)
        
        The principle is the same
        https://lists.gnu.org/archive/html/bug-gnu-libiconv/2022-01/msg00002.html
        https://github.com/pffang/libiconv-for-Windows (version 1.17 with MSVC)
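        
        Put together, the three calls above could look roughly like the following. This is only a sketch under assumptions: it expects a POSIX-style `iconv` as in glibc (with GNU libiconv you would also link `-liconv`), the helper name `latin9_to_utf8` is invented for illustration, and the GNU-specific `//TRANSLIT//IGNORE` suffixes are left out to keep it portable.
        
        ```c
        #include <iconv.h>
        #include <stddef.h>
        #include <string.h>
        
        /* Convert a NUL-terminated ISO-8859-15 string to UTF-8.
           Returns the number of bytes written (without the NUL),
           or (size_t)-1 on any error, e.g. when the output buffer is too small. */
        size_t latin9_to_utf8(const char *in, char *out, size_t outsize)
        {
            if (outsize == 0) return (size_t)-1;
        
            /* Note the argument order: target encoding first, source second. */
            iconv_t cd = iconv_open("UTF-8", "ISO-8859-15");
            if (cd == (iconv_t)-1) return (size_t)-1;
        
            char  *inbuf   = (char *)in;   /* iconv() wants char **, input is only read */
            size_t inleft  = strlen(in);
            char  *outbuf  = out;
            size_t outleft = outsize - 1;  /* keep one byte for the terminating NUL */
        
            size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
            iconv_close(cd);
            if (rc == (size_t)-1) return (size_t)-1;
        
            *outbuf = '\0';
            return (size_t)(outbuf - out);
        }
        ```
        
        Called with the single byte 0xA4 (the euro sign in ISO-8859-15), this should produce the three UTF-8 bytes E2 82 AC. An EBCDIC source would only change the second argument of `iconv_open`, provided the libiconv build knows that encoding name.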
        
        
        Simon Sobisch - 2024-03-30
        
        Thank you for this useful post.
        It is likely useful to check with the students whether this would be a reasonable and simpler approach.
        

Simon Sobisch - 2024-03-29

Please stay with a discussion time that you would use in a business meeting (and don't spam-post, I have much better things to do than moderating them).

Concerning the question here: this may already work as a user-defined name (just give it a try), but be aware that it can push the line past column 72, because we currently only count bytes for that.
If you want this in a literal, either place it in an alphanumeric one / PIC X (all operations then work on the byte level, so this may give strange results with ref-mod/STRING/UNSTRING/INSPECT/MOVE) or use a utf8 literal u"中国が好きです" / PIC U, where it should (but currently doesn't with GnuCOBOL) operate on the character level.

The same applies to national literals / PIC N.

If you don't use utf8 as your COBOL source encoding, the above changes: PIC U then does not work at all.

Note that using these with any type of screenio is a different thing because that relates to having setup the terminal correctly and, for extended screenio, depends on the library doing it.
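
To make the byte-level versus character-level difference concrete, here is a small C sketch that counts code points in a UTF-8 string instead of bytes. It is only an illustration, not how GnuCOBOL does or will do it; the function name `utf8_char_count` is hypothetical and the input is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;   /* lead byte or ASCII byte: starts a new character */
    }
    return n;
}
```

For 中国が好きです stored as UTF-8, `strlen` reports 21 bytes while `utf8_char_count` reports 7 characters. That gap is why a byte-based column-72 check or a byte-based ref-mod on PIC X can cut in the middle of a character.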

On 29 March 2024 00:27:32 CET, "Mário Matos" matosma@users.sourceforge.net wrote:

When I say...

中国が好きです

...in... Japanese...

...what should I do...

...in... COBOL, Simon?

Make a... war?

:-)

Last edit: Simon Sobisch 2024-03-29

