
Multibyte as a *standard*

2024-03-28
2024-04-02
  • Mário Matos

    Mário Matos - 2024-03-28

    Are there reasons not to use it?

    Why?

    I can do it in Windows easily with UTF-8, UTF-16, UTF-32, etc. even with PIC X (with math and a good guess)

    PIC N is not, even... mandatory!

    Using a Database without multibyte is like using «something» with «nothing»

    What can we do?

    Better, yet...

    ...what should we do?

    Just saying (asking?)

    MM

     
    • Simon Sobisch

      Simon Sobisch - 2024-03-28

      Support for multibyte in the compiler involves counting "by character", not "by byte" (which is what we currently do), in several places; national/utf8 literals may also need an internal conversion of the byte buffer.
      There is a GSOC proposal on handling this (see the contributions discussion board). We will see if this student work can be sponsored by Google.
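To illustrate the "by character" versus "by byte" distinction mentioned above, here is a minimal sketch in Python. It is not GnuCOBOL code; the helper names `byte_length` and `char_length` are hypothetical, and "character" here means a Unicode code point.

```python
# Hypothetical helpers for illustration only; not part of GnuCOBOL.

def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def char_length(text: str) -> int:
    """Number of Unicode code points in the text."""
    return len(text)


sample = "M\u00e1rio"  # "Mário" with a precomposed a-acute
print(byte_length(sample), char_length(sample))  # prints "6 5"
```

A size check written in terms of bytes sees 6 here, while one written in terms of characters sees 5; that gap is what the compiler work would have to account for.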

      There is a GSOC project idea on handling this in the runtime, which is much more complicated, especially for NATIONAL items, as these include NUL bytes and therefore prevent the use of any standard C string function. Note that this is a large project (~350h) and no student has written a proposal for it yet.

      "Someday" this will all be available with GnuCOBOL, but the amount of time people have is always less than what would be needed to do everything "soon". And many COBOL projects work fine without it and benefit more from tweaking what we already have or from adding missing extensions.
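The NUL-byte problem with NATIONAL items mentioned above can be sketched as follows. This assumes the national data is stored as UTF-16 (little-endian is chosen here purely for illustration), and `c_strlen` is a hypothetical Python stand-in that mimics how C's `strlen` stops at the first NUL byte.

```python
def c_strlen(buffer: bytes) -> int:
    """Mimic C strlen: count the bytes before the first NUL byte."""
    end = buffer.find(b"\x00")
    return len(buffer) if end == -1 else end


# Assumption: NATIONAL data held as UTF-16LE; alphanumeric data as UTF-8.
national = "COBOL".encode("utf-16-le")
alphanumeric = "COBOL".encode("utf-8")

print(len(national), c_strlen(national))          # prints "10 1"
print(len(alphanumeric), c_strlen(alphanumeric))  # prints "5 5"
```

Every ASCII letter in UTF-16 carries a zero byte, so a NUL-terminated string routine reports a length of 1 for a 10-byte buffer. Runtime code therefore has to carry explicit lengths rather than rely on NUL termination.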

       
      • Anonymous

        Anonymous - 2024-03-28

        When we store a string with UTF-8 or UTF-16, depending on the language we need a... guess!

        PIC N stores two bytes per character, which should suffice.

        UTF-8 may need 1-4 bytes per character
        UTF-16 may need 2 or 4 bytes per character (4 when a surrogate pair is needed)

        Comparison of Unicode encodings - Wikipedia
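The per-character sizes listed above can be checked with a short Python sketch. The function name `encoded_sizes` is made up for this example; the little-endian codec variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes needed for one character in each Unicode encoding form."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


samples = {
    "A": "A",                    # ASCII letter
    "a-acute": "\u00e1",         # Latin-1 range
    "CJK": "\u4e2d",             # a CJK ideograph in the BMP
    "emoji": "\U0001F600",       # outside the BMP, needs a surrogate pair
}

for name, ch in samples.items():
    print(name, encoded_sizes(ch))
```

Only the last sample needs 4 bytes in UTF-16; a fixed two bytes per character covers the Basic Multilingual Plane but nothing beyond it.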

        One can use UTF-8 for display while using other internally! (and vice-versa)

        Windows command prompt (even XP) is smart enough already! (with a tweak)

        Are you ready to go to an adventure... Simon?

        Don't be a chicken please!

        By the way, where's Brian?

        I'm fed up with his lame jokes... lately!

        Is he Ok?

        :-)

         
      • Mário Matos

        Mário Matos - 2024-03-29

        I've already read Ahmed Maher's proposal!

        It is fascinating!

        Nevertheless, LIBICU is... complex!

        It's... big!

        Too... complex!

        I have "nothing" against it!

        But...

        ...GnuCOBOL has... also... "simple" alternatives to... LIBICU

        One of them is... "libiconv"...

        ...the most basic and... simple I have already... found!

        And works! (even in Windows)

        Fascinating!

        Note:

        Windows does not need this! :-)

        :-)

         
  • Simon Sobisch

    Simon Sobisch - 2024-03-29

    Please keep to a tone of discussion that you would use in a business meeting (and don't spam-post; I have much better things to do than moderating such posts).

    Concerning the question here: this may already work as a user-defined name (just give it a try), but be aware that it can push the line past column 72, as we currently only count bytes for that.
    If you want this in a literal, either place it in an alphanumeric one / PIC X (all operations then work at the byte level, so this may give strange results with ref-mod/STRING/UNSTRING/INSPECT/MOVE), or use a utf8 literal u"中国が好きです" / PIC U, where operations should work at the character level (currently they don't in GnuCOBOL).

    Similar applies to national literals / PIC N.
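The "strange results" of byte-level operations on UTF-8 text held in PIC X can be sketched in Python. This is only an analogy: `ref_mod_bytes` and `ref_mod_chars` are hypothetical helpers that imitate COBOL's 1-based reference modification `item(start:length)` at the byte and character level, and do not reflect how GnuCOBOL is implemented.

```python
def ref_mod_bytes(data: bytes, start: int, length: int) -> bytes:
    """Byte-level analogue of COBOL ref-mod data(start:length), 1-based."""
    return data[start - 1:start - 1 + length]


def ref_mod_chars(text: str, start: int, length: int) -> str:
    """Character-level analogue of the same operation."""
    return text[start - 1:start - 1 + length]


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The sample text from this thread, written with escapes: 中国が好きです
text = "\u4e2d\u56fd\u304c\u597d\u304d\u3067\u3059"
raw = text.encode("utf-8")  # 7 characters, 3 bytes each

print(len(text), len(raw))                     # prints "7 21"
print(is_valid_utf8(ref_mod_bytes(raw, 1, 3)))  # prints "True"
print(is_valid_utf8(ref_mod_bytes(raw, 1, 4)))  # prints "False"
print(ref_mod_chars(text, 1, 2) == text[:2])    # prints "True"
```

Taking 4 bytes cuts the second character in the middle and leaves an invalid sequence, whereas a character-level operation always lands on a character boundary.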

    If your COBOL source is not encoded in UTF-8, the above changes: PIC U then does not work at all.
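The column-72 remark above can also be illustrated with a small sketch. It assumes UTF-8 source and a fixed-format limit of 72; `fits_fixed_format` is a hypothetical helper, not a GnuCOBOL function, and "characters" again means code points rather than display width.

```python
def fits_fixed_format(line: str, limit: int = 72, by_bytes: bool = True) -> bool:
    """Check a source line against the column limit, by bytes or by characters."""
    length = len(line.encode("utf-8")) if by_bytes else len(line)
    return length <= limit


# 11 leading spaces, a DISPLAY statement, and 25 CJK characters in the literal.
line = " " * 11 + 'DISPLAY "' + "\u4e2d" * 25 + '".'

print(len(line), len(line.encode("utf-8")))      # prints "47 97"
print(fits_fixed_format(line, by_bytes=False))   # prints "True"
print(fits_fixed_format(line, by_bytes=True))    # prints "False"
```

The same line passes a character-based check and fails a byte-based one, which is why multibyte names or literals can run past the limit sooner than they appear to.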

    Note that using these with any type of screenio is a different matter, because that depends on the terminal being set up correctly and, for extended screenio, on the library that handles it.

    On 29 March 2024 00:27:32 CET, "Mário Matos" matosma@users.sourceforge.net wrote:

    When I say...

    中国が好きです

    ...in... Japanese...

    ...what should I do...

    ...in... COBOL, Simon?

    Make a... war?

    :-)

     

    Last edit: Simon Sobisch 2024-03-29
