
State of unicode with ncurses

GnuCOBOL
2024-07-14
2024-07-19
  • Michael Milliman

    I've seen a lot of discussion on the forums pertaining to Unicode characters and ncurses, but nothing recent. I have been experimenting with getting Unicode characters to display with ncurses in GnuCobol, but have not had any luck, nor have I seen anything on the forums that really improves the situation. I have successfully displayed Unicode characters (U+2557 and U+2605) using the DISPLAY statement in GnuCobol, but only without having initialized the ncurses functionality. However, once I invoke ncurses functionality (like display " " at line 1 column 1 with blank screen), I can no longer get Unicode characters to show up on the screen. I have tried both UTF-8 and UTF-16 encodings (e.g. X"E29597", the UTF-8 encoding of U+2557) and have had no success. Just to eliminate the possibility that I didn't understand ncurses as implemented on my system, I wrote a short C program that initializes ncurses and displays Unicode characters using add_wch directly; this was successful.

    Is using Unicode characters with ncurses possible from GnuCobol? If not, is there any hope of that support being added in the future?
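For reference, the byte sequences involved can be checked with a small standalone C sketch (hypothetical helper names; this is not GnuCOBOL or ncurses code). It encodes a code point as UTF-8 and as UTF-16, using big-endian byte order purely as an illustrative assumption: U+2557 comes out as X"E29597" in UTF-8 and X"2557" in UTF-16BE, and U+2605 as X"E29885" and X"2605".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Writes 1-4 bytes to out and returns the byte count, or 0 if cp is not
   a valid scalar value (surrogate range or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogate code points are not encodable characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode one code point as big-endian UTF-16: 2 bytes inside the BMP,
   4 bytes (a surrogate pair) above U+FFFF. Returns the byte count, 0 on error. */
size_t utf16be_encode(uint32_t cp, unsigned char out[4])
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(cp >> 8);
        out[1] = (unsigned char)(cp & 0xFF);
        return 2;
    }
    uint32_t v = cp - 0x10000;
    uint16_t hi = (uint16_t)(0xD800 | (v >> 10));
    uint16_t lo = (uint16_t)(0xDC00 | (v & 0x3FF));
    out[0] = (unsigned char)(hi >> 8);
    out[1] = (unsigned char)(hi & 0xFF);
    out[2] = (unsigned char)(lo >> 8);
    out[3] = (unsigned char)(lo & 0xFF);
    return 4;
}
```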

     
    • Ralph Linkletter

      GnuCobol invokes ?curses (ncurses or PDCurses, depending on the build) upon encountering the first extended DISPLAY statement.
      All plain DISPLAYs after that implicit invocation of ?curses are then handled by ?curses.
      It would be nice to be able to enable and disable the ?curses interface dynamically.

       
  • Michael Milliman

    True enough. I have wrestled with the display in GnuCobol for a while. I can do some of what I want without the ncurses library and some of it with, but not all of it either way. I'm starting to suspect that my best option will be to bypass the built-in GnuCobol functionality and deal with the ncurses library directly through CALLs to the ncurses routines.

     
  • Chuck H.

    Chuck H. - 2024-07-14

    Michael,

    I've not worked with Unicode; however, I have built a number of CBL_ callable functions (written in C) that allow access to PDCursesMOD functions. I'm pretty sure this approach would work with ncursesw as well. However, it means that all keyboard and screen access has to be done via curses and not via GnuCOBOL's screenio runtime.

    If you wish to discuss this in more detail, you can message me directly or contact me on Skype; just search for Chuck Haatvedt.

        Chuck Haatvedt
    
     
  • Simon Sobisch

    Simon Sobisch - 2024-07-14

    Rechecked: the biggest issue is likely that screenio.c operates on plain char* and doesn't do any conversion before output.
    I guess it should work once all functions are changed to use the wide versions and there's an explicit up-front call to mbstowcs to convert from UTF-8 (or other encodings, depending on the locale setup, which then needs to match the source encoding) to wide chars.

    If anyone wants to provide patches...
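A minimal sketch of the up-front conversion described above, assuming a POSIX-style C library (to_wide is a hypothetical helper, not a libcob function). mbstowcs decodes according to the current LC_CTYPE locale, so a program would call setlocale(LC_ALL, "") once at startup, and that locale's codeset must match the encoding of the COBOL data. The length query relies on the POSIX behavior of mbstowcs with a null destination.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a NUL-terminated multibyte string (encoded per the current
   LC_CTYPE locale) into a newly allocated wide string. Returns NULL if the
   input contains a sequence that is invalid in that locale or if memory
   runs out. The caller frees the result. */
wchar_t *to_wide(const char *mb)
{
    /* With a null destination, POSIX mbstowcs returns the number of wide
       characters needed (excluding the terminator) and stores nothing. */
    size_t n = mbstowcs(NULL, mb, 0);
    if (n == (size_t)-1)
        return NULL;
    wchar_t *ws = malloc((n + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;
    mbstowcs(ws, mb, n + 1); /* room for n chars plus the terminating L'\0' */
    return ws;
}
```

The resulting wchar_t string could then be handed to a wide ncurses output routine such as addwstr() rather than to a narrow char* routine.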

     
  • Ralph Linkletter

    Just to bring to the forefront a reality check.
    Although Windows and Linux are implemented with UTF variants, 100% of the zOS IBM mainframe world is implemented in EBCDIC (Linux under zOS is still Linux, i.e. ASCII / UTF).

    Consider that petabytes, zettabytes, perhaps even yottabytes of archived and current zOS datasets are encoded in EBCDIC.
    Also consider that 95% (probably more) of COBOL applications were created for and deployed on IBM / Unisys platforms, all of which are EBCDIC-based.

    ASCII, with 127 / 128 code points, is an inferior encoding scheme compared with EBCDIC's 255 / 256 code points.

    As far as I can surmise, UTF variants are deployed on "new age" platforms, platforms on which I do not regard COBOL as a viable deployment option.

    Compatibility: Unicode is compatible with ASCII, while EBCDIC is not compatible with Unicode.
    Oracle has deployed UTF-EBCDIC, but again, such an effort is incompatible with the existing mainframe universe.

    Considering the significant effort needed to implement UTF in GnuCOBOL, the practicality of doing so raises a question: why?

    A "show stopper" difference between Unicode and EBCDIC.
    Sorting in Unicode places numeric characters before alphabetic characters, while in EBCDIC, alphabetic characters are sorted before numeric characters.
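To make the collation difference concrete, here is a C sketch that sorts the same strings by ASCII byte value and by EBCDIC code page 037 byte value. The to_cp037 table covers only letters, digits and space, a simplification for illustration; a real conversion would use a complete code page table. Digits sort first in ASCII (and in Unicode code point order), while in EBCDIC lowercase letters sort first, then uppercase, then digits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Map an ASCII letter, digit or space to its EBCDIC code in code page 037.
   Other characters are out of scope for this sketch and map to 0xFF. */
static unsigned char to_cp037(char c)
{
    if (c >= '0' && c <= '9') return (unsigned char)(0xF0 + (c - '0'));
    if (c >= 'A' && c <= 'I') return (unsigned char)(0xC1 + (c - 'A'));
    if (c >= 'J' && c <= 'R') return (unsigned char)(0xD1 + (c - 'J'));
    if (c >= 'S' && c <= 'Z') return (unsigned char)(0xE2 + (c - 'S'));
    if (c >= 'a' && c <= 'i') return (unsigned char)(0x81 + (c - 'a'));
    if (c >= 'j' && c <= 'r') return (unsigned char)(0x91 + (c - 'j'));
    if (c >= 's' && c <= 'z') return (unsigned char)(0xA2 + (c - 's'));
    if (c == ' ') return 0x40;
    return 0xFF;
}

/* qsort comparator: order C strings by their EBCDIC (CP037) byte values. */
int cmp_ebcdic(const void *a, const void *b)
{
    const char *s = *(const char *const *)a;
    const char *t = *(const char *const *)b;
    for (; *s && *t; s++, t++) {
        unsigned char x = to_cp037(*s), y = to_cp037(*t);
        if (x != y)
            return (x < y) ? -1 : 1;
    }
    /* when one string is a prefix of the other, the shorter sorts first */
    return (*s != '\0') - (*t != '\0');
}

/* qsort comparator: order C strings by their ASCII/Unicode byte values. */
int cmp_ascii(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}
```

Sorting {"abc", "ABC", "123"} with cmp_ascii gives 123, ABC, abc; with cmp_ebcdic it gives abc, ABC, 123.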

    Just sayin :-)

     

    Last edit: Ralph Linkletter 2024-07-14
  • Michael Milliman

    Point well taken, Ralph. I haven't had the need to use EBCDIC in many, many years, back when I was working on a System/360 machine nearly 50 years ago. I can see where trying to maintain compatibility with EBCDIC while supporting Unicode might turn into a real rat's nest. And EBCDIC is the de facto standard for mainframes, where the vast, vast majority of COBOL programming resides. So full Unicode support is certainly not any sort of priority!

    Since my original post, I have come to essentially the same conclusion that Chuck H. expressed above. I have begun experimenting with some C "wrapper" functions around the ncurses library to expose the wide-character functionality available there, and have had some success. However, I think that with some very careful coding it may be possible to mix the screenio functionality of GnuCobol with the added functionality of a wrapper library. I have had success passing WINDOW * back and forth between Cobol and C, and have seen another post on one of the forums here where it was possible to grab the stdscr variable from ncurses after GnuCobol had initialized it. So it should be possible to grab that variable, pass it to the C wrappers, and allow inter-operation between the two.
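One detail any such wrapper has to handle: a PIC X field passed BY REFERENCE arrives in C as a bare pointer to its bytes, padded with trailing spaces and without a terminating NUL. The sketch below (hypothetical function names, assuming the field length is supplied as a separate argument) turns such a field into a C string that can then be converted to wide characters and handed to ncurses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a COBOL PIC X field once trailing space padding is dropped.
   The field has no terminating NUL, so its size must be passed in. */
size_t cob_trimmed_len(const char *field, size_t size)
{
    while (size > 0 && field[size - 1] == ' ')
        size--;
    return size;
}

/* Copy the trimmed field into out as a NUL-terminated C string.
   Returns the number of bytes copied, or -1 if out is too small. */
int cob_field_to_cstr(const char *field, size_t size, char *out, size_t outsize)
{
    size_t n = cob_trimmed_len(field, size);
    if (n + 1 > outsize)
        return -1;
    memcpy(out, field, n);
    out[n] = '\0';
    return (int)n;
}
```

Keeping the trimming separate from the character-set conversion means the same helper can sit in front of either a narrow or a wide ncurses call.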

    Much more experimentation will be necessary!!

     
    • Simon Sobisch

      Simon Sobisch - 2024-07-15

      As noted before: maybe it is more reasonable to use the current svn version and adjust libcob/screenio.c itself. This removes the need for careful coding on the application side, and once your changes are working they could be integrated into GnuCOBOL, also providing different code paths for alphanumeric/national/UTF-8 data items.

       
      • Michael Milliman

        That would probably be ideal. At this point, I at least have a working handle on what's going on and what needs to be done to use Unicode from COBOL (for now, UTF-16 works best with ncurses). I guess the next step would be to wade through screenio.c and get some idea of what might be required to modify it to work with national data items. Were I to carry this forward, I would also have to dive into the COBOL standard(s), as any changes would need to conform to the appropriate standards if at all possible.

        This is not an "Oh, I'll have something ready tomorrow" kind of project!
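As a starting point for the national-item side, here is a C sketch that decodes UTF-16 text into Unicode code points, combining surrogate pairs. It assumes the national item stores UTF-16 in big-endian byte order; that is an assumption, so check the byte order your GnuCOBOL build actually uses. On Linux, wchar_t is 32 bits wide, so each UTF-16 unit has to be widened like this before reaching the ncursesw wide functions; on Windows, wchar_t is itself 16-bit UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode big-endian UTF-16 bytes (e.g. the contents of a national data
   item, ASSUMING it is stored as UTF-16BE) into code points.
   Combines surrogate pairs; returns the number of code points written,
   or -1 on an odd byte count, an unpaired surrogate, or a full output. */
long utf16be_decode(const unsigned char *in, size_t nbytes,
                    uint32_t *out, size_t outmax)
{
    size_t i = 0, n = 0;
    if (nbytes % 2 != 0)
        return -1;
    while (i < nbytes) {
        uint32_t u = ((uint32_t)in[i] << 8) | in[i + 1];
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF) {        /* high surrogate */
            if (i >= nbytes)
                return -1;                       /* missing low half */
            uint32_t lo = ((uint32_t)in[i] << 8) | in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i += 2;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) { /* stray low surrogate */
            return -1;
        }
        if (n == outmax)
            return -1;
        out[n++] = u;
    }
    return (long)n;
}
```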

         
        • Simon Sobisch

          Simon Sobisch - 2024-07-19

          If you handle screenio and the ncurses wide-character functions, I can handle the rest. And for PIC X / PIC U there are conversion routines to/from the ncurses wide chars.

           
  • Chuck H.

    Chuck H. - 2024-07-19

    Michael,

    Could you send me some of the C wrappers you have written to test Unicode? I'm presently working on changes to screenio.c to handle the use of panels via curses, and I would also be interested in the COBOL programs you used to test the C wrappers.

    I do all of my testing and development work on Windows using PDCursesMOD, which the author states supports Unicode. See the following link:

    https://github.com/Bill-Gray/PDCursesMod/releases

      Chuck Haatvedt
    
     
