
#751 Unable to enter Swedish special characters using SCREEN SECTION

Group: invalid
Status: not-our-bug
Owner: nobody
Priority: 5 - default
Updated: 2022-11-06
Created: 2021-08-09
Private: No

Submitted to me by Christer Karlsson in Sweden christer.zeke@gmail.com

Using GnuCOBOL 3.1.2 built with MinGW and PDCursesMod 4.2.3, Mr. Karlsson is able to DISPLAY Swedish special characters in the range of X'80' through X'FF' and ACCEPT Swedish special characters, provided he does not use SCREEN SECTION (or extended screen i-o).

Previously, using GC 3.1.2 and PDCursesMod 4.2.0, he was NOT able to ACCEPT Swedish special characters. Keypresses failed with a beep and no character was accepted or echoed.

Using GnuCOBOL 3.1.2 built with MinGW and PDCursesMod 4.2.3, and using SCREEN SECTION, the user is able to DISPLAY Swedish special characters in the range of X'80' through X'FF', but not able to ACCEPT Swedish special characters. Keypresses fail with a beep and no character is accepted or echoed.

He is always able to ACCEPT ASCII characters in the range X'00' to X'7F'. For his testing, he changed his CMD.EXE codepage to 1252 to match the codepage used in Notepad++ (and other GUI apps) in his Windows 10 environment.

Two sample programs with test results are attached.

1 Attachments

Related

Discussion: GnuCOBOL project direction

Discussion

  • Arnold Trembley

    Arnold Trembley - 2021-08-09

    Correction: The zip archive contains source code and listings for the 2 sample programs (with and without SCREEN SECTION). The problem should be evident whenever attempting to enter extended characters in the range X'80' through X'FF' and using ACCEPT via SCREEN SECTION.

     
  • Simon Sobisch

    Simon Sobisch - 2022-09-10
    • labels: --> screen section
    • status: open --> not-our-bug
    • Group: GC 3.x --> invalid
     
  • Simon Sobisch

    Simon Sobisch - 2022-09-10

    The sample is fine but that's really: not our bug.

    It works fine with ncurses (used by MSYS2 by default, likely also usable with MinGW, see https://github.com/mirror/ncurses/blob/master/README.MinGW for build instructions) when LANG is set appropriately.

    PDCurses wincon returns mostly the "expected" return codes, but if built with UTF8 any extended characters seem to be wrong. If built without UTF8 or WIDE support, older versions returned a bad prefix; this can possibly be fixed by changing cob_convert_key in screenio.c:

        default:
    +#if defined __PDCURSES__ && !defined (PDC_WIDE)
    +       /* special case: PDCurses (older versions) returned an unexpected
    +          prefix with extended character set input, see
    +          https://github.com/wmcbrine/PDCurses/commit/e28e705d17438ffd */
    +       if ((keyp & 0xFF00) && keyp <= USHRT_MAX) {
    +           keyp &= ~0xFF00;    /* drop the prefix, keep the low byte */
    +       }
    +#endif
    

    @arn79 if you do this and get the expected result on a non-wide, non-utf8 build, then I'll integrate this as a workaround of a PDCurses bug (note: this should not be necessary with PDCursesMod 4.3+).
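
    For illustration only (this is not code from libcob or PDCurses): a minimal C sketch of what the mask in the workaround above does to a key value. The helper names strip_key_prefix and sign_extended_key are hypothetical, and the idea that the prefix comes from a byte being sign-extended to 16 bits is an assumption based on how the referenced PDCurses commit is described later in this thread.

```c
#include <assert.h>
#include <limits.h>

/* Assumption of this sketch: a byte above 0x7F was sign-extended to 16 bits
   before being returned as a key code, which would produce a 0xFF prefix. */
static int sign_extended_key (unsigned char byte)
{
	/* implementation-defined conversion; common compilers yield a
	   negative value for bytes above 0x7F */
	signed char as_signed = (signed char) byte;
	return (unsigned short) as_signed;
}

/* Hypothetical helper, not a libcob function: the masking step of the patch.
   In the patch it sits in the default: branch, so function keys and ERR
   are assumed to have been handled before this point. */
static int strip_key_prefix (int keyp)
{
	assert (keyp >= 0);
	if ((keyp & 0xFF00) && keyp <= USHRT_MAX) {
		keyp &= ~0xFF00;	/* clear bits 8-15, keep the low byte */
	}
	return keyp;
}
```

    With these helpers, a key value of 0xFFE5 would be reduced to 0xE5, while plain ASCII values and values above USHRT_MAX pass through unchanged.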

    Note: PDCursesMod built with the UTF8 option wants to return UTF8 only, and most extended characters then have a value that does not fit into a single char (which it did with an appropriate single-byte encoding). In this case the "get single character" part in screenio.c cannot work, and therefore a "beep" instead of an input is expected there. That is a real bug which is now tracked at [bugs:#852].
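
    To make the "does not fit into a single char" point concrete, here is a small sketch (not taken from screenio.c or PDCursesMod) that encodes a code point as UTF-8 and reports how many bytes it needs. The function name utf8_encode_bmp is made up for this example, and it only covers the Basic Multilingual Plane.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point from the Basic Multilingual Plane as UTF-8.
   Returns the number of bytes written to out (1 to 3), or 0 if the
   code point is a surrogate or lies outside the range handled here. */
static size_t utf8_encode_bmp (unsigned int cp, unsigned char out[3])
{
	assert (out != NULL);
	if (cp < 0x80) {
		out[0] = (unsigned char) cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char) (0xC0 | (cp >> 6));
		out[1] = (unsigned char) (0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp <= 0xFFFF && (cp < 0xD800 || cp > 0xDFFF)) {
		out[0] = (unsigned char) (0xE0 | (cp >> 12));
		out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char) (0x80 | (cp & 0x3F));
		return 3;
	}
	return 0;
}
```

    For example, U+00E5 (a with ring above) is the single byte 0xE5 in codepage 1252 but needs the two bytes 0xC3 0xA5 in UTF-8, and the Euro sign U+20AC needs three bytes, so neither fits into one char on a UTF8 build.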

    Note: It seems to also work fine with the wingui port of PDCursesMod, where I get the expected values back for all tested characters except the EURO sign €; it seems that it always returns the decimal value of UTF-16 (which kindly matches ISO-8859-15 for European character sets) [even in UTF-8 builds].

    So what would I suggest:

    • try with wincon builds that use PDCursesMod 4.3+ (which incorporated a change from PDCurses after I've first complained about the strange prefix) as not-utf8 not-wide (I'd have to recheck this)
    • try with wingui (which ignores chcp completely), wide or non-wide, seems to work
    • use ncursesw with an appropriate LANG setting, either on Win32 or on GNU/Linux; this seems to work best "in general" (with the exception that GnuCOBOL does not know how to get the "logical" position correctly and likely has problems handling a backspace on screen ACCEPTs of multi-byte characters like the EURO sign - I'd have to recheck this)

    While this bug is closed as not-our-bug we can still use it for discussion of this issue and - if there are any - possible workarounds (like the one above).

     

    Related

    Bugs: #852

  • Arnold Trembley

    Arnold Trembley - 2022-09-11

    Back in July, 2021, before reporting this bug, I built a special version of GnuCOBOL 3.1.2 for Christer Karlsson to test with:

    cobc (GnuCOBOL) 3.1.2.0
    libcob (GnuCOBOL) 3.1.2.0
    Built Jul 05 2021 22:53:08
    Packaged Dec 23 2020 12:04:58 UTC
    C version (MinGW) "6.3.0"
    indexed file handler : BDB, version 18.1.40
    mathematical library : GMP, version 6.2.0
    extended screen I/O : pdcurses, version 4.2.3 (CHTYPE=64, WIDE=0, UTF8=0)

    PDCursesMod
    make -f Makefile INFOEX=N CHTYPE_64=Y DLL=Y

    So, WIDE was defaulted to 0, but does that mean WIDE=Y or WIDE=N?

    As I understand it, the PDCursesMod 4.2.3 built for that version had the fix for the sign-extension problem described in:
    https://github.com/wmcbrine/PDCurses/commit/e28e705d17438ffd
    Which I understand was ported from WMcBrine's PDCurses to Bill Gray's PDCursesMod.

    Christer Karlsson reported that he was able to display and enter the extra Swedish characters EXCEPT when using COBOL SCREEN SECTION. There he could DISPLAY the extra Swedish characters, but could NOT enter them.

    I am not sure of the exact meaning of the PDCursesMod makefile parameters, or whether those parameters were bad choices. Should INFOEX=Y be used instead? Should WIDE=Y be specified? I believe UTF8=Y should NOT be specified, but I am not certain.

    As for solutions, I think the code change you mentioned for screenio.c would probably NOT be needed, since all my recent and future builds of GnuCOBOL should be using PDCursesMod 4.3.3 or higher (currently 4.3.4).

    I can ask Christer Karlsson to try out the latest GnuCOBOL 3.1.2 builds with PDCursesMod 4.3.4, and try both WinCON and WinGUI.

    Please let me know if it is worthwhile to try patching screenio.c or building ncurses.

    I will also try to watch for any fixes to bug #852.

    Thanks!

     
    • Simon Sobisch

      Simon Sobisch - 2022-09-11

      The default in the Makefile is N for WIDE and UTF8, which GnuCOBOL reports as 0 (and Y as 1), so your build was fine.
      But it did not have the referenced fix in; that has been included since PDCursesMod 4.3.
      The mentioned change for screenio.c would be for builds with plain PDCurses (which still has not had a release since that change) and for builds like your old one. For testing it you'd therefore need an old version.

      As mentioned: I think most characters should now be able to be ACCEPTed correctly just by using an updated version of PDCursesMod and wincon (or using wingui). So yes: asking for a retest with these would be useful.

      From my latest tests I'm led to believe that the UTF8 versions of PDCurses* are currently broken on Win32.

      A build that uses ncursesw would be very useful in any case, and possibly works better in many places.

       
      • Arnold Trembley

        Arnold Trembley - 2022-09-12

        Simon,
        Thanks very much!  That information is very reassuring, although I may rebuild my environment to drop the INFOEX=N argument completely, since I expect to always use GCC 9.2.0 or higher from now on.
        I have not had any response yet from Christer Karlsson on bug #751, so I may have to try testing his sample programs to see if I can find any improvement in results. 

        I will probably do at least one test build of GC 3.2 (but not for distribution) before the Release Candidate, so I can try "make distmingw".  I have been using a .cmd script to copy the needed components into the binary folder, although "set_env" variations come from a manually edited folder.   

        I'm looking forward to GC 3.2 RC1 and PKGBUILD!
        What kind of warning message were you seeing with 7-Zip?  I never noticed any in my use of 7-Zip.
        Thanks again for all your help!
        Kind regards,
        Arnold

        https://www.arnoldtrembley.com/


         
  • Chuck Haatvedt

    Chuck Haatvedt - 2022-09-13

    I believe that INFOEX may be related to the Windows Console implementation of extended information functions which were implemented with Windows Vista and later versions.

    If I'm correct and PDCursesMOD were built with INFOEX=Y and run on Windows XP or Windows 2000, it would fail.

    You could do some testing of this if you have a machine running Windows XP.

    Also if you check the code in pdcscrn.c in the PDCursesMOD wincon directory you will see references to this...

    static int _set_colors(void)
    {
        SetConsoleTextAttribute(pdc_con_out, 7);
        _reset_old_colors();
    
        if (pSetConsoleScreenBufferInfoEx)
            return _set_console_infoex();
        else
        {
            _set_console_info();
            return OK;
        }
    }
    
     
    • Simon Sobisch

      Simon Sobisch - 2022-09-13

      I believe that INFOEX may be related to the Windows Console implementation of extended information functions which were implemented with Windows Vista and later versions.
      If I'm correct and PDCursesMOD were built with INFOEX=Y and run on Windows XP or Windows 2000, it would fail.

      In this case not: that option has been around since XP times; it is there to define the INFOEX structure in case the Windows header files in use don't provide it, which was the case for at least old MinGW (not sure if updating MinGW also updates its Windows headers).
      The structure is used in any case; the only question is how its definition gets to the C preprocessor.

      Either the build fails, then add INFOEX=N; or it doesn't, then leave it out.

       
  • Simon Sobisch

    Simon Sobisch - 2022-11-06

    I took an hour to test the current PDCursesMod changes "outside of GnuCOBOL" (see https://github.com/Bill-Gray/PDCursesMod/issues/245) and can say that with a PDCursesMod build that does not use WIDE/UTF8, everything works as expected with the bunch of Windows codepages I tested (as long as you don't use the utf8 codepage, and likely also not the utf7 codepage). I tested chcp 850, chcp 1252 and different ISO 8859 variants (see codepage identifiers).

    For now, only non-wide builds of PDCursesMod will work with GnuCOBOL (PDCurses is missing the fixes for extended characters), because for now there is a hard limit to one byte per character. For both Notepad++ and Windows cmd (if only GnuCOBOL is running in there) it is likely most reasonable to use ISO 8859-15 / chcp 28605 for West-European display/accept.
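
    As a rough illustration of why the editor and the console need to agree on a codepage, the following sketch lists the single-byte values of a few characters in the three codepages mentioned here. The table and the names sketch_codepage, sketch_table and byte_for are made up for this example; they are not part of GnuCOBOL, PDCursesMod or Windows, and only the handful of characters shown is covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for this sketch only */
enum sketch_codepage { SKETCH_CP850 = 0, SKETCH_CP1252 = 1, SKETCH_CP28605 = 2 };

struct sketch_mapping {
	unsigned int	unicode;	/* Unicode code point */
	int		byte[3];	/* indexed by sketch_codepage; -1 = no mapping */
};

static const struct sketch_mapping sketch_table[] = {
	/*            chcp 850, chcp 1252, chcp 28605 (ISO 8859-15) */
	{ 0x00E5, { 0x86, 0xE5, 0xE5 } },	/* a with ring above */
	{ 0x00E4, { 0x84, 0xE4, 0xE4 } },	/* a with diaeresis */
	{ 0x00F6, { 0x94, 0xF6, 0xF6 } },	/* o with diaeresis */
	{ 0x00C5, { 0x8F, 0xC5, 0xC5 } },	/* A with ring above */
	{ 0x00C4, { 0x8E, 0xC4, 0xC4 } },	/* A with diaeresis */
	{ 0x00D6, { 0x99, 0xD6, 0xD6 } },	/* O with diaeresis */
	{ 0x20AC, {   -1, 0x80, 0xA4 } }	/* Euro sign */
};

/* Returns the byte for a code point in the given codepage,
   or -1 if this small table has no mapping for it. */
static int byte_for (unsigned int unicode, enum sketch_codepage cp)
{
	size_t i;
	assert ((int) cp >= 0 && (int) cp <= 2);
	for (i = 0; i < sizeof sketch_table / sizeof sketch_table[0]; i++) {
		if (sketch_table[i].unicode == unicode) {
			return sketch_table[i].byte[cp];
		}
	}
	return -1;
}
```

    The Swedish letters have the same byte in 1252 and 28605 but a different one in 850, and the Euro sign differs between 1252 and 28605, so source text saved in one codepage and displayed or accepted in another would show the wrong characters.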

    For PDCursesMod both 32bit and 64bit chtypes are fine; the latter provides better color and attribute support.

    For ncursesw this all depends mostly on LANG, not on chcp. In this case UTF8 also works, but some input (like the Euro sign) will be returned as two or three characters: when a single byte is requested and the input is a multi-byte character, ncurses returns its bytes as multiple split inputs (as if multiple keys had been pressed). Displaying those "parts" again later will then display the character that was entered, too.
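
    A sketch of what those split inputs mean for a field's "logical" length, assuming the bytes arrive one per key event and are stored in order. This is not GnuCOBOL or ncurses code; utf8_sequence_length and count_characters are hypothetical helpers, and the check is deliberately loose (it does not reject overlong encodings).

```c
#include <assert.h>
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its lead byte,
   or 0 if the byte cannot start a sequence. */
static size_t utf8_sequence_length (unsigned char lead)
{
	if (lead < 0x80) return 1;
	if ((lead & 0xE0) == 0xC0) return 2;
	if ((lead & 0xF0) == 0xE0) return 3;
	if ((lead & 0xF8) == 0xF0) return 4;
	return 0;
}

/* Count the characters in n bytes that were received one "key" at a time.
   Returns (size_t) -1 for malformed or truncated input. */
static size_t count_characters (const unsigned char *keys, size_t n)
{
	size_t i = 0;
	size_t count = 0;
	assert (keys != NULL || n == 0);
	while (i < n) {
		size_t len = utf8_sequence_length (keys[i]);
		size_t k;
		if (len == 0 || len > n - i) {
			return (size_t) -1;
		}
		/* every following byte must be a continuation byte 10xxxxxx */
		for (k = 1; k < len; k++) {
			if ((keys[i + k] & 0xC0) != 0x80) {
				return (size_t) -1;
			}
		}
		i += len;
		count++;
	}
	return count;
}
```

    Entering "10" followed by the Euro sign would arrive as five bytes (0x31 0x30 0xE2 0x82 0xAC) but represents three characters on screen, which is the gap between byte position and logical position that a single backspace would have to account for.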

    @arn79: if not already done that way, I suggest building PDCursesMod for your GnuCOBOL packages as "default" (non-wide, non-utf8, 64bit chtype) [of course as DLL, as this also allows switching between wincon, wingui and vt at any time]; and, as PDCursesMod seems to be in the process of a new release "very soon", rebuilding it as soon as that is available. This will also fix an abort on screenio exit, because the "free all memory" function that was added in the last release, which is broken, was completely removed [and therefore won't be found during GnuCOBOL's configure and won't be used].

     

    Last edit: Simon Sobisch 2022-11-06

