GnuCOBOL / Discussion / Help getting started: Screen section - UTF-8 locale

Anonymous - 2020-05-21

I wrote a program using the SCREEN SECTION.

The program, compiled with both GnuCOBOL 2.2 and (later) GnuCOBOL 3.0-rc1.0, works fine except for one runtime issue:
**I can't get the input screen to accept the characters é § è ç à ö.**
The input screen always shows 2 blank spots at the position where I typed one of these characters.
I use Ubuntu 18.04 LTS on a laptop with a fr_BE keyboard, but the same issue happens on a Google Cloud Engine virtual machine with Ubuntu 20.04 LTS and GnuCOBOL 2.2.

I have been looking for a solution on different websites but found none:
* locale is en_US.UTF-8 - changing to fr_BE.UTF-8 did not change anything
* using the LOCALE statements from the manual does not change a thing
* using the drawbox.cob example, I have to define the X fields as X(2) to make the characters visible after ACCEPT/DISPLAY (not screen def) - while using the SCREEN SECTION the characters are again replaced by 2 blank characters

It seems to be a double-byte character issue, but I can't find a solution.

Can anyone help me with this ?

Kind regards,

J.M.

Simon Sobisch - 2020-05-21

When your terminal adds two characters instead of one, this means it really uses UTF-8 as set up, but doesn't display it accordingly. I do think you use ncursesw (cobcrun --info will tell you something about this), correct?
As you likely use PIC X, and if you specify 50 bytes you also want to be able to use 50 character positions, you may simply fall back to ISO-8859-1, which should likely be the case if you export LANG=fr_BE before running the program.
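
To illustrate the byte-versus-position point, here is a minimal shell sketch. It assumes a POSIX shell with `printf`, `wc` and `iconv` available; the variable names are made up for this example. It counts the bytes of é in UTF-8 and again after conversion to ISO-8859-1:

```shell
#!/bin/sh
# é (U+00E9) written as its two UTF-8 bytes in octal, so this script
# itself stays pure ASCII and does not depend on the editor's encoding.
utf8_len=$(printf '\303\251' | wc -c | tr -d ' ')

# The same character after conversion to ISO-8859-1 (Latin-1).
latin1_len=$(printf '\303\251' | iconv -f UTF-8 -t ISO-8859-1 | wc -c | tr -d ' ')

echo "UTF-8:      $utf8_len byte(s)"
echo "ISO-8859-1: $latin1_len byte(s)"
```

With a single-byte encoding, 50 bytes of PIC X correspond to 50 screen positions; with UTF-8 they do not as soon as an accented character is typed.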

Anonymous - 2020-05-21

FYI: programs added as attachments (cobc info and other information in source code comments).

Program test only uses DISPLAY/ACCEPT without SCREEN SECTION.

Program test2 only uses DISPLAY/ACCEPT with screens defined in the SCREEN SECTION.
In test2, characters like é § è ç à ö are not accepted as input and produce a blank screen entry.

How can I remedy this?

Test

Test2

Anonymous - 2020-05-21

@Simon:

Thanks for your reply - I just added 2 test programs and yes, ncursesw is used.

ncursesw was installed when I installed GnuCOBOL 3.0-rc1 from the website, following the procedure described there.

On the other hand, on my Google Cloud Engine virtual machine GNUCOBOL 2.2.0 was installed via sudo apt install gnucobol and also uses ncursesw.

How can I install with ncurses instead of ncursesw, if that is the issue ?

Anonymous - 2020-05-21

Changing the locale to fr_BE does not change the situation.

$ export LANG=fr_BE

$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=fr_BE
LANGUAGE=en_US
LC_CTYPE="fr_BE"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="fr_BE"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="fr_BE"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
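
The `Cannot set LC_CTYPE to default locale` messages suggest that no locale named exactly `fr_BE` is generated on this machine. A minimal sketch for checking this, assuming the `locale` utility is available; the helper name `locale_installed` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named locale appears in `locale -a`.
locale_installed() {
    # glibc lists UTF-8 locales as e.g. "fr_BE.utf8", so normalise both
    # sides before comparing: lower-case and drop the dash in "UTF-8".
    wanted=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/')
    locale -a 2>/dev/null \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/utf-8/utf8/' \
        | grep -Fqx -- "$wanted"
}

for name in fr_BE fr_BE.UTF-8 en_US.UTF-8; do
    if locale_installed "$name"; then
        echo "$name: installed"
    else
        echo "$name: missing"
    fi
done
```

If a name is reported as missing, exporting it in LANG is unlikely to have the intended effect, because programs typically stay in the default C locale when the requested one cannot be loaded.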

Anonymous - 2020-05-21

With program test (without SCREEN SECTION) and LANG=en_US.UTF-8 I can still input 2 characters, even é§ is OK.
Changing the locale doesn't seem to have any effect.

Is this related to ncursesw ?

Anonymous - 2020-05-22

Did not find this to be related to the terminal application or the font used in the terminal either.
I use LXTerminal on my Ubuntu 18.04 LTS / GNUCOBOL 3.0-rc1 physical machine
and I use SSH via https://ssh.cloud.google.com to access my GCE virtual machine with Ubuntu 20.04 LTS / GNUCOBOL 2.2.0.
Both don't accept the characters é § è ç à ö via a screen defined in SCREEN SECTION.
Also: same test programs on both machines (cfr. message above with programs attached).

Anonymous - 2020-05-23

No success yet.
Uninstalled 3.0-rc1 and installed GnuCOBOL 2.2-disco: same issue.
Changing locales via "export LANG=fr_BE.iso885915@euro" and others did not work.

Anonymous - 2020-05-23

Now also uninstalled GnuCOBOL 2.2-disco.

Installed open-cobol1 via "sudo apt install open-cobol": issue still present.
Only difference: the terminal window becomes white with black characters, and é§èçàö are each represented by a single ? character instead of two blanks.

Giving up at this time.

I have been working with GnuCOBOL 2.x for a couple of years without issues, on Ubuntu and on MS-Windows.
I suppose this issue must be linked to (locales on) Ubuntu (both 18.04 LTS on the physical machine and 20.04 LTS on Google Cloud Engine).
Weird!

Any help welcome - I will check for answers on a regular basis.

Anonymous - 2020-05-24

**UPDATE - May 24, 2020**

I installed an Ubuntu 20.04 LTS machine from scratch and installed GnuCOBOL 2.2.

The issue described above is still present.

**The issue is only present when:**

* using DISPLAY or ACCEPT together with "LINE ll COL cc" or "AT lllccc" (line/col in numeric format)
* using the SCREEN SECTION definition (which of course uses LINE/COL too)

GnuCOBOL install and cobc display no error/warning messages.

Anonymous - 2020-05-27

**UPDATE - May 27, 2020**

Just installed GnuCOBOL 3.1-dev.0 on a MS-Windows 10 Pro (English) machine with a Belgian keyboard (cobc --info attached). Installation executed via Arnold Trembley's latest build environment (version 16MAY2020).

Compiling with cobc does not indicate any error or warning during compilation, but the issue with (French) accented characters like é § è ç à ö is also reproduced on this installation.
In terminal (Command Prompt) the accented characters display well. As soon as the GnuCOBOL application is run, the accented characters display 2 blanks (and a 'beep') if the program uses DISPLAY/ACCEPT with LINE/COL or if the program uses SCREEN SECTION to accept user input.

So, to summarize: I face the same issue on 5 different machines and a lot of trial and error with Linux locale:
* laptop with Ubuntu 18.04 LTS (tried with GnuCOBOL 2.2 and 3.0-rc1)
* laptop with Ubuntu 20.04 LTS (tried with GnuCOBOL 2.2 and 3.0-rc1)
* laptop with MS-Windows 10 Pro (tried with GnuCOBOL 3.1-dev.0)

Any suggestions how to solve this issue ?

win10pro_cobc--info.txt

- Arnold Trembley - 2020-05-28
  
  I don't know if there is a workable solution. I downloaded the latest PDCurses 4.1.99 from
  https://github.com/Bill-Gray/PDCurses
  which was updated about 5 hours ago.
  
  I built PDCurses 4.1.99 with
  make -f Makefile.mng INFOEX=N DLL=Y WIDE=Y UTF8=Y
  and with GnuCOBOL 3.1-dev r3580, and then ran the modified zztest2.cob program.
  That one displays the requested characters with row column, but the box drawing characters are corrupted.
  
  I then built PDCurses 4.1.99 with
  make -f Makefile.mng INFOEX=N DLL=Y (note: no UTF8 support!)
  and GnuCOBOL 3.1-dev r3580, and ran zztest2.cob
  
  This time the box drawing characters were correct but the requested special characters were wrong.
  
  Both those builds report PDCurses in cobcrun --info as follows:
  extended screen I/O : pdcurses, version 4.1.99 (CHTYPE=64, WIDE=0)
  mouse support : yes
  in other words, they don't say whether or not UTF8 support is present, even though it obviously changes the test results.
  
  I did not attempt to build PDCurses 4.1.99 with CHTYPE32=Y, because David Wall's testing suggests that doesn't work, nor did I try building PDCurses 4.1.99 with WinGUI instead of WinCon, also because David Wall's testing suggests that won't work either. It's possible WinGui would behave differently with the most recent download of PDCurses 4.1.99.
  
  I also didn't test for mouse support, maybe I can get to that tomorrow.
  
  The separate colors.cbl program seems to produce the desired results for either build of PDCurses 4.1.99 (or 4.1.1 from an earlier test). But the build of GC31 that you downloaded from my website was built with PDCurses 4.1.0 from last March, and has less desirable results.
  
  PDCurses 4.2.0 is expected soon, but there are still unresolved issues with CHTYPE32, UTF8, and WinGui. See related thread:
  https://sourceforge.net/p/open-cobol/discussion/help/thread/bc0f3d2ea5/
  
  Kind regards,
  
  Last edit: Arnold Trembley 2020-05-28
  
  PDC4199-X1-zztest2.jpg
  
  PDC4199-X2-colors.jpg
  
  PDC4199-X2-zztest2.jpg
  
  colors.cbl
  
  zztest2.cob
  
  - Simon Sobisch - 2020-05-28
    
    The UTF8 builds (I try to get this shown in cobcrun --info) won't work with codepage 437 or similar encoded box drawing characters which you need if you use the standard cmd.exe with the standard locale settings.
    
    If you use the UTF-8 versions of the box drawing characters everything should be fine in these builds; check the Wikipedia article "Box-drawing character" for a list and their hex values.
    
    Note: this will (obviously) break the option to use same characters for "simple" ACCEPT/DISPLAY, but you seem to be able to fix this by enabling Windows Beta UTF8-Support.
    
    When you use a setup like this (globally enabled UTF-8) + Windows UTF-8 support you can easily change your source to use UTF-8 encoding, too (just keep in mind that some characters will take more than one byte) and then can replace the hex characters by the actual characters if you like to.
    
    I think that the CHTYPE_32 color issue is possibly the result of not building both PDCurses and GnuCOBOL with CHTYPE_32 - the recent version of PDCurses 4 (master snapshot 6 hours ago) will not allow this any more (you'll see linking errors). Can somebody please check if a "clean" CHTYPE_32 build still has the color issue?
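
On the box-drawing point above, a small shell sketch may help show why the two encodings cannot be mixed. It assumes a POSIX shell with `printf`, `od` and `tr`; the variable names are made up for this example. It prints the bytes of the light horizontal line "─" (U+2500) in UTF-8 next to the single byte used for the same glyph in code page 437:

```shell
#!/bin/sh
# U+2500 (BOX DRAWINGS LIGHT HORIZONTAL) in UTF-8 is the three bytes
# E2 94 80, written here as octal escapes.
horizontal_utf8=$(printf '\342\224\200' | od -An -tx1 | tr -d ' \n')

# In code page 437 the same glyph is the single byte C4.
horizontal_cp437=$(printf '\304' | od -An -tx1 | tr -d ' \n')

echo "UTF-8: $horizontal_utf8"
echo "CP437: $horizontal_cp437"
```

A lone C4 byte is not a complete UTF-8 sequence, which would be consistent with the corrupted boxes seen in the UTF8=Y build when the source still uses code page 437 hex values.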
    
Anonymous - 2020-05-28

Arnold, Simon,
Thanks for checking this out and for your answers.
This weekend I will try out some options and publish the results here.
I would like to have a clean GnuCOBOL 2.2 (or 3.x) system on Ubuntu 20.04 LTS and on MS-Windows 10 Pro, both working with a 'standard' Belgian keyboard. My OSes are installed in English, but the text I/O needs to be able to handle accented characters on screen (since I'm writing in Dutch, French and German, all characters supported by the BE keyboard).
I also wish to use the SCREEN SECTION (or at least LINE/COL DISPLAY/ACCEPT) in my COBOL programs.
Keep up the good work with GnuCOBOL!
Kind regards,
J.M.

- Simon Sobisch - 2020-05-28
  
  In this case I'd try to go full UTF-8.
  
Anonymous - 2020-05-28

What do I have to do/install to go full UTF-8 ?
I will also have to check how to compile from source with PDCurses (my current build is ncursesw).

Vincent (Bryan) Coen - 2020-05-28

Go into the language and/or locale settings; it should be in one of them.
For Windows you will need to look under Settings and System, depending on which version you use; I can't check, as I am typing this under Linux and my Windows laptop is shut down.

Anonymous - 2020-05-29

Vincent,
Thanks for your reply.
My standard installation (even a clean install of Ubuntu 20.04 LTS on Google Cloud Platform) has en_US.UTF-8 as the default locale setting.
I tried different combinations of locale settings (en, fr, ge, da / US, FR, BE, DE, DK / UTF-8 and non-UTF-8 like fr_BE.iso88591) this last week but the issue persists.

I have the impression that the issue is due to something else:
The issue only appears when using DISPLAY/ACCEPT with LINE/COL or when using a screen definition via SCREEN SECTION (which uses LINE/COL too of course).
In case of DISPLAY/ACCEPT without LINE/COL the accented characters are displayed well. The only issue here is that each accented character uses two bytes of the defined variable, which shifts the following characters to the left and so doesn't permit creating a useful fixed-position screen display.

This behaviour even cuts off the last characters entered, e.g.
02 TXT-IN PIC X(6).
-> abcdef => abcdef
-> abëdef => abëde (the variable contains 6 bytes)
-> aböçef => aböç (the variable contains 6 bytes)
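
The truncation above can be reproduced outside COBOL. This is only a sketch, assuming a shell with `printf` and a `head` that supports `-c` (GNU, BSD and BusyBox do); the helper name `first_bytes` is made up, and keeping the first 6 bytes merely approximates what storing the input into PIC X(6) does:

```shell
#!/bin/sh
# Hypothetical helper: keep only the first N bytes of a string.
first_bytes() {
    printf '%s' "$2" | head -c "$1"
}

# The inputs from the post, with the accented letters written as UTF-8
# octal escapes so the script does not depend on the file's encoding.
in1=$(printf 'abcdef')                  # 6 characters, 6 bytes
in2=$(printf 'ab\303\253def')           # abëdef: 6 characters, 7 bytes
in3=$(printf 'ab\303\266\303\247ef')    # aböçef: 6 characters, 8 bytes

for s in "$in1" "$in2" "$in3"; do
    printf '%s => %s\n' "$s" "$(first_bytes 6 "$s")"
done
```

If the cut happens to fall between the two bytes of one character, only the first half of that character is kept, which a UTF-8 terminal cannot display properly.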

I will write a program to show this behaviour together with some screenshots and publish here shortly.

I'm thinking about the issue being provoked by other parameters than locale, maybe differences in "curses" behaviour (at this time GnuCOBOL always installed with ncursesw, so maybe I could try compiling from source with the latest version of PDCurses - have to check how to do that).

Anyway, if I find a solution, I will publish here.

Any help still welcome :)

Anonymous - 2020-05-29

I hope this small test program brings some light into the darkness - cfr. attachment.
I attached 2 screenshots from program execution - cfr. attachments.
ACCEPT/DISPLAY without LINE/COL shows accented characters but uses 2 bytes per accented character.
ACCEPT/DISPLAY with LINE/COL does not show the accented characters but shows 2 blanks.

TEST_1A_1B.jpg

TEST_2.jpg

accents.cob

- Anonymous - 2024-01-10
  
  Hi all,
  
  I promised to publish when I found a solution.
  Tried something today... and it works !
  After almost 4 years.
  Compiled with GnuCOBOL 3.2.0
  
  Solution is Yannick Vanhaeren's en_BE locale file described in
  https://gist.github.com/yvh/630368018d7c683aca8da9e2baf7bfb9
  
  J.M. Lietaer
  
  **Relevant part of the source code:**
  
  000025 SOURCE-COMPUTER.
  000026 UBUNTU_22_04_LTS.
  000027 OBJECT-COMPUTER.
  000028 ANY-PLATFORM
  000029 CLASSIFICATION belgian.
  000030 SPECIAL-NAMES.
  000031 LOCALE belgian "en_BE.UTF-8".
  000032 *
  000033 * Set locale to en_BE.UTF-8
  000034 * Cfr. https://gist.github.com/yvh/630368018d7c683aca8da9e2baf7bfb9
  000035 * sudo cp en_BE /usr/share/i18n/locales/en_BE
  000036 * sudo localedef -i en_BE -c -f UTF-8 en_BE
  000037 * echo "en_BE.UTF-8 UTF-8" | sudo tee -a /etc/locale.gen
  000038 * sudo locale-gen
  000039 *
  000040 * Maybe also change files in /var/lib/locales/supported.d/
  000041 *
  000042 * See also : https://www.server-world.info/en/note?os=Ubuntu_
  000043 *
  
  Locale settings
  
  $ localectl
  System Locale: LANG=en_BE.UTF-8
  LANGUAGE=fr_BE:fr_FR
  LC_NUMERIC=en_US.UTF-8
  LC_TIME=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8
  LC_PAPER=en_US.UTF-8
  LC_NAME=en_US.UTF-8
  LC_ADDRESS=en_US.UTF-8
  LC_TELEPHONE=en_US.UTF-8
  LC_MEASUREMENT=en_US.UTF-8
  LC_IDENTIFICATION=en_US.UTF-8
  VC Keymap: n/a
  X11 Layout: be
  X11 Model: pc105
  
  - Simon Sobisch - 2024-01-10
    
    And the result is?
    
    - Anonymous - 2024-01-10
      
      The result...
      The locale file en_BE is not installed by default in Ubuntu (and other Linux distributions?).
      By installing this locale, the characters passing thru the SCREEN SECTION as input and output are now displayed correctly.
      Keyboard is AZERTY (fr_BE).
      OS language is English (en_US).
      Attachments - both inputs are the same &é"'(§è!çà)-ôöùµ[]{}
      en_BE_1.jpg : with en_BE
      en_BE_2.jpg : without en_BE
      
      en_BE_1.jpg
      
      en_BE_2.jpg
      
      - Simon Sobisch - 2024-01-10
        
        Note that this likely still has the "issue" that because it is UTF-8, each of those will be counted as 2 bytes and also stored that way.
        
        Anonymous - 2024-01-10
        
        I'm quite sure the characters are still stored as 2 bytes, but the difference is that both the input and the output field from the SCREEN SECTION are now working correctly.
        The buffer zone after the input field is now empty, where before part of/entire characters/bytes were found. The en_BE locale seems to do the trick.
        I'm ready to use GnuCOBOL as a programming language again :)
        Best wishes to the whole team and keep up the excellent work on COBOL !
        
        Juan Carlos Escartí - 2024-01-29
        
        Hello everyone, after some time spent updating the OS that hosts my applications, I have migrated from kernel 3.4.63 to 5.14.21 (SUSE 12.2 to SUSE Leap 15.5).
        I'm going to see if we can put GnuCOBOL into operation.
        How can this solution be generalized to all languages?
        My experience:
        With the old OS, a 32-bit COBOL 3.1 version worked for me with CP850.
        In all new versions, the 2 null characters appear with both CP850 and UTF-8.
        Thank you
        