
Screen section - UTF-8 locale - ACCEPT does not show é § è ç à ö

Anonymous
2020-05-21
2025-04-27
(Page 2 of 3)
  • Simon Sobisch

    Simon Sobisch - 2024-01-29

    I'm not sure what issue you are seeing; for "Anonymous" above it was fine in both cases, as long as a matching locale was installed.

     
  • Brian Tiffin

    Brian Tiffin - 2020-05-29

    This may solve part of it: the display part. Keyed-in input is not yet echoed properly here.

           environment division.
           configuration section.
           source-computer. gnulinux.
           object-computer. gnulinux
               classification is canadian.
    
           special-names.
               locale canadian is "en_CA.UTF-8".
    

    You'd want your own classification name and the proper locale.

    I run without SMCUP/RMCUP, so curses does not get to use a shadow screen, and this capture is just from a normal console:

    prompt$ cobc -xj accents.cob 
    
    TEST 1 A - DISPLAY
    
    123456789|123456789|123456789|
    123456789|123456789|12345678ab
    123456789|123456789|12345678ñ
    123456789|123456789|123456789�
    
    TEST 1 B - ACCEPT/DISPLAY
    
    Type <something w/ accents><enter> to continue:
    123456789|123456789|123456789|
    é § è ç à ö
    é § è ç à ö             
    
    
    
    
    
        TEST 2 - NOW WITH LINE/COL
    
        Type <something with accents><enter> to continue:
        123456789|123456789|123456789|
    
        é § è ç à ö
    

    It displays properly, but input was not echoed in TEST 2. That isn't grand.

    Without the extra CLASSIFICATION and SPECIAL-NAMES lines in the CONFIGURATION SECTION, TEST 2 does NOT display the characters here, and does not echo on input either.

    On the other tests:

    The initial ruler-line display tests are semi-reasonable, given that 31 bytes of data are moved to a 30-character field.

    I added some lines: where you have MOVE "thing" TO SCR001 followed by DISPLAY SCR001, I cut'n'pasted the MOVE line into a DISPLAY of the literal. This all works. TEST 1 shows "abnormal behaviour" only in the sense that, when an encoded string is truncated, information is lost and the result becomes undefined behaviour.

    libcob handles that pretty well, it looks like. Someone could probably find some Unicode that truncates just right to cause terminal wonkiness. In TEST 1, those 30-character displays are all exactly 30 bytes long.

    You could get away with PIC N for that, but PIC N is pretty much UCS-2, and you'd need the literals encoded in UCS-2 for everything to work out properly with PIC N.
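    To make the UCS-2 point concrete, here is a minimal C sketch of a hypothetical helper (not libcob code) that converts UTF-8 bytes into 16-bit UCS-2 code units, assuming, as described above, that PIC N data is UCS-2. Every character becomes exactly one unit, so ñ (C3 B1 in UTF-8) becomes the single unit 0x00F1. Characters outside the Basic Multilingual Plane, which take 4 bytes in UTF-8, have no UCS-2 form and are rejected. The input length is passed explicitly rather than relying on a NUL terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-8 bytes into UCS-2 code units (Basic Multilingual Plane
   only). Returns the number of units written, or (size_t)-1 if the input
   is malformed, needs more than out_cap units, or contains a character
   outside the BMP (4-byte UTF-8), which UCS-2 cannot represent. */
static size_t utf8_to_ucs2(const unsigned char *s, size_t len,
                           uint16_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;                        /* continuation bytes to read */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else return (size_t)-1;              /* 4-byte lead or invalid byte */
        if (extra > len - i - 1)
            return (size_t)-1;               /* sequence cut short */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;           /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* reject overlong encodings and UTF-16 surrogate code points */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (n == out_cap)
            return (size_t)-1;               /* output buffer full */
        out[n++] = (uint16_t)cp;
        i += extra + 1;
    }
    return n;
}
```

    With this conversion, the 17 bytes of "é § è ç à ö" become 11 units, one per character, which is why a UCS-2 field can be sized in characters while a PIC X field holding UTF-8 cannot.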

    TEST 1 A - DISPLAY
    
    123456789|123456789|123456789|
    
    123456789|123456789|12345678ab
    123456789|123456789|12345678abñ
    
    123456789|123456789|12345678ñ
    123456789|123456789|12345678ñab
    
    123456789|123456789|123456789�
    123456789|123456789|123456789ñ
    

    Those show the difference in truncated encodings. In the first set, the ab lines up, with b filling the 30th position; the ñ is never moved to SCR001, but shows up in the DISPLAY of the literal. In the second set, the ñ takes up the two bytes filling positions 29 and 30, and the ab is not in SCR001. The last set shows only the first byte of the ñ, which is encoded as:

    prompt$ echo -n 'ñ' | xxd
    00000000: c3b1
    

    On this terminal, a lone C3 is displayed as a question mark in the monospace font. The B1 is never stored in SCR001. Other consoles will show whatever they normally show for character code 195 where this capture has the question mark.
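    The truncation described above can be reproduced outside COBOL. The following C sketch assumes that an alphanumeric MOVE copies bytes and pads with spaces (it is not libcob's implementation). `move_bytes` behaves like the capture, leaving a lone C3 in position 30, while the hypothetical `move_utf8` backs off to the previous character boundary by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx), so the field never ends in half a character.

```c
#include <stddef.h>
#include <string.h>

/* Byte-wise move into a fixed-size field: copy at most field_len bytes
   and pad the rest with spaces. A multi-byte character can be cut. */
static void move_bytes(char *field, size_t field_len, const char *src)
{
    size_t n = strlen(src);
    if (n > field_len)
        n = field_len;
    memcpy(field, src, n);
    memset(field + n, ' ', field_len - n);
}

/* Same, but never split a UTF-8 sequence: if the first byte that does
   not fit is a continuation byte, the last character would be cut, so
   back off to the start of that character and pad with spaces instead. */
static void move_utf8(char *field, size_t field_len, const char *src)
{
    size_t n = strlen(src);
    if (n > field_len) {
        n = field_len;
        /* src[n] is the first byte NOT copied (it exists since strlen > n) */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }
    memcpy(field, src, n);
    memset(field + n, ' ', field_len - n);
}
```

    Moving "123456789|123456789|123456789ñ" (31 bytes) into a 30-byte field with `move_utf8` keeps 29 bytes and pads the 30th with a space, instead of storing the orphaned C3.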

    The bigger mystery is the missing echo on input in TEST 2. libcob might be missing a step in kicking the LOCALE calls for keyboard input. Not sure yet.

    Hope that helps get you one step closer: use CLASSIFICATION and SPECIAL-NAMES with a locale setting.

    There's a BUG in the FAQ: it says CLASSIFICATION is unsupported. That's very stale news; I will fix it.

    https://open-cobol.sourceforge.io/faq/index.html#locale

    Have good, make well,
    Blue

     

    Last edit: Brian Tiffin 2020-05-29
  • Anonymous

    Anonymous - 2020-05-29

    Thanks Blue.
    Great analysis. I will check out the proposed solution and post the result here.
    My current locale is en_US.UTF-8 for everything. I can also try again with the other locales I tried before, using the appropriate CLASSIFICATION and SPECIAL-NAMES (fr_BE.utf8, as reported by the locale command, will be my first choice, since I use a 'standard' Belgian keyboard on all machines).
    I will check on all machines (Ubuntu 20.04 LTS on a physical machine, the same OS on a Google Cloud VM instance, and Win10 Pro on another physical machine).
    Kind regards,
    J.M.

     
  • Anonymous

    Anonymous - 2020-05-29

    I just checked your proposed solution on my physical Ubuntu machine.
    I will check on the VM (no worries about that) and also on my Win10 machine.

    I tried both situations with the command "export ...=..." (en_US.UTF-8 is my default locale):
    LC_ALL=en_US.UTF-8
    LANG=en_US.UTF-8
    LANGUAGE=en_US.UTF-8
    and
    LC_ALL=fr_BE.UTF-8
    LANG=fr_BE.UTF-8
    LANGUAGE=fr_BE.UTF-8

    I compiled with the corresponding value in the CONFIGURATION SECTION, as you described above, changing "canadian" to "belgian" :)
    and checked the result.
    Both gave identical results (independent of the locale settings), and better than before; see the attachments.

    For TEST 1 A and TEST 1 B:
    same result as before, no changes.

    For TEST 2:
    the "double blank" that replaced each accented character is now a "single blank".
    But the variable now contains the correct value (minus one character at the end of the string per accented character).

    Conclusion for TEST 2:
    The "mystery of the no echo on input" is still present (with one blank character per accented character in the string, instead of two).
    The resulting value in the variable, and the display of the variable, are correct but cut off. This can be worked around by defining the variable about 5% to 10% longer (depending on the initial length/situation) so that it holds all ACCEPTed characters, until a proper solution is found :)
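    Rather than a fixed 5% to 10% margin, the margin can be derived from the data. In UTF-8 each non-ASCII character in "é § è ç à ö" takes 2 bytes, so a field filled entirely with such characters needs twice its character count, and other scripts can need up to 4 bytes per character. The C sketch below uses hypothetical helpers and assumes well-formed UTF-8: it counts characters (code points) by counting lead bytes, roughly what the user sees as characters (combining accents would count separately), as opposed to the byte count that a PIC X field stores.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b: 1 to 4 bytes.
   Returns 0 for a continuation byte (10xxxxxx) or a byte that never
   appears in UTF-8 (0xF8-0xFF). Overlong leads such as 0xC0 are not
   rejected; the input is assumed to be well-formed. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: e.g. é, ñ, § */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: e.g. € */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: outside the BMP */
    return 0;
}

/* Number of characters (code points) in a NUL-terminated UTF-8 string,
   obtained by counting lead bytes and skipping continuation bytes. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (utf8_seq_len((unsigned char)*s) != 0)
            count++;
    return count;
}

/* Bytes actually occupied by the string, i.e. what a PIC X field stores. */
static size_t utf8_byte_count(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

    For "é § è ç à ö", `utf8_char_count` returns 11 while `utf8_byte_count` returns 17, so a PIC X(11) field would lose the last six bytes.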

    Many thanks for helping me out on this.

    I will write another test program with LINE/COL and SCREEN SECTION to find out if this is viable with accented characters.
    Maybe also check if I can reposition the original input over the ACCEPT field, before confirming the ACCEPT. We'll see...

    Kind regards,

    J.M.

     
  • Anonymous

    Anonymous - 2020-05-31

    As promised, here is my test program about handling input with accented characters, based on the comments made in this thread.

    Brian's suggestion to use CLASSIFICATION in the CONFIGURATION SECTION is necessary to get the result I wanted. But, as mentioned before, some issues remain, like the 'mystery of the no echo on input' when using ACCEPT (without a screen definition) and the truncation of input characters (accented characters use 2 bytes, so this seems 'normal').
    So I tested a couple of things, especially to get correct 'full screen input/output', which was the root of the issue I had with accented characters.

    The attached program is the best solution I have found so far, for these reasons:
    -- all input is visible in the screen input field (no blanks; accented characters are displayed), so there is no truncation of characters in the input field itself
    -- the 'truncated' characters at the end of the string variable (cut off because of the field length and the double-byte accented characters) are recovered, so that the full input can be used for further processing.

    Remark: the locale definition does not seem to be taken into account. My system is en_US.UTF-8 by default, and changing the locale on the machine (e.g. to fr_BE.UTF-8) or changing the locale definition in the source code makes no difference to the (correct) handling of input in the SCREEN SECTION.

    The test program defines a SCREEN SECTION with a buffer zone after the INPUT field. This buffer zone receives the 'truncated' characters, which can be appended after the screen input is submitted, via string trimming and concatenation. It seems that the 'truncated' characters are actually stored in the memory located directly after the end of the input variable defined in the SCREEN SECTION.

    The only issues I haven't been able to resolve so far are:
    -- the input field behaves bizarrely when using the Insert/Delete keys to change text already entered (try it out)
    -- the last character of the input field, as defined by PIC X(...), must not be an accented character, because that character is not picked up (why?) when recovering via the buffer zone.

    If you find a better solution, please don't hesitate to share it.
    Help is always appreciated.

    Attached:
    -- screen shot of input screen
    -- screen shot of output screen
    -- source code

    PS: I have not yet compiled/run the test program on MS-Windows 10. I will try this later.

     
  • Anonymous

    Anonymous - 2021-01-27

    I tried to compile and run the program skreen.cob on Windows 10 (with GCC 311).
    But no extended ASCII codes are ACCEPTed (no 8-bit characters like ö, Ö or Alt+152 ÿ are valid).
    GnuCOBOL seems to filter out the 8th bit of the character (whereas the old RM-Cobol 74 and MF-Cobol 74 interpreters accept this extended ASCII set).

    Does anyone know how to "tweak" the C code of GnuCOBOL to accept these 8-bit characters?

    Kind regards. Loek.

     
    • Simon Sobisch

      Simon Sobisch - 2021-01-27

      GnuCOBOL doesn't filter anything here; the data goes out as it came in. You will see that when using plain ACCEPT/DISPLAY, for example. As this is "extended" screenio, everything goes in and out through the configured curses library (run cobcrun --info to see its details), and that is possibly what creates the issues; you may want to change it (and/or the terminal you use: cmd.exe does not support UTF-8, as most of Win32 does not). In any case, you should make sure you do not operate on UTF-8 data directly (a UTF-8 character is 1 to 4 bytes while PIC X is 1 byte; PIC N would be two bytes and is "pending" in GnuCOBOL).
      It looks like you want to use "extended ASCII" (that is, one byte per character with a codepage, like ISO 8859-15 or one of the DOS codepages [those likely won't work with extended screenio]), so you should ensure that the terminal you use also works with that.
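      If the goal is one byte per character, UTF-8 text can be converted to a single-byte codepage before it reaches a PIC X field. Below is a minimal C sketch using the POSIX `iconv` API, assuming the system's iconv accepts the name "ISO-8859-15" (glibc does). The helper name is hypothetical. How characters the codepage lacks are handled is implementation-defined: glibc reports them as an error, while other iconv implementations may substitute a replacement character. The terminal must also use the same codepage, as noted above.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to ISO 8859-15 (Latin-9), one
   byte per character. Returns the number of bytes written to out, or
   (size_t)-1 if the converter is unavailable, out is too small, or
   (on glibc) the text contains a character Latin-9 cannot represent. */
static size_t utf8_to_latin9(const char *in, char *out, size_t out_size)
{
    iconv_t cd = iconv_open("ISO-8859-15", "UTF-8");  /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;          /* iconv() takes char **, not const */
    size_t in_left = strlen(in);
    char *outp = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;      /* bytes produced */
}
```

      After conversion, "é § è ç à ö" occupies 11 bytes, one per character, and € becomes A4, one of the few positions where ISO 8859-15 differs from ISO 8859-1.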

       
      • Anonymous

        Anonymous - 2021-01-28

        Thanks. But it turns out to be a "feature" of ACCEPT with LINE and COLUMN; ACCEPT without these clauses is OK.
        Using different builds doesn't solve the problem. It seems to be related to the C compiler. See J.M. Lietaer's testing of accents (other thread).

         
        • Simon Sobisch

          Simon Sobisch - 2021-01-28

          Which thread?

          BTW: would you mind registering/logging in? That removes the additional moderation queue, which a human has to inspect.

           
          • Anonymous

            Anonymous - 2021-01-28
             
            • Simon Sobisch

              Simon Sobisch - 2021-01-28

              The reference is about Ubuntu and UTF-8 (and the cutting of multi-byte characters); the discussion above was about Win32 and not UTF-8, so this is likely something completely different. And the C compiler, like the GnuCOBOL compiler, doesn't care: it just moves the data between the different library calls, which gets us back to... the curses library in use, which is one of the reasons I asked for testing with a different one. Possibly...

               
          • Jean Marc Lietaer

            Sorry, I only reactivated my account today.
            Please read my name in place of most of the Anonymous messages in this thread.

            So for the solution described in
            https://sourceforge.net/p/gnucobol/discussion/help/thread/fe79679dab/#43bc/9f16

            The simple solution that works with both ACCEPT/DISPLAY and the SCREEN SECTION is a custom (standardized) locale:
            Yannick Vanhaeren's en_BE locale file, described in
            https://gist.github.com/yvh/630368018d7c683aca8da9e2baf7bfb9

            The solution works fine for me on:
            - MS-Windows 10/11
            - Ubuntu 22.xx
            - Tuxedo OS 2 - 22.04
            All Linux systems (PCs and Google Cloud Platform) use UTF-8.

            Additional info: the same 'issue' affected, e.g., Canadian users with a French keyboard and an English OS. I suppose the same must be true for other 'mixed' keyboard/language populations.

            Kind regards,
            J.M.

             

            Last edit: Jean Marc Lietaer 2024-07-20
          • Jean Marc Lietaer

             

            Last edit: Jean Marc Lietaer 2024-07-20
  • Juan Carlos Escartí

    To accept fields, two different calls are made:
    cob_field_accept
    cob_screen_accept
    The two calls respond to different function keys; it makes little sense for the same application to have different keys for handling fields.
    For example, with cob_field_accept, Alt-Del deletes the entire field;
    with cob_screen_accept, this doesn't happen.
    In the same way, cob_screen_accept fills the field with underscores "_", which does not happen with cob_field_accept.
    If you set the locale, cob_screen_accept does accept Ñ and accents, but cob_field_accept does not.
    I think screenio.c should be revised and rewritten to handle the user interface correctly.

     
    • Simon Sobisch

      Simon Sobisch - 2022-05-31

      Patches welcome to:

      • Apply the same rules to both field and screen accepts (keep in mind that a field accept often returns key codes where a screen accept does more automatically; but that could also be adjusted depending on a setting)

      • Better support for special characters

      With the latter, keep in mind that different curses implementations (and the OSes running them) handle some things differently, especially when using getch and then getting a wide character.
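      The getch point can be illustrated without curses. Byte-oriented input such as `getch()` delivers a UTF-8 character like ñ as two separate values (C3, then B1), which have to be reassembled into one wide character. A minimal sketch using the standard C `mbrtowc` with a persistent `mbstate_t`, assuming LC_CTYPE has already been set to a UTF-8 locale; `feed_byte` is a hypothetical helper, not part of libcob or curses. ncursesw also offers `get_wch()`, which returns the wide character directly.

```c
#include <locale.h>   /* setlocale(): LC_CTYPE must be UTF-8 before use */
#include <string.h>
#include <wchar.h>

/* Feed one byte, e.g. a value returned by curses getch(), into an
   incremental multibyte decoder. Returns 1 and stores the decoded
   character in *out once the character is complete, 0 while more bytes
   are needed, and -1 for an invalid sequence (the state is then reset
   so decoding can resume with the next byte). */
static int feed_byte(mbstate_t *st, unsigned char byte, wchar_t *out)
{
    char c = (char)byte;
    size_t r = mbrtowc(out, &c, 1, st);
    if (r == (size_t)-2)
        return 0;                   /* incomplete: keep state, wait */
    if (r == (size_t)-1) {
        memset(st, 0, sizeof *st);  /* invalid: back to initial state */
        return -1;
    }
    return 1;                       /* complete (r is 1, or 0 for NUL) */
}
```

      In a hypothetical input loop, only getch() values below 256 would be passed to `feed_byte` (curses KEY_* codes are larger), and only a return of 1 would count as a finished keystroke.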

       
  • Juan Carlos Escartí

    Hi Simon
    I have verified that records that have null values cause erratic behavior
    Do you have any news about this?
    Thank you

     
  • Juan Carlos Escartí

    I found this article about ncursesw
    https://www.roguebasin.com/index.php/Ncursesw
    It explains the difficulties the library has in correctly representing wide (Unicode) characters, and the solutions for a Linux development platform running Debian.
    It says: if a program doesn't call setlocale, it will remain in the 'C' locale, which assumes that the terminal cannot display any characters outside the ASCII set.
    Could it be a clue?
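    A minimal C sketch of that clue, using only standard C/POSIX calls: until a program calls `setlocale(LC_ALL, "")` it runs in the "C" locale, whose character set as reported by `nl_langinfo(CODESET)` is plain ASCII (glibc calls it "ANSI_X3.4-1968"), so non-ASCII input cannot be decoded. The ncurses documentation asks programs to make this call before `initscr()`. Whether libcob already does so at startup is not established in this thread; both helper names are hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Adopt the locale named by LC_ALL / LC_CTYPE / LANG in the environment.
   A curses program would call this before initscr(). Returns false if
   that locale is not installed (the program then stays in "C"). */
static bool init_locale_from_env(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* True if the current LC_CTYPE character set is UTF-8. glibc and musl
   report it as "UTF-8"; in the "C" locale the codeset is ASCII. */
static bool locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && strcmp(cs, "UTF-8") == 0;
}
```

    A program that skips `init_locale_from_env()` (or whose environment names a locale that is not installed) will see `locale_is_utf8()` return false, which matches the 'C' locale behaviour the article describes.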
    Greetings

     
  • Juan Carlos Escartí

    Hello again
    After a really busy 2024, my 600,000 lines of code are still in production, as are the 8,000,000 C-ISAM records.
    VB-ISAM doesn't seem to be evolving, and I don't think BDD has much of a future as an option. It seems like I'll have to migrate to PG or MariaDB.
    My C-ISAM files are reaching their limits in many ways.
    The SCREEN problems seem to come from the different versions of the CURSES libraries, which seems like a chronic problem that's difficult to solve.
    I was wondering:
    What if we write a JavaScript client that interacts transparently with the COBOL compiler, emulating how CURSES works but on the web?
    With this, COBOL applications would run transparently on the web.
    I think it's not much more difficult than rewriting the entire screen handling in C, which is what we seem to need.
    I've also almost finished the transpiler, which greatly normalizes COBOL syntax to facilitate its modernization.
    Regarding transparent SQL handling of statements such as READ and WRITE, I'm almost finished with it for MF.
    For GnuCOBOL I still need to investigate, because it fails: it goes through the pipe and doesn't work.
    What do you think about this?
    Greetings, everyone.

     
    • Mickey White

      Mickey White - 2025-04-26

      Ralph may be on to something. I don't use the SCREEN SECTION. In my previous job we just used COBOL to do the work, and used bash/Perl to run a CGI script that served an HTML web page via Apache. We were using Linux, but I accessed the web page from my Windows box.

       
  • Ralph Linkletter

    What do you think about this?
    You asked :-)
    The existence of a "Screen Section" does not belong in a COBOL compiler.
    Trying to shoehorn a historical Unix teletype paradigm into a video presentation paradigm does not seem logical to me.

    Expecting COBOL to manage the physical presentation and the state of a dialog seems to me to be way outside the domain of COBOL.

    As far as needing another transpiler - GnuCOBOL already is a transpiler.

    Many a COBOL vendor has implemented "SQL Transparency".
    The problem with that paradigm is that there is zero normalization.
    No integrity regarding foreign keys.
    Realizing the benefits of "SQL Transparency" requires abandoning integrity and performance.

    Ralph

     

    Last edit: Ralph Linkletter 2025-04-26
  • Juan Carlos Escartí

    All the Cobol compilers I've worked with for over 40 years have solved the problem of screen handling.
    Transparent SQL handling is clearly a "fiction." ISAM and SQL database files have different logic, and you necessarily "must tweak the code."
    Virtually every compiler has a multitude of libraries to solve a multitude of problems.
    From screen management to network management, etc.
    I don't quite understand why this isn't addressed in GNUCOBOL.
    If GNU is never going to handle the screen correctly, its scope will be restricted to "Those who don't need to handle the screen."
    J.C.

     
  • Juan Carlos Escartí

    And by the way, there's a full description of the SCREEN SECTION in the Open Group's Technical Standard for the COBOL Language.
    https://pubs.opengroup.org/onlinepubs/009680799/toc.pdf
    The statement "The existence of a "Screen Section" does not belong in a COBOL compiler." seems to me to lack rigor, given that the SCREEN SECTION is defined in the X/Open group standard.

     
  • Juan Carlos Escartí

    Resources are precisely the necessary condition for the advancement of any project.
    We must remember that the Linux Foundation spent €300 million on the project last year.
    If we want GNUCOBOL to advance, we cannot ignore the necessary material conditions. Thinking that only altruistic programmers, free of charge, will solve all the problems of a production COBOL compiler is a seemingly impossible dream.
    Seeking alternative resources from stakeholders is, I believe, the only way to have a fully profiled, bug-free, production-ready universal COBOL compiler that can directly replace the most popular compilers such as MF, ACU, RM, etc.

     

    Last edit: Juan Carlos Escartí 2025-04-27
  • Eugenio Di Lorenzo

    Just to be precise.
    The SCREEN SECTION is fully included in the standard ISO/IEC/JTC definition of the COBOL language: Information technology — Programming languages, their environments and system software interfaces — Programming language COBOL.

     

    Last edit: Eugenio Di Lorenzo 2025-04-27
  • Juan Carlos Escartí

    To Ralph
    There may be misunderstandings or differences in our visions, and I'd like to better understand your perspective. I believe open and honest communication can help us align our goals and ensure we're all working together for the good of the project.
    Don't you think the GNUCOBOL compiler should be a direct replacement for the more popular COBOL compilers?

     