#852 (extended) ACCEPT may not accept multibyte utf-8

unclassified
accepted
4
2022-09-10
2022-09-10
No

GnuCOBOL (libcob/screenio.c) accepts a single character (an int) from the underlying curses library via getch (note: at least for ncurses and PDCurses, this is automatically mapped to wgetch as soon as you have a "wide" version).
It then does some necessary conversions and handles function keys, and then either stores the input or rejects it as an error if it doesn't fit into the storage (which is of type unsigned char *).

Some implementations (at least from my tests: ncursesw on ubuntu with LANG=DE_de.UTF8) seem to "buffer" multi-byte sequences: an ä returns 195 on the first call and then, on the second call and without any further keypress, 164 - together the correct x'C3A4'. As both values are returned separately, each is checked against "< 255" and stored as its own character in the data (and the expected logical position on the screen also advances by two positions).
This likely creates an issue when there is not enough space (not tested yet, but if we are at the last position we likely get x'C3', store it as is, and then beep and/or overwrite it with x'A4').
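To illustrate the "last position" problem, here is a minimal sketch of how the lead byte could be used to see up front how many bytes will follow. Both function names are hypothetical and not part of libcob; the sketch only assumes that the bytes arrive one per getch call, as observed with ncursesw above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libcob): number of bytes in the UTF-8
   sequence started by `lead`, or 0 if `lead` cannot start a sequence
   (continuation byte or invalid lead byte). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                   /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  /* e.g. x'C3' for an ä */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Hypothetical check: does the whole sequence started by `lead` fit into
   a field of `field_size` bytes when the cursor is at byte offset `pos`? */
static int utf8_seq_fits(unsigned char lead, size_t pos, size_t field_size)
{
    size_t need = utf8_seq_len(lead);

    return need != 0 && pos <= field_size && need <= field_size - pos;
}
```

With a 10-byte field, an x'C3' arriving at offset 9 (the last position) would be rejected by this check instead of being stored without its x'A4'.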

Other implementations (for example some PDCurses ports built for UTF8) return a single large value (which should be decimal 50084, i.e. x'C3A4', in this case). This cannot be stored in a "single" position, so it is not stored and error handling (a beep) occurs instead.
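The arithmetic behind that value: 195 * 256 + 164 = 50084. A minimal sketch of splitting such a value back into its bytes follows; it assumes the bytes are packed most significant first, as the 50084 above suggests, and that function keys have already been filtered out by the caller. The function name is hypothetical and not part of libcob or PDCurses.

```c
#include <stddef.h>

/* Hypothetical helper: split a key value that packs a multi-byte sequence
   most significant byte first (50084 == 0xC3A4) into its single bytes.
   Returns the number of bytes written to `out` (1..4), or 0 if `key` is 0
   or needs more than 4 bytes. */
static size_t split_packed_key(unsigned long key, unsigned char out[4])
{
    size_t n = 0;
    size_t i;
    unsigned long k = key;

    /* count the significant bytes */
    while (k != 0 && n < 4) {
        n++;
        k >>= 8;
    }
    if (k != 0) {
        return 0;   /* too large for a UTF-8 sequence */
    }
    /* emit them in storage order, highest byte first */
    for (i = 0; i < n; i++) {
        out[i] = (unsigned char)((key >> (8 * (n - 1 - i))) & 0xFF);
    }
    return n;
}
```

For the ä from the report this would yield two bytes, x'C3' followed by x'A4', which is the same sequence ncursesw delivers in two calls.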

It seems reasonable to split the value into its separate bytes in the PDCurses case and adjust the "logical" position accordingly, as effectively happens with ncurses already - but of course with an up-front check that there is enough space in total.
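A rough sketch of the "check first, then store" idea, assuming the bytes of one keystroke are already available as an array. The function name and parameters are made up for illustration and do not reflect the actual code in libcob/screenio.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store all `count` bytes of one keystroke into
   `field` (size `field_size`) at byte offset `*pos`, but only if the
   complete sequence fits. Returns 1 on success and advances `*pos` by
   `count`; returns 0 and leaves field and position untouched otherwise
   (the caller would beep in that case). */
static int store_key_bytes(unsigned char *field, size_t field_size,
                           size_t *pos,
                           const unsigned char *bytes, size_t count)
{
    if (count == 0 || *pos > field_size || count > field_size - *pos) {
        return 0;   /* not enough space in total: store nothing */
    }
    memcpy(field + *pos, bytes, count);
    *pos += count;  /* logical position moves by the number of bytes */
    return 1;
}
```

In a 3-byte field, a first ä would be stored at offsets 0 and 1; a second ä at offset 2 would be rejected as a whole, so no lone x'C3' ends up in the data.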

Related

Bugs: #751
