[Seed7-users] Using UTF-8 characters in identifiers

Dear Seed7 Users,

Attached is an experimental patch that I used to enable UTF-8 multibyte characters in program identifiers.
I think it could be useful in education, for example as a teaching aid in schools.
Please let me know if you also see other potential uses of this feature.

Most changes are in the scanner.c file. Identifier symbols are still stored as multibyte C strings.
I also added the portable files wcwidth.c and c_ident.c from the "libutf8" library, with a reference to the original author.
All makefiles were modified to include these new files. By the way, is this the correct way to do it?
For me this was the easiest portable solution for using wcwidth and for quickly checking whether a Unicode character can be part of an identifier.

The patch also includes a feature that automatically uses the UTF-8 files STD_UTF8_IN and STD_UTF8_OUT for the IN and OUT variables.
For this it introduces a new primitive action, "UT8_MODE_ON", and a new function in ut8lib.c that checks whether the current locale uses UTF-8.
http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate was used as one of the guides.
I checked the changes and ran tests under Ubuntu 14 and Windows 8.
Windows locale detection and console code page selection are not yet implemented.

However, one change in utf8.s7i causes the compiler s7c to fail.
The change is in lines 242 - 259 and is commented out.

../lib/utf8.s7i: In function ‘o_3135_SEL_STD_FILE_FROM’:
../lib/utf8.s7i:248:45: error: expected expression before ‘;’ token
     isUTF8 := utf8_mode_on;

Please advise what the reason could be.

Regards and
Merry Christmas,
Arkady Kuleshov
