[sdcc-devel] Using libunistring in SDCC?

IMO, dealing with Unicode is quite complicated, and not something I want 
to go too deeply into. After all, we are building a compiler, not a 
Unicode tool. But building a compiler these days requires some Unicode 
functionality: in particular, well-formedness checks (we have them), 
normalization (we don't have that), and checking of character properties 
(we don't have that, except for the trivial stuff).

For example, an identifier in C23 starts with a character that has the 
XID_Start property or with '_' (or maybe '$'), followed by any number of 
characters with the XID_Continue property (or maybe '$'). Two 
identifiers are equivalent (ignoring the details about significant 
characters) if they are equal in Unicode normalization form C (NFC, 
which is defined as canonical decomposition followed by canonical 
composition). The details of all this keep changing with Unicode 
standard updates.

I don't want to implement or maintain those utilities, so I suggest we 
use an existing library. Due to its wide availability (it is not just 
part of typical GNU/Linux distributions, but also available as an MSYS2 
package for MinGW, and packaged for OpenBSD, FreeBSD, etc.), I suggest 
using GNU libunistring.

Philipp
