Re: [sdcc-devel] Using libunistring in SDCC?

On 23.10.25 at 09:12, "Janko Stamenović" via sdcc-devel wrote:
> 
> And I don't even know if Unicode considers that U+212B "should or
> shouldn't be used in an identifier". It doesn't help and doesn't
> change anything, not even "for security", because if every script
> that could "look too much the same" were banned from identifiers,
> that would be huge discrimination.
> 
> In short, IMO, normalization should be an answer to a specific
> problem, not something done "just because". Specifically, if the
> problem is some security evaluation ("does something look the same
> as something else?"), a dedicated tool should be used and its output
> analyzed independently of the compiler, because different scripts
> could look the same anyway. On the other hand, security breaches
> could just as well be implemented in pure ASCII and be missed when
> the sources are inspected visually, so expecting a compiler to solve
> every problem that could ever arise isn't realistic.
> 
> My current impression is still that the existence of automatic
> normalization and automatic detection of non-normalized forms in a
> compiler won't change anything in practice, except that a checkbox
> could be ticked "yes we have that"?
>
* The standard says that two identifiers are the same if they are 
identical after normalization to Normalization Form C (NFC).

* Someone could, e.g., have a variable named Übernachtungspauschale 
written as U (U+0055 LATIN CAPITAL LETTER U) + ̈ (U+0308 COMBINING 
DIAERESIS), because that is what their text editor produces. If someone 
else then works on the source with another text editor that uses 
Ü (U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS), it would confuse users 
if SDCC did not treat the two spellings as the same identifier due to 
lack of normalization.

* Well, I don't know either whether UAX #31 allows U+212B in an 
identifier. So, if it isn't allowed in identifiers, I'd want SDCC to 
warn me when I try to use it. My code should be portable unless I 
intentionally use a compiler-specific feature.

* Regarding the security implications of Unicode (e.g. homoglyph 
attacks): having the normalization and the checks for valid identifiers 
does help here. C23 is safer than C11 was (which AFAIK allowed more 
Unicode in identifiers).
But for a full solution we'd have to do more (N2932, which was rejected 
for C2y, though WG14 wanted it as a TS, which so far hasn't happened). 
AFAIK, libraries that currently implement everything we'd want for 
security are not that widespread (they exist, in particular libu8ident, 
but I don't think many distros package them).

Philipp
