Re: [sdcc-devel] Using libunistring in SDCC?

I also like "the sensible approach". Processing sources that
contain no non-ASCII identifiers shouldn't "cost more" than before.

Also, making the compilation and use of that library optional
would be great!
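
A minimal sketch of why the ASCII-only case can stay cheap: assuming
identifiers reach the lexer as UTF-8 byte strings, one pass over the
bytes is enough to decide whether any Unicode handling is needed at
all. The function name below is hypothetical and not taken from the
SDCC sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of SDCC.
   Returns true if the identifier consists of ASCII bytes only.
   Assumes UTF-8 input, where every byte of a non-ASCII character
   has its high bit set. */
bool ident_is_ascii(const char *id)
{
    for (size_t i = 0; id[i] != '\0'; i++) {
        if ((unsigned char)id[i] >= 0x80)
            return false;
    }
    return true;
}
```

Only identifiers for which this returns false would need to go to the
optional library (or trigger the warning when it is absent).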

----

From the perspective of a user, I still don't understand what the
practical impact of SDCC doing the normalization would be:

a) A comment or a string:

    /* Napisao J. Stamenović */

    "Schwenkschiebetüren";

As discussed, the compiler doesn't have to do anything.

b) A math constant:

    float π = 3.14159265;
    float area = π*r*r;

In practice, the programmer gains nothing from a compiler that
checks whether the identifier is in a normalized form.

c) Using the Unicode characters with the same appearance:

    int y = 33; /* ASCII letter */

    int coordinate = у * 44; /* Cyrillic Small Letter U */

Using Unicode characters this way can cause confusion, as the
Cyrillic у looks the same but has a different code point. Still,
that у is already normalized, so no "is it normalized" check
would help the programmer! If an #include file used earlier
defined a variable with the Cyrillic name, the second line would
compile and resolve to that different variable. Without the
include, the current SDCC already reports:

    error 20: Undefined identifier 'у'
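
The difference can be illustrated at the byte level. A minimal
sketch, assuming UTF-8 source encoding and a symbol lookup that
compares identifier bytes (the helper and variable names are made up
for illustration, they are not SDCC code):

```c
#include <stdbool.h>
#include <string.h>

/* UTF-8 encodings written as explicit bytes. */
const char latin_y[]    = "y";         /* U+0079 LATIN SMALL LETTER Y */
const char cyrillic_u[] = "\xD1\x83";  /* U+0443 CYRILLIC SMALL LETTER U */

/* Hypothetical stand-in for a symbol table comparison:
   two identifiers match only if their bytes are identical. */
bool same_identifier(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here same_identifier(latin_y, cyrillic_u) is false. Neither string
changes under normalization form C, so the result is the same with or
without normalization.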

d) Expecting the auto normalization to "work"

One or more programmers could enter "the same thing" in different
ways as an identifier that they expect to be normalized:

    Å = 33; /* U+212B the angstrom sign */

    Å = 33; /* U+00C5 letter */

There could be some benefit in detecting that the first is not in
a normalized form, but it doesn't protect anybody from problems
like those in c).
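
To make the case concrete: in UTF-8, U+212B is the bytes E2 84 AB and
U+00C5 is C3 85, so the two spellings are different identifiers
unless something maps one to the other. The following is a toy
sketch of that single mapping only, with a hypothetical function
name. It is not a real normalizer and not how a Unicode library
implements normalization form C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy, not a real normalizer.
   Copies the UTF-8 string `in` to `out`, replacing U+212B ANGSTROM
   SIGN (E2 84 AB) with U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
   (C3 85), which is the result normalization form C gives for this
   one character. `out` must have room for strlen(in) + 1 bytes. */
void fold_angstrom(const char *in, char *out)
{
    while (*in != '\0') {
        if (strncmp(in, "\xE2\x84\xAB", 3) == 0) {
            memcpy(out, "\xC3\x85", 2);
            in += 3;
            out += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

After this mapping both spellings compare equal, while the Latin y
and Cyrillic у from c) still differ.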

And I don't even know whether Unicode considers that U+212B
"should or shouldn't be used in an identifier". Checking for it
doesn't help and doesn't change anything, not even "for security":
if every script that could "look too much the same" were banned
from identifiers, it would be a huge discrimination.

In short, IMO, normalization should be an answer to a specific
problem, not something done "just because". Specifically, if the
problem is some security evaluation ("does something look the same
as something else?"), a dedicated tool should be used and its
output analyzed independently of the compiler, because different
scripts can look the same anyway. On the other hand, security
breaches can very well be implemented in pure ASCII too and can be
missed in a visual review of the sources, so expecting a compiler
to solve every problem that could ever happen isn't realistic.

My current impression is still that having automatic normalization
and automatic detection of non-normalized forms in a compiler
won't change anything in practice, except that a checkbox could be
ticked: "yes, we have that"?

--- Original Message ---
From: Benedikt Freisen via sdcc-devel <sdc...@li...>
Date: 22.10.2025 20:33:49
To: sdc...@li...
Subject: Re: [sdcc-devel] Using libunistring in SDCC?

On 22.10.25 at 20:31, Philipp Klaus Krause wrote:
> On 22.10.25 at 20:16, Benedikt Freisen via sdcc-devel wrote:
>>
>> In this case, I would like to suggest to specify that SDCC requires
>> that all identifiers are provided in normalization form C. SDCC would
>> then likely already support all valid identifiers (and then some)
>> and we can then use a 3rd party library to generate the diagnostic
>> messages the standard demands. This library could then even be made
>> optional.
>
> But if we have the external library for the check for normalization,
> we could also just normalize, which IMO would be more user-friendly.
>
> How about the following?
>
> * The dependency on the unicode library becomes a configure option
> (default to on), or it just gets used if found at configure time (like
> we already do for treedec).
>
> * If the library is not present: check identifier if there are any
> non-ASCII characters in it. If one is found, emit a warning, stating
> that SDCC built without the library has incomplete support for
> non-ASCII identifiers.
>
> * If the library is present: normalize identifiers to normalization
> form C, and do the full check for XID_Start/XID_Continue.
>
> Philipp
That sounds like a sensible approach.

_______________________________________________
sdcc-devel mailing list
sdc...@li...
https://lists.sourceforge.net/lists/listinfo/sdcc-devel
