Re: [sdcc-devel] Using libunistring in SDCC?

"Not allowing if they look the same" is problematic, especially as
"looking the same" is relative.

Would MISRA disallow the use of I (capital i) and l (lowercase L)
in the same C file? They can already look identical depending on
the font selected.

With your change, how would SDCC react if a Cyrillic identifier
exists in a C file where other identifiers are Latin?

Real life examples are always more interesting.

--- Original Message ---
From: Philipp Klaus Krause <pk...@sp...>
Date: 23.10.2025 17:06:26
To: sdc...@li...
Subject: Re: [sdcc-devel] Using libunistring in SDCC?

On 23.10.25 at 16:00, "Janko Stamenović" via sdcc-devel wrote:
> I hope SDCC will allow use of Cyrillic identifiers mixed even
> when most are Latin, especially if N2932 is not accepted.
> 
> The "secure" shouldn't mean preventing the use of the script.

N2932 does indeed look quite restrictive in its interpretation of UAX #39.

I hope there will be a more refined proposal in the future, possibly
still disallowing the mixing of Cyrillic with Latin in the same
identifier, but not disallowing the use of both Cyrillic and Latin
identifiers in the same file.

If we go a step further, from just identifiers to C semantics, it could
probably make sense to disallow the use of a Latin and a Cyrillic
identifier when the two identifiers look the same and are in the same
scope.
MISRA C:2023 Dir 4.5 is already such a rule; it even applies within a
single script (e.g. when there is an identifier AI, it is not allowed to
also have an identifier A1 or Al).
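
To make the AI / A1 / Al case concrete, here is a minimal sketch of a
same-script look-alike check. It is not MISRA's actual definition and not
how u8ident works: the function names and the small table of look-alike
classes are assumptions made up for illustration, it is ASCII only, and a
real checker would also have to restrict the comparison to identifiers in
the same scope.

```c
#include <stdbool.h>

/* Hypothetical helper: map a character to a representative of its
   look-alike class. Simplified, ASCII-only table chosen for illustration. */
static char lookalike_class (char c)
{
  switch (c)
    {
    case 'I': case 'l': case '1': return '1';
    case 'O': case '0': return '0';
    case 'S': case '5': return '5';
    case 'Z': case '2': return '2';
    case 'B': case '8': return '8';
    default: return c;
    }
}

/* Hypothetical helper: true if a and b are spelled differently but
   collapse to the same sequence of look-alike classes. */
static bool ids_confusable (const char *a, const char *b)
{
  bool differ = false;
  for (; *a && *b; a++, b++)
    {
      if (lookalike_class (*a) != lookalike_class (*b))
        return false;
      if (*a != *b)
        differ = true;
    }
  /* Confusable only if both strings ended together and some character
     differed. */
  return differ && *a == *b;
}
```

With this table, ids_confusable ("AI", "A1") and ids_confusable ("AI", "Al")
are both true, while an identifier compared with itself is not reported.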

Philipp

P.S.: for now, if we use u8ident, I was considering doing something like
the code below for identifiers. This should give an error if the
identifier is not compliant with the C standard, and a warning if it is
"insecure".

// Give an error if the identifier does not comply with UAX #31, a
// warning if it does not comply with UTS #39.
#ifdef HAVE_U8IDENT_H
   u8ident_init (U8ID_PROFILE_DEFAULT, U8ID_NFC,
                 options.std_c23 ? U8ID_TR31_C23 : U8ID_TR31_C11);
   if (options.std_c23)
     {
       char *normalized = u8ident_normalize (yylval.yychar, SDCC_NAME_MAX);
       // Copy the normalized form back; terminate explicitly, as strncpy may not.
       strncpy (yylval.yychar, normalized, SDCC_NAME_MAX);
       yylval.yychar[SDCC_NAME_MAX - 1] = 0;
       free (normalized);
     }
   int errors_c = u8ident_check (yylval.yychar, NULL);
   if (errors_c < 0) // Invalid identifier.
     werror (E_INVALID_ID, yylval.yychar);
   else // Only check for subtle errors if there's no obvious ones.
     {
       // Bit 0 is taken to flag a non-NFC identifier. Only possible for
       // C17 and earlier, from C23 we normalize.
       if (errors_c & 1)
         werror (W_ID_NOT_NORMALIZED_NFC, yylval.yychar);
       u8ident_init (U8ID_PROFILE_DEFAULT, U8ID_NFC, U8ID_TR31_SAFEC26);
       int errors_safec = u8ident_check (yylval.yychar, NULL);
       const char *errormsg = "no issue";
       switch(errors_safec)
         {
         case 0:
           break ;
         case U8ID_EOK_WARN_CONFUS:
         case U8ID_ERR_CONFUS:
           errormsg = "confusion risk";
           break;
         case U8ID_ERR_XID:
           errormsg = "invalid xid";
           break;
         case U8ID_ERR_SCRIPT:
           errormsg = "invalid script";
           break;
         case U8ID_ERR_SCRIPTS:
           errormsg = "invalid combination of scripts";
           break;
         case U8ID_ERR_ENCODING:
           errormsg = "invalid encoding";
           break;
         case U8ID_ERR_COMBINE:
           errormsg = "invalid combination of codepoints";
           break;
         default:
           errormsg = "other issue";
         }
       if (errors_safec)
         werror (W_INSECURE_ID, yylval.yychar, errormsg);
     }
#endif

_______________________________________________
sdcc-devel mailing list
sdc...@li...
https://lists.sourceforge.net/lists/listinfo/sdcc-devel
