From: Benedikt F. <b.f...@gm...> - 2025-10-22 18:17:16
On 22.10.25 at 12:50, Philipp Klaus Krause wrote:
> On 22.10.25 at 12:28, Benedikt Freisen via sdcc-devel wrote:
>>
>> I am still trying to figure out in which ways exactly SDCC is
>> currently not standard compliant.
>>
>> Is it just missing diagnostics or are there other issues, too?
>>
>> The monstrosity of a regular expression that parses utf8 identifiers
>> in our lexer currently accepts exactly the byte sequences that are
>> composed of byte sequences that correspond to valid utf8 code points
>> within the character set that C11 allowed in identifiers.
>>
>> That means that no normalization is happening in the lexer and that
>> it will still happily accept e.g. a "pile of poo" emoji thrown at it.
>> Does the standard explicitly require diagnostics in these cases?
>
> 1) Missing diagnostics, and there is no permanent fix by just making a
> few changes in is_UCN_valid_in_idf, since UAX #31 keeps changing
> (latest update is 2025-08-20). One could argue that C23 does not
> explicitly require the diagnostic on the technicality that the "shall"
> for this in §6.4.3.1 is outside a constraints section, thus we have
> undefined behavior instead of a constraint violation, but IMO the
> intent was to have a diagnostic, and C2y will likely move the wording
> into a constraint section, thus clearly requiring a diagnostic.
>
> 2) Missing normalization. This means that some identifiers that should
> be the same (and look exactly the same) will be handled as if they
> were different. Formally, we could instead require identifiers in
> source code to already be in normalization form C, but then we'd have
> another missing diagnostic.
>
> Philipp

In this case, I would like to suggest that we specify that SDCC
requires all identifiers to be provided in normalization form C.
SDCC would then likely already support all valid identifiers (and then
some), and we could use a 3rd party library to generate the diagnostic
messages the standard demands.
This library could then even be made optional.

Benedikt
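
P.S. To make the two missing diagnostics concrete, here is a rough
sketch of the kind of check such a library would have to provide. It is
not SDCC code: it uses Python's standard unicodedata module as a
stand-in for the 3rd party library, and the function name
identifier_diagnostics is made up for illustration. The XID check
relies on Python's own identifier rule, which is also built on
XID_Start/XID_Continue but is not guaranteed to match C23 exactly.

```python
import unicodedata


def identifier_diagnostics(name):
    """Return a list of (hypothetical) diagnostics for one identifier."""
    problems = []
    # Proposed rule: identifiers must already be in normalization form C.
    if not unicodedata.is_normalized("NFC", name):
        problems.append("identifier is not in normalization form C")
    # Approximation only: str.isidentifier() applies Python's
    # XID_Start/XID_Continue-based rule, not the exact C23 one.
    if not name.isidentifier():
        problems.append("identifier contains a character not allowed by XID rules")
    return problems


composed = "caf\u00e9"     # 'é' as the single code point U+00E9 (NFC)
decomposed = "cafe\u0301"  # 'e' followed by combining acute accent U+0301
emoji = "\U0001F4A9"       # "pile of poo"

for ident in (composed, decomposed, emoji):
    print(ascii(ident), identifier_diagnostics(ident))
```

The first two spellings look identical on screen but are different
code point sequences, which is exactly the case Philipp describes in 2).
With the NFC requirement, the second one gets a diagnostic instead of
silently becoming a distinct identifier.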