From: Janko S. <jan...@ec...> - 2025-11-05 13:59:53
Dear Fabrice,

I'd like to take this opportunity to say that I'm a huge fan of all your open source work. The most memorable to me are TCC, FFmpeg, QEMU and JSLinux (in that approximate order).

Regarding the topic currently being discussed and the related messages: my impression is that, for a while now, some of them have read more like material generated by an AI engine than like the writing of any specific human.

My understanding is that Philipp's earlier arguments for using a Unicode library in SDCC were the following (from messages on this mailing list, identified by their timestamps):

2025-10-22 09:08:57
"Currently, the C23 standard technically only requires us to check for valid unicode, since valid unicode that is not allowed in identifiers results in undefined behavior, but IMO, that was a mistake (a "shall" outside of a constraint section results in UB, while in a constraint section it requires a diagnostic), and will likely be fixed, with a recommendation to also apply the fix to implementations of earlier standards."

2025-10-22 10:50:26
"Once could argue that C23 does not explicitly require the diagnostic on the technicality that the "shall" for this in §6.4.3.1 is outside a constraints section, thus we have undefined behavior instead of a constraint violation, but IMO the intent was to have a diagnostic, and C2y will likely move the wording into a constraint section, thus clearly requiring a diagnostic."

A fan of your work,
Janko

>
> From: 周 子益 <z12...@ou...>
> Date: 05.11.2025 03:28:21
> To: "sdc...@li..." <sdc...@li...>
> Subject: [sdcc-devel] Fwd: Inquiry Regarding Unicode Homoglyph and Normalization Handling in SDCC
>
> From: Fabrice Bellard
> Sent: 3 November 2025, 23:47
> To: 周 子益
> Subject: Re: Inquiry Regarding Unicode Homoglyph and Normalization Handling in SDCC
>
> Dear Zhou,
>
> On 10/31/25 7:38 AM, 周 子益 wrote:
> > I hope this message finds you well.
> > I am writing to seek your insights regarding a topic currently under
> > discussion in the SDCC community: whether to incorporate support for
> > Unicode homoglyph detection and normalization (such as NFC) directly
> > into the compiler. As you may know, this issue has sparked debate around
> > balancing usability, maintenance overhead, and the core responsibilities
> > of a compiler.
> > Some key points of the discussion include:
> >
> > 1. The potential maintenance burden of tracking evolving Unicode
> >    homoglyph standards.
> > 2. Whether homoglyph checks and normalization are better suited for
> >    external tools (e.g., linters) rather than being embedded in the
> >    compiler.
> > 3. Concerns about enforcing specific normalization forms and how it
> >    might impact flexibility, particularly for edge cases like
> >    obfuscated code or embedded data.
> >
> > Given your extensive experience with compiler design and projects like
> > TinyCC (TCC), I would greatly appreciate your perspective on the following:
> >
> > * Do you believe Unicode homoglyph detection and normalization belong
> >   in a compiler like SDCC, or should these be delegated to external tools?
> > * How might such features impact compiler performance,
> >   maintainability, and user flexibility?
> > * Are there alternative approaches or existing libraries you would
> >   recommend for handling these challenges?
>
> I am not sure I understand the problem. SDCC should handle only ASCII
> identifiers. Handling unicode identifiers is useless and a source of
> endless problems. Hence there is no point in doing homoglyph detection
> and normalization in the compiler.
>
> Best regards,
>
> Fabrice.
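
For illustration only, the ASCII-only policy described in Fabrice's reply could look roughly like the check below. This is a minimal sketch and not SDCC code: the function name is_ascii_identifier is hypothetical, and the character comparisons assume an ASCII-compatible execution character set. It accepts a non-empty name made of ASCII letters, digits and '_' that does not start with a digit, and rejects everything else, including any byte of a UTF-8 encoded character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of SDCC.
 * Returns true if s is a non-empty identifier consisting only of
 * ASCII letters, digits and '_', and not starting with a digit.
 * Assumes an ASCII-compatible execution character set. */
bool is_ascii_identifier(const char *s)
{
    assert(s != NULL);

    if (*s == '\0')
        return false;               /* empty string is not an identifier */

    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Cast so that bytes >= 0x80 (e.g. UTF-8) compare as large values. */
        unsigned char c = (unsigned char)s[i];
        bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        bool digit  = (c >= '0' && c <= '9');

        /* Digits are allowed everywhere except in the first position. */
        if (!(letter || c == '_' || (digit && i > 0)))
            return false;
    }
    return true;
}
```

With such a check no Unicode tables are needed at all, which is the maintenance argument made above; the trade-off is that any non-ASCII identifier is rejected outright.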