Re: [sdcc-devel] Fwd: Inquiry Regarding Unicode Homoglyph and Normalization Handling in SDCC

Dear Fabrice,

I'd like to take this opportunity to say that I'm a huge fan of all
your open source work. The projects most memorable to me are TCC, FFmpeg,
QEMU and JSLinux (in that approximate order).

Regarding the topic currently under discussion and the related messages:
my impression is that, for a while now, some of them have read more like
material generated by an AI engine than like the writing of any specific human.

And my understanding is that Philipp's earlier arguments for
using a Unicode library in SDCC were (in the messages on this
mailing list, identified by their date and time stamps):

2025-10-22 09:08:57

"Currently, the C23 standard technically only requires us to check
for valid unicode, since valid unicode that is not allowed in
identifiers results in undefined behavior, but IMO, that was a
mistake (a "shall" outside of a constraint section results in UB,
while in a constraint section it requires a diagnostic), and will
likely be fixed, with a recommendation to also apply the fix to
implementations of earlier standards." 

2025-10-22 10:50:26

"One could argue that C23 does not explicitly require the
diagnostic on the technicality that the "shall" for this in
§6.4.3.1 is outside a constraints section, thus we have undefined
behavior instead of a constraint violation, but IMO the intent was
to have a diagnostic, and C2y will likely move the wording into a
constraint section, thus clearly requiring a diagnostic." 

A fan of your work,
Janko

>  
> From: 周 子益 <z12...@ou...>
> Date: 05.11.2025 03:28:21
> To: "sdc...@li..." <sdc...@li...>
> Subject: [sdcc-devel] Fwd: Inquiry Regarding Unicode Homoglyph and Normalization Handling in SDCC
>  
> From: Fabrice Bellard
> Sent: 3 November 2025, 23:47
> To: 周 子益
> Subject: Re: Inquiry Regarding Unicode Homoglyph and Normalization Handling in SDCC
>  
> Dear Zhou,
> 
> On 10/31/25 7:38 AM, 周 子益 wrote:
> > I hope this message finds you well.
> > I am writing to seek your insights regarding a topic currently under
> > discussion in the SDCC community: whether to incorporate support for
> > Unicode homoglyph detection and normalization (such as NFC) directly
> > into the compiler. As you may know, this issue has sparked debate around
> > balancing usability, maintenance overhead, and the core responsibilities
> > of a compiler.
> > Some key points of the discussion include:
> >
> >  1.
> >     The potential maintenance burden of tracking evolving Unicode
> >     homoglyph standards.
> >  2.
> >     Whether homoglyph checks and normalization are better suited for
> >     external tools (e.g., linters) rather than being embedded in the
> >     compiler.
> >  3.
> >     Concerns about enforcing specific normalization forms and how it
> >     might impact flexibility, particularly for edge cases like
> >     obfuscated code or embedded data.
> >
> > Given your extensive experience with compiler design and projects like
> > TinyCC (TCC), I would greatly appreciate your perspective on the following:
> >
> >   *
> >     Do you believe Unicode homoglyph detection and normalization belong
> >     in a compiler like SDCC, or should these be delegated to external tools?
> >   *
> >     How might such features impact compiler performance,
> >     maintainability, and user flexibility?
> >   *
> >     Are there alternative approaches or existing libraries you would
> >     recommend for handling these challenges?
> 
> I am not sure I understand the problem. SDCC should handle only ASCII
> identifiers. Handling unicode identifiers is useless and a source of
> endless problems. Hence there is no point in doing homoglyph detection
> and normalization in the compiler.
> 
> Best regards,
> 
> Fabrice.
> 
>
