From: Janko S. <jan...@ec...> - 2025-10-24 11:03:28
That's true. Even the normalized form says nothing about "likeness appearance", so some function would have to transform the normalized forms into a "likeness representation" that could then be compared reasonably efficiently. That, too, is something that should be implemented in specialized libraries. I still hope this kind of source-analysis feature would remain optional, however.

Regarding SDCC's current hashing: it can be a "good enough" solution even if the asymptotic behavior degrades, as long as the degradation never becomes noticeable in practice. And if the fixed hash-table size ever did result in an observable slowdown in practice, that could also be improved.

--- Original Message ---
From: Philipp Klaus Krause <pk...@sp...>
Date: 23.10.2025 18:12:39
To: sdc...@li...
Subject: Re: [sdcc-devel] Using libunistring in SDCC?

On 23.10.25 at 17:54, "Janko Stamenović" via sdcc-devel wrote:
> Hopefully such checks are done by external tools and not by a
> compiler, as I can imagine a combinatorial explosion of testing
> what is similar to what, especially in projects with huge headers?

I don't think there'd be an explosion. Any identifier encountered needs to be tested against all previous ones anyway (to check whether they are the same). Unless that is done via a hash map, binary tree, or similar, this is already a number of string comparisons quadratic in the number of strings (AFAIK SDCC currently uses a hash map, but into an array of fixed size, so we still have asymptotically quadratic effort). The change would be replacing the strcmp() comparison with a more complicated one that considers homoglyphs, but in the end most pairs of identifiers would still differ early. So the effort is still negligible compared to other compiler stages.

Philipp

_______________________________________________
sdcc-devel mailing list
sdc...@li...
https://lists.sourceforge.net/lists/listinfo/sdcc-devel