To everyone who commented on this: thank you for your input. You have been heard; there is no need to keep posting about this. Please be patient. There will be follow-ups on this (and other things) before too long. David Goodger
It's not that the "character" is interpreted "as a separate asterisk plus something else". It IS a separate asterisk plus something else. That emoji is not a single character; it is composed of several characters, the first of which is a plain asterisk. According to https://emojipedia.org/keycap-asterisk/: "The Keycap Asterisk emoji is a keycap sequence combining * Asterisk and ⃣ Combining Enclosing Keycap. These display as a single emoji on supported platforms." The asterisk is parsed...
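To see the composition for yourself, here is a minimal Python sketch. It assumes the two-code-point sequence described in the Emojipedia quote above (asterisk followed by Combining Enclosing Keycap); the variable names are only illustrative.

```python
import unicodedata

# Assumption: the minimal keycap sequence from the quote above,
# i.e. U+002A ASTERISK followed by U+20E3 COMBINING ENCLOSING KEYCAP.
keycap_asterisk = "*\u20e3"

# Python strings are sequences of code points, so the "single emoji"
# shows up as separate items here.
names = [unicodedata.name(ch) for ch in keycap_asterisk]

for ch, name in zip(keycap_asterisk, names):
    print(f"U+{ord(ch):04X} {name}")
```

The first item is an ordinary asterisk, which is what a parser scanning the text character by character will encounter.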
This behavior is as intended, not a bug. It was intended to help thwart email harvesters, separately from email cloaking. I don't know if it's useful for that or not. The behavior could be removed for the "@" character specifically if it's really causing an issue (but see below). The code for it is here: docutils.writers._html_base.HTMLTranslator.encode, using the HTMLTranslator.special_characters lookup table. Try your generated HTML and see: it should work fine. At least, it works fine with vanilla...
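For illustration only, here is a rough sketch of how a lookup-table encoding of this kind can work. It is not the actual Docutils code: the table contents, the `SPECIAL_CHARACTERS` name, and the standalone `encode` function are assumptions made for this example.

```python
import html

# Hypothetical lookup table, loosely modelled on the idea described above:
# map code points to HTML character references, including "@".
SPECIAL_CHARACTERS = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
    ord("@"): "&#64;",
}

def encode(text):
    """Replace special characters using the lookup table (illustrative)."""
    return text.translate(SPECIAL_CHARACTERS)

encoded = encode("user@example.org")
print(encoded)

# An HTML parser decodes the numeric reference back to "@",
# so the rendered page shows the original text.
decoded = html.unescape(encoded)
print(decoded)
```

Because the numeric character reference is decoded by any conforming HTML parser, the encoded output should render and behave like the original text.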
Email address cloaking is done for non-mailto links, even if feature is disabled
Can you provide examples of such use cases in the wild?
The "check_content" methods look good. The __call__ method has a small problem: there is no need to check the cache here (it smells like a premature optimization), as that is already done in "import_from_packages". On a cache miss, the cache is checked twice. Perhaps move the docstrings from after the get_language = ... assignments into the LanguageImporter & RstLanguageImporter classes? Then the docstrings may be useful in running code. Otherwise, looks good, thanks!
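A simplified sketch of the suggested shape, not the patch under review: the class below is a hypothetical stand-in that borrows the names LanguageImporter, import_from_packages, and get_language from this comment. The `loader` argument, the `load_count` counter, and the one-argument signature are assumptions for illustration.

```python
class LanguageImporter:
    """Hypothetical stand-in: look up language modules and cache them."""

    def __init__(self, loader):
        self.loader = loader   # assumed callable: name -> module-like object
        self.cache = {}
        self.load_count = 0    # counts real loads, for demonstration only

    def import_from_packages(self, name):
        # The cache is consulted here, and only here.
        if name in self.cache:
            return self.cache[name]
        self.load_count += 1
        module = self.loader(name)
        self.cache[name] = module
        return module

    def __call__(self, language_code):
        # No cache check: delegate, so a miss costs one lookup, not two.
        return self.import_from_packages(language_code)


# The docstring lives on the class, so it is available on the instance
# at run time via get_language.__doc__.
get_language = LanguageImporter(lambda name: {"code": name})

first = get_language("de")
second = get_language("de")
print(first is second, get_language.load_count)
print(get_language.__doc__)
```

Keeping the cache lookup in one method avoids the duplicate check, and placing the docstring in the class makes it reachable from the `get_language` instance.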