Reproducing the error with docutils' rst2html.py:
$ cat test.rst
Asterisk emoji
==============
This is an *️⃣ emoji in a paragraph.
$ rst2html.py test.rst > /dev/null
test.rst:4: (WARNING/2) Inline emphasis start-string without end-string.
It seems it is interpreting the character as a separate asterisk plus something else.
It's not that the "character" is interpreted "as a separate asterisk plus something else". It IS a separate asterisk plus something else. That emoji is not a single character; it is actually composed of several characters, the first of which is a simple asterisk. According to https://emojipedia.org/keycap-asterisk/: "The Keycap Asterisk emoji is a keycap sequence combining * Asterisk and ⃣ Combining Enclosing Keycap. These display as a single emoji on supported platforms."
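The composition described above can be checked with Python's standard unicodedata module. This is only an illustrative sketch: the helper name describe is made up for this example, and the emoji is spelled out as explicit code points (asterisk, U+FE0F, U+20E3) on the assumption that this matches the sequence pasted in the report.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# The keycap asterisk emoji, spelled out as explicit code points.
keycap_asterisk = "*\ufe0f\u20e3"

for entry in describe(keycap_asterisk):
    print(entry)
```

The first entry is a plain ASTERISK, followed by VARIATION SELECTOR-16 and COMBINING ENCLOSING KEYCAP, which is why the parser sees an ordinary emphasis start-string.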
The asterisk is parsed as the start-string of inline emphasis, as reported in the warning. The same is true of SourceForge's markup interpretation: note how after the *️⃣ emoji here, the rest of the paragraph's text is italicized (while editing, anyhow).
As a workaround, use a backslash escape before the emoji to prevent parsing, like this: \*️⃣
Is addressing this issue worth adding a special case in the parser code? In my current opinion, no.
Last edit: David Goodger 2020-10-05
I agree that this exceptional case does not merit special handling in Docutils.
Let's wait a bit for a follow-up from the reporter and otherwise close the ticket.
Didn't know you were waiting for a reply, sorry. ^^
My opinion: your projects, your rules. I do think this is a docutils issue, hence the report.
I know I can escape the asterisk, but if I have no control over the text I am parsing with docutils (i.e., it is written by someone else), I have to manually scan the text for that special case and do the escaping myself. I think docutils could handle that, since it already has to parse the text.
Feel free to close this if you feel it does not help make docutils a better tool.
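For reference, the manual scan mentioned above could look roughly like the following pre-processing step. This is a sketch, not part of Docutils: the names KEYCAP_ASTERISK and escape_keycap_asterisks are invented here, and it assumes the only problematic sequence is an asterisk followed by an optional U+FE0F and then U+20E3.

```python
import re

# An unescaped "*" that is followed by an optional VARIATION SELECTOR-16
# (U+FE0F) and then COMBINING ENCLOSING KEYCAP (U+20E3).
KEYCAP_ASTERISK = re.compile(r"(?<!\\)\*(?=\ufe0f?\u20e3)")

def escape_keycap_asterisks(text):
    """Backslash-escape asterisks that start a keycap emoji sequence."""
    return KEYCAP_ASTERISK.sub(r"\\*", text)

source = "This is an *\ufe0f\u20e3 emoji in a paragraph."
print(escape_keycap_asterisks(source))
```

The escaped text can then be handed to Docutils as usual. The lookbehind only skips asterisks directly preceded by a backslash, so input such as an escaped backslash followed by the emoji would need extra care, and text inside literal blocks would be escaped as well.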
Ticket moved from /p/docutils/bugs/405/
Last edit: Günter Milde 2021-04-07
The use case may be more general if we add a rule that an "inline markup start string" must not be followed by any "variation selector" or "combining character".
Pro: solves some cases where an "inline markup start string" is actually part of the text.¹
Con: makes the "inline markup recognition rules" even more complicated.
¹ Variation selectors and combining characters act on the preceding character, so they should never start an inline element's content.
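To make the proposed rule concrete, here is a rough sketch of the extra check. It is not how Docutils recognizes inline markup; followed_by_mark and passes_proposed_rule are hypothetical names, and treating every character in a Unicode mark category (Mn, Mc, Me) as a "variation selector or combining character" is an assumption of this example.

```python
import unicodedata

def followed_by_mark(text, index):
    """True if the character after text[index] is a Unicode mark (category M*)."""
    nxt = text[index + 1 : index + 2]
    return bool(nxt) and unicodedata.category(nxt).startswith("M")

def passes_proposed_rule(text, index):
    """A start-string candidate at index passes only if no mark follows it."""
    return not followed_by_mark(text, index)

emoji_text = "an *\ufe0f\u20e3 emoji"
plain_text = "some *emphasis* here"

print(passes_proposed_rule(emoji_text, emoji_text.index("*")))
print(passes_proposed_rule(plain_text, plain_text.index("*")))
```

The check looks at a single following character only and would have to be combined with the existing recognition rules, which is the added complexity noted above.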
Keeping this as a feature request.
Last edit: Günter Milde 2021-04-07