
#79 Inline markup start string must not be followed by variation selector or combining character.

Milestone: Default
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2021-04-07
Created: 2020-10-04
Creator: Peque
Private: No

Reproducing the error with docutils' rst2html.py:

$ cat test.rst
Asterisk emoji
==============

This is an *️⃣ emoji in a paragraph.
$ rst2html.py test.rst > /dev/null
test.rst:4: (WARNING/2) Inline emphasis start-string without end-string.

It seems it is interpreting the character as a separate asterisk plus something else.

Discussion

  • David Goodger

    David Goodger - 2020-10-05

    It's not that the "character" is interpreted "as a separate asterisk plus something else". It IS a separate asterisk plus something else. That emoji is not a single character, it actually is composed of several characters, the first of which is a simple asterisk. According to https://emojipedia.org/keycap-asterisk/: "The Keycap Asterisk emoji is a keycap sequence combining * Asterisk and ⃣ Combining Enclosing Keycap. These display as a single emoji on supported platforms."
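
The composition described above can be checked with Python's standard `unicodedata` module. This is a minimal sketch; it assumes the emoji in the report is the fully qualified keycap sequence (asterisk, U+FE0F, U+20E3), and the helper name `code_points` is made up for illustration.

```python
import unicodedata


def code_points(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# Assumption: the emoji in test.rst is this three-code-point sequence.
# Escapes are used so that the individual code points stay visible.
keycap_asterisk = "*\uFE0F\u20E3"

for row in code_points(keycap_asterisk):
    print(*row)
```

The first row is the plain ASCII asterisk, which is the character the parser sees as an emphasis start-string.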

    The asterisk is parsed as the start-string of inline emphasis, as reported in the warning. The same is true of SourceForge's markup interpretation: note how after the *️⃣ emoji here, the rest of the paragraph's text is italicized (while editing, anyhow).

    As a workaround, use a backslash escape before the emoji to prevent parsing, like this:

    This is an \*️⃣ emoji in a paragraph.
    

    Is addressing this issue worth adding a special case in the parser code? In my current opinion, no.

     

    Last edit: David Goodger 2020-10-05
  • Günter Milde

    Günter Milde - 2020-10-28
    • status: open --> pending-remind
     
  • Günter Milde

    Günter Milde - 2020-10-28

    I agree that this exceptional case does not merit special handling in Docutils.
    Let's wait a bit for a follow up from the reporter and otherwise close the ticket.

     
  • Peque

    Peque - 2020-10-28

    Didn't know you were waiting for a reply, sorry. ^^

    My opinion: your projects, your rules. I do think this is a docutils issue, hence the report.

    I know I can escape the asterisk, but if I have no control over the text I am parsing with docutils (i.e., it is written by someone else), I have to manually scan the text for that special case and do the work of escaping. I think docutils could handle that, since it already has to parse the text.
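
The manual scan-and-escape step mentioned here could look roughly like the following pre-processing pass, run on the text before it is handed to docutils. It is a sketch under assumptions, not part of docutils: the function name `escape_modified_markup` is hypothetical, only the single-character start-strings `*`, `` ` `` and `|` are handled, and the text is assumed to contain no literal blocks or inline literals (where an added backslash would show up in the output).

```python
import re
import unicodedata

# A markup character not preceded by a backslash; the lookahead captures the
# following character without consuming it.
# Limitation: a doubled backslash before the character ("\\\\*") is not
# recognised as an escaped backslash.
_CANDIDATE = re.compile(r"(?<!\\)([*`|])(?=(.))", re.DOTALL)


def escape_modified_markup(text):
    """Backslash-escape markup characters followed by a combining mark.

    Variation selectors have general category Mn, so the single check for
    categories starting with "M" covers them as well.
    """
    def repl(match):
        marker, follower = match.group(1), match.group(2)
        if unicodedata.category(follower).startswith("M"):
            return "\\" + marker
        return marker

    return _CANDIDATE.sub(repl, text)


print(escape_modified_markup("This is an *\uFE0F\u20E3 emoji in a paragraph."))
```

Ordinary emphasis such as `*word*` is left untouched, because the character after the opening asterisk is a letter rather than a combining mark.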

    Feel free to close this if you feel it does not help make docutils a better tool.

     
  • Günter Milde

    Günter Milde - 2021-03-29

    Ticket moved from /p/docutils/bugs/405/

     

    Last edit: Günter Milde 2021-04-07
  • Günter Milde

    Günter Milde - 2021-03-29

    The use case(s) may be more generic if we add a rule that an "inline markup start string" must not be followed by any "variation selector" or "combining character".

    • Solves some cases where an "inline markup start string" is actually part of the text.¹

    • Makes the "inline markup recognition rules" even more complicated.

    ¹ Variation selectors and combining characters act on the preceding character, so they should never start an inline element's content.

    Keeping this as a feature request.
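
For illustration only, the proposed rule could be expressed as a small predicate. This is a sketch and not the docutils parser: both function names are hypothetical, and only the "character after the start-string" part of the recognition rules is modelled (the existing requirement that it be non-whitespace, plus the proposed exclusion).

```python
import unicodedata


def acts_on_preceding(ch):
    """True for combining characters and variation selectors.

    Both kinds fall under the Unicode "Mark" general categories
    (Mn, Mc, Me), so a category lookup is enough.
    """
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def may_start_inline_markup(text, i):
    """Check the character after a start-string that ends at index i.

    Existing rule: it must exist and be non-whitespace.
    Proposed rule: it must also not modify the preceding character.
    """
    follower = text[i + 1 : i + 2]
    return bool(follower) and not follower.isspace() and not acts_on_preceding(follower)


print(may_start_inline_markup("an *emphasis*", 3))
print(may_start_inline_markup("an *\uFE0F\u20E3 emoji", 3))
```

With this check, the asterisk in the keycap sequence would be treated as text, while ordinary `*emphasis*` would still be recognised.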

     

    Last edit: Günter Milde 2021-04-07
  • Günter Milde

    Günter Milde - 2021-04-07
    • summary: Cannot use asterisk emoji without a warning --> Inline markup start string must not be followed by variation selector or combining character.
     
