#20 Custom Inline Markup


I know this is going to get boos and jeers, but I would like to propose that we make the parser more flexible and allow arbitrary inline markup in addition to (never in place of) the current inline markup: *emphasis*, **strong**, `interpreted`, ``literal``, _linktarget, linkref_, [footnote]_, etc.

Currently, the rst language allows those markups plus customization through a role assigned to interpreted text, e.g. :myrole:`text` or `text`:myrole:, where myrole is a registered role. None of the current roles would change under this proposal: subscript would still be written as `2`:sub:, and so on. The only way to add custom inline markup would be to create a custom role for it. Say, for instance, that I want /italics/ to be a new form of inline markup in a given document. This would ONLY apply to the document containing the custom role definition; a document that already uses /.../ as literal text, say in a directory listing, would not be affected because it would not include the customizing role. I propose we augment the custom role directive simply by adding a new optional parameter, :delimiters:, so that the example I just gave becomes:

.. role:: InlineItalics(emphasis)
   :class: italic
   :delimiters: / /

.. role:: InlineSubscript(subscript)
   :delimiters: _ _

This text is /italic/, but is this proposal under H\ _2_\ O?

The format of :delimiters: is simple: <start-string> <space> <end-string>. Custom delimiters would follow the same recognition rules as any other inline markup regarding whitespace before and after the start-string and end-string. Very importantly, no current inline markup can be overridden by this technique; attempts to do so will generate an error:

.. role:: InlineRed
   :class: red
   :delimiters: * *

<Error: Cannot override built-in delimiters -- if you wish to change delimiter behavior use style sheets or other writer customizations>

As you can see from the first example, I've used one pattern, namely the _ _ markup, to indicate subscript. While _ [nothing] is used for a navigation target and [nothing] _ is used for a link, the customization is of the form _ _, which does not exactly match either of these and should be safe.
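
The :delimiters: rules described here could be checked with something like the Python sketch below. It is only an illustration of the proposal, not Docutils code: `parse_delimiters` and `BUILTIN_DELIMITERS` are hypothetical names, and the list of built-in pairs is an assumption written out by hand rather than taken from the parser.

```python
# Hypothetical table of built-in (start-string, end-string) pairs.
# Assumption for illustration only; not read from the Docutils source.
BUILTIN_DELIMITERS = {
    ("*", "*"),      # emphasis
    ("**", "**"),    # strong
    ("`", "`"),      # interpreted text
    ("``", "``"),    # inline literal
    ("_`", "`"),     # inline internal target
    ("|", "|"),      # substitution reference
    ("[", "]_"),     # footnote reference
}


def parse_delimiters(value):
    """Split '<start-string> <space> <end-string>' into a (start, end) pair.

    Raises ValueError if the value is malformed or if the pair exactly
    matches a built-in pair.
    """
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(
            "expected '<start-string> <end-string>', got %r" % value)
    start, end = parts
    if (start, end) in BUILTIN_DELIMITERS:
        raise ValueError("Cannot override built-in delimiters: %r" % value)
    return start, end
```

With this check, `parse_delimiters("/ /")` and `parse_delimiters("_ _")` are accepted, while `parse_delimiters("* *")` raises an error. Comparing whole pairs, rather than individual characters, is what lets `_ _` through even though `_` appears in built-in markup.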

I know users of this technique will potentially pollute the rst language with lots of complicated custom markup, so if we add this, it would be useful to create an include file with some pre-emptive suggestions to keep things relatively standard. Granted, we can still use the long form for things:

This text is :italic:`italic`, but is this proposal under H\ :sub:`2`\ O?

But isn't this easier to read?

This text is /italic/, but is this proposal under H\ _2_\ O?

And isn't that what rst is all about: Making markup transparent?

Now, I will look into this change, but FIRST I want to get that Dave Abrahams patch from 2004 all spiffed up and ready for review in 2010 so that we can FINALLY nest the inline markup. I also have another change to the way roles work that I would like to implement first: if a custom role is used in an rst document before it is defined, DON'T immediately generate an error. Instead, create a pending node with the relevant text information, then in the Transform pass try again to resolve the custom role, and only if that fails generate the error. Thus, custom role definitions may appear at the end of the file instead of having to go at the beginning. I mention this because if you use delimiters, that is NOT possible. The only way custom delimiters are going to be parsed is if they are defined before they are seen, which unfortunately means my role resolution patch may be less useful than I'd hoped. But I figure if someone 6 years from now wants to make custom delimiters parse lazily, as I propose for traditional interpreted roles, they can be my guest!
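
The deferred role resolution described above can be modelled as two passes. The Python sketch below is a toy model under stated assumptions, not the Docutils implementation: `PendingRole`, `parse`, and `resolve_pending` are hypothetical names, and the input is a simplified list of tuples rather than real rst source.

```python
class PendingRole:
    """Placeholder for interpreted text whose role is not yet defined."""

    def __init__(self, role, text):
        self.role = role
        self.text = text


def parse(items, roles):
    """First pass over ('role-def', name) and ('text', role, text) items.

    Unknown roles do not raise an error here; they become PendingRole
    placeholders to be retried later.
    """
    nodes = []
    for item in items:
        if item[0] == "role-def":
            roles.add(item[1])
        else:
            _, role, text = item
            if role in roles:
                nodes.append((role, text))
            else:
                nodes.append(PendingRole(role, text))  # defer, no error yet
    return nodes


def resolve_pending(nodes, roles):
    """Second pass (stand-in for a transform): resolve or report errors."""
    resolved, errors = [], []
    for node in nodes:
        if isinstance(node, PendingRole):
            if node.role in roles:
                resolved.append((node.role, node.text))
            else:
                errors.append(
                    "Unknown interpreted text role %r" % node.role)
        else:
            resolved.append(node)
    return resolved, errors
```

In this model a role used before its definition resolves in the second pass, and only a role that is never defined produces an error. It also shows why custom delimiters cannot be deferred the same way: the first pass has to know the delimiters already in order to recognize the marked-up text at all.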


  • David Goodger - 2010-03-14

    It's not a matter of "boos and jeers", it's a matter of difficulty. You're talking about some major changes to the internals of Docutils. If you have time to work on this, please do so, and then we'll talk. I don't know of anyone with the time for this, including me.

  • Jeffrey C. Jacobs

    Fair enough. I'd like to take a crack at this implementation anyway, so if you don't mind, may we please leave this issue open as a placeholder for me to post updates and patches as I have time for them? Overall, looking at how the parser now works, it seems best to first figure out whether nesting can be implemented and how, because how this proposal would be implemented depends on that. Also, I came up with a potential use case that would not work, because I suggested that the built-ins have higher priority than any custom markup:

    .. role:: parenthetical
       :class: parenthetical
       :delimiters: `( )`

    The problem here is clearly the same one the code already solves for `interpreted` versus ``literal`` text and for *emphasis* versus **strong** emphasis. So although the regexp could easily be rebuilt to include the custom markup, what gets matched first becomes an issue if ` is tried before `(. That's an implementation detail that I'll keep in mind, but first, as I said, I think it is time we put nesting to bed, and that's my primary focus. Then perhaps deferred roles, perhaps 'smart' node merges (I have an idea for an alternate proposal to the one I submitted in the other bug which may simplify things), then maybe the :valueasclass: option as the easier one, and finally this one as the flower in the cap, as it were.
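
The matching-order issue can be reproduced with Python's `re` module, where alternatives in a pattern are tried left to right. The sketch below is illustrative only: `build_start_pattern` is a hypothetical helper, and ordering start-strings longest first is one possible fix, not necessarily how the Docutils parser would handle it.

```python
import re


def build_start_pattern(start_strings):
    """Compile an alternation of start-strings, longest first.

    Longest-first ordering makes "`(" win over "`" and "**" over "*",
    because Python's re tries alternatives from left to right.
    """
    ordered = sorted(start_strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


starts = ["`", "``", "`("]

# Naive pattern: alternatives in the order given, so "`" is tried first.
naive = re.compile("|".join(re.escape(s) for s in starts))
ordered = build_start_pattern(starts)

text = "`(aside)`"
naive_match = naive.match(text).group()      # "`"  (built-in wins)
ordered_match = ordered.match(text).group()  # "`(" (custom delimiter wins)
```

Ordering by length settles which start-string is recognized. It does not settle whether a custom delimiter should be allowed to take priority over a built-in one, which remains a design decision for the proposal.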

    Anyway, thanks for the input, David; my main goal will be to build documented, well-written code with adequate test cases for coverage. If I can do that, I only hope the proposal will be judged fairly and the work will not have amounted to naught.

    Personally, though, I have some ideas about docx and rtf writers in my back pocket that I may return to in the coming years, but all that in due time.

