From: David G. <go...@py...> - 2004-01-17 03:18:34
[Beni Cherniavsky]
> You also need to update the spec, DTD and writers of course...

The DTD already supports nested markup. The HTML writer shouldn't
have any trouble with it. I don't know about LaTeX. Updating the
spec is a small job.

[David Abrahams]
> I'm hoping to get some help from the community on those chores once
> I have done the "hard part". I've certainly seen changes go into
> docutils core without all the writers being updated by the same
> person.

Yes, don't wait for other parts to become compatible. That'll happen
in time.

[David Abrahams]
> OK, I have something implemented which seems to handle most of the
> logic (it's not really doing tokenization, but something that should
> be semantically equivalent, and doesn't result in a complete
> rewrite)

Cool. Can't wait to see the code. As this would be a major change to
the parser, please provide a patch rather than checking it in
directly. Thanks.

> but there are a few cases I need to get some feedback on.
>
> 1. *****
>
>    gets "tokenized", naturally, as
>
>        <**><*><**>
>
>    And when the inliner re-parses <*>, it complains about an inline
>    start-string without a corresponding end-string. So, the question
>    is, do we:
>
>    a. turn off complaints inside inline markup about unmatched
>       inline markup start-strings without corresponding end strings
>
>    b. Make that an error and force the user to write
>
>           **\***

I think (b) is the way to go here.

> 2. ``literal ``TeX quotes'' & \\backslash``
>
>    This one is currently parsed as though tokenized this way:
>
>        <``><literal ``TeX quotes'' & \\backslash><``>
>
>    But my code tokenizes inside markup and so sees:
>
>        <``><literal ><``>TeX quotes'' & \\backslash><``>
>
>    Again, I see two choices:
>
>    a. Turn off recognition of an inline markup start string within
>       regions already using that markup.
>
>    b. Make that an error and force the user to write
>
>           ``literal \``TeX quotes'' & \\backslash``
>
>    In both cases (a) is more backward-compatible but (b) is more
>    consistent, and, dare I say, Pythonic.

Here I disagree. The spec says:

    No markup interpretation (including backslash-escape
    interpretation) is done within inline literals.

The current parsing is correct and must remain. The double-backquotes
before "TeX" must not be interpreted as markup. Only an inline
literal end-string should be searched for upon encountering an inline
literal start-string.

So (b) is no good because backslashes must be left alone; they must
appear in the output. (a) is too general; for instance, nested inline
markup has to allow interpreted text within interpreted text::

    :role1:`interpreted :role2:`text``

> in-the-face-of-ambiguity-refuse-the-temptation-to-guess-ly y'rs,

Good policy ;-)

--
David Goodger  http://python.net/~goodger
For hire: http://python.net/~goodger/cv
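The ``*****`` case above can be illustrated with a minimal sketch (hypothetical Python for illustration, not the docutils inliner): a strong start-string is matched first, the matching end-string is searched for, and the lone ``*`` left in between becomes content that the inliner then re-parses, triggering the "start-string without end-string" complaint discussed above.

```python
import re

# Hypothetical sketch (not the docutils inliner): match a strong
# start-string, then search for the matching end-string; whatever is
# left in between is content handed back for re-parsing.  On "*****"
# this yields the tokenization <**><*><**>.
def split_strong(text):
    match = re.match(r"\*\*(.*)\*\*$", text)
    if match is None:
        return None
    return ("**", match.group(1), "**")
```

For example, `split_strong("*****")` returns `("**", "*", "**")`.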
From: David A. <da...@bo...> - 2004-01-17 03:39:05
David Goodger <go...@py...> writes:

> [Beni Cherniavsky]
> > You also need to update the spec, DTD and writers of course...
>
> The DTD already supports nested markup. The HTML writer shouldn't
> have any trouble with it. I don't know about LaTeX. Updating the
> spec is a small job.

Great!

> [David Abrahams]
> > I'm hoping to get some help from the community on those chores once
> > I have done the "hard part". I've certainly seen changes go into
> > docutils core without all the writers being updated by the same
> > person.
>
> Yes, don't wait for other parts to become compatible. That'll happen
> in time.
>
> [David Abrahams]
> > OK, I have something implemented which seems to handle most of the
> > logic (it's not really doing tokenization, but something that should
> > be semantically equivalent, and doesn't result in a complete
> > rewrite)
>
> Cool. Can't wait to see the code.

I'll send you a preview.

> As this would be a major change to the parser, please provide a
> patch rather than checking it in directly. Thanks.

Of course, I would never dare stomp on the core codebase that way.

> > but there are a few cases I need to get some feedback on.
> >
> > 1. *****
> >
> >    gets "tokenized", naturally, as
> >
> >        <**><*><**>
> >
> >    And when the inliner re-parses <*>, it complains about an inline
> >    start-string without a corresponding end-string. So, the question
> >    is, do we:
> >
> >    a. turn off complaints inside inline markup about unmatched
> >       inline markup start-strings without corresponding end strings
> >
> >    b. Make that an error and force the user to write
> >
> >           **\***
>
> I think (b) is the way to go here.

Done. 'cause it already works that way ;^)

> > 2. ``literal ``TeX quotes'' & \\backslash``
> >
> >    This one is currently parsed as though tokenized this way:
> >
> >        <``><literal ``TeX quotes'' & \\backslash><``>
> >
> >    But my code tokenizes inside markup and so sees:
> >
> >        <``><literal ><``>TeX quotes'' & \\backslash><``>
> >
> >    Again, I see two choices:
> >
> >    a. Turn off recognition of an inline markup start string within
> >       regions already using that markup.
> >
> >    b. Make that an error and force the user to write
> >
> >           ``literal \``TeX quotes'' & \\backslash``
> >
> >    In both cases (a) is more backward-compatible but (b) is more
> >    consistent, and, dare I say, Pythonic.
>
> Here I disagree. The spec says:
>
>     No markup interpretation (including backslash-escape
>     interpretation) is done within inline literals.
>
> The current parsing is correct and must remain. The double-backquotes
> before "TeX" must not be interpreted as markup. Only an inline
> literal end-string should be searched for upon encountering an inline
> literal start-string.

Well, OK, it's easy to special-case literal markup. Imagine it wasn't
about literal text:

    *emphasized *emphasis* etcetera*

The question is: do we end up with nested emphasis, or do we go out
of our way to disable nested recognition of the same inline markup.
That is, do we parse it as:

    ((*)(emphasized *emphasis* etcetera)(*))

or

    ((*)(emphasized ((*)(emphasis)(*)) etcetera)(*))

??

> So (b) is no good because backslashes must be left alone; they must
> appear in the output. (a) is too general; for instance, nested inline
> markup has to allow interpreted text within interpreted text::
>
>     :role1:`interpreted :role2:`text``

I could *easily* do (b) for everything other than literal text.
That's what falls most naturally out of the code.

Please let me know; I'd love to have something I consider to be a
good first draft tonight.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David A. <da...@bo...> - 2004-01-17 18:41:19
David Goodger <go...@py...> writes:

> I think that's correct. Watch out for cases like this though:
>
>     **strong with a wildcard a.* inside**
>
> That shouldn't cause any error (the "*" is not in start-string
> context).

That one's already working.

> > Please let me know; I'd love to have something I consider to be a
> > good first draft tonight.
>
> Sorry, went to bed after my reply last night (got the flu) and didn't
> see your message until now.

Please get better soon!

> > Could you perhaps help me out by writing up a few nesting tests
> > which you think should pass? I'm liable to be overlooking more
> > things...
>
> I've attached some unit tests in test_nested_inline_markup.py, with
> some interesting edge cases and relatively complex examples. I
> *think* the "expected output" is correct, but I may have made mistakes
> (can't test it yet ;-). There are probably other edge cases which
> I've missed, but we'll find them once we have code to exercise. See
> the top of the file for installation instructions.

Thanks!

> Please also run test/test_parsers/test_rst/test_inline_markup.py and
> report the results. I expect some tests to fail, but I'd like to see
> which ones and how.

OK, will do.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: Mark N. <no...@so...> - 2004-01-19 17:12:12
Mark Nodine wrote:

> David Abrahams wrote:
> >
> > I guess that's the interpretation which results in the fewest
> > errors, but I think we could probably construct cases where the
> > other interpretation would be more sensible:
> >
> >     *emph *prob **strong ``literal``, end of strong**, end of emph*
>
> However, this one did trip up my Perl parser :-(. I'll have to see
> what's going on.

Never mind. My parser did do something reasonable for this one. It
used the two asterisks in "strong**" to close the two levels of
emphasis "*emph" and "*prob" and reported an unmatched start-string
for the "**strong". The final "emph*" was kept as part of the string.

--Mark
From: David A. <da...@bo...> - 2004-01-19 18:58:59
Mark Nodine <no...@so...> writes:

> Mark Nodine wrote:
>>
>> David Abrahams wrote:
>> >
>> > I guess that's the interpretation which results in the fewest
>> > errors, but I think we could probably construct cases where the
>> > other interpretation would be more sensible:
>> >
>> >     *emph *prob **strong ``literal``, end of strong**, end of emph*
>>
>> However, this one did trip up my Perl parser :-(. I'll have to see
>> what's going on.
>
> Never mind. My parser did do something reasonable for this one.
> It used the two asterisks in "strong**" to close the two levels
> of emphasis "*emph" and "*prob" and reported an unmatched start-string
> for the "**strong". The final "emph*" was kept as part of the string.

That doesn't seem very reasonable to me, but my point is that once we
get into malformed text, there are any number of reasonable
interpretations. It doesn't seem to matter all that much which one we
choose.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David A. <da...@bo...> - 2004-01-26 12:57:52
> David Goodger <go...@py...> writes:
> >
> > > Here's the promised attachment.
> > >
> > > -- David Goodger
> > > #! /usr/bin/env python
> > >
> > > # Copy this file to docutils/test/test_parsers/test_rst/ and do
> > > # ``chmod +x test_inline_markup.py``, then execute this file to test.
> > >
> > > # To be added (later) to
> > > # docutils/test/test_parsers/test_rst/test_inline_markup.py?
> >
> > <snip>
> >
> > I've checked in a new version of states.py on the "nesting" branch
> > which passes all of the nested inline tests Dave G. gave me. It also
> > passes all of the regular inline tests that I think should pass,
> > except for the ones involving embedded URIs.
> >
> > The regular expressions get quite hairy and I'm not sure how to solve
> > the embedded URI issue. I am out of time to work on it for several
> > weeks probably, but I would be happy to point anyone who wants to look
> > at it at the cause of the problem.
>
> I managed to fix everything so now there's only a single suspicious
> looking test result:
>
>     test_parsers\test_rst\<stdin>: totest['embedded_URIs'][0]; test_parser (DocutilsTestSupport.ParserTestCase)
>     input:
>     `phrase reference <http://example.com>`_
>
>     -: expected
>     +: output
>       <document source="test data">
>           <paragraph>
>               <reference refuri="http://example.com">
>                   phrase reference
>               <target id="phrase-reference" name="phrase reference" refuri="http://example.com">
>     +             phrase reference
>
> I think this is probably something trivial, but I'm not sure what's
> going on yet.

OK, that's fixed now. As far as I know, there's only one remaining
problem, and that's easily fixed:

    *emphasis ``literal``*

    -: expected
    +: output
      <document source="test data">
          <paragraph>
              <emphasis>
                  emphasis
    +             <problematic id="id2" refid="id1">
    +                 ``
    -             <literal>
    ? -               ^
    +             literal``
    ?                    ^^
    -                 literal
    +         <system_message backrefs="id2" id="id1" level="2" line="1" source="test data" type="WARNING">
    +             <paragraph>
    +                 Inline literal start-string without end-string.

The same issue will come up for

    *emphasis |substitution|*

But I'm out of time to work on this at the moment. I hope someone
else will be inspired to pick up the baton; I'd be happy to give
guidance.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David A. <da...@bo...> - 2004-01-26 11:23:55
David Goodger <go...@py...> writes:

> Here's the promised attachment.
>
> -- David Goodger
> #! /usr/bin/env python
>
> # Copy this file to docutils/test/test_parsers/test_rst/ and do
> # ``chmod +x test_inline_markup.py``, then execute this file to test.
>
> # To be added (later) to
> # docutils/test/test_parsers/test_rst/test_inline_markup.py?

<snip>

I've checked in a new version of states.py on the "nesting" branch
which passes all of the nested inline tests Dave G. gave me. It also
passes all of the regular inline tests that I think should pass,
except for the ones involving embedded URIs.

The regular expressions get quite hairy and I'm not sure how to solve
the embedded URI issue. I am out of time to work on it for several
weeks probably, but I would be happy to point anyone who wants to look
at it at the cause of the problem.

Regards,
Dave A.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David A. <da...@bo...> - 2004-01-26 12:41:33
> David Goodger <go...@py...> writes:
>
> > Here's the promised attachment.
> >
> > -- David Goodger
> > #! /usr/bin/env python
> >
> > # Copy this file to docutils/test/test_parsers/test_rst/ and do
> > # ``chmod +x test_inline_markup.py``, then execute this file to test.
> >
> > # To be added (later) to
> > # docutils/test/test_parsers/test_rst/test_inline_markup.py?
>
> <snip>
>
> I've checked in a new version of states.py on the "nesting" branch
> which passes all of the nested inline tests Dave G. gave me. It also
> passes all of the regular inline tests that I think should pass,
> except for the ones involving embedded URIs.
>
> The regular expressions get quite hairy and I'm not sure how to solve
> the embedded URI issue. I am out of time to work on it for several
> weeks probably, but I would be happy to point anyone who wants to look
> at it at the cause of the problem.

I managed to fix everything so now there's only a single suspicious
looking test result:

    test_parsers\test_rst\<stdin>: totest['embedded_URIs'][0]; test_parser (DocutilsTestSupport.ParserTestCase)
    input:
    `phrase reference <http://example.com>`_

    -: expected
    +: output
      <document source="test data">
          <paragraph>
              <reference refuri="http://example.com">
                  phrase reference
              <target id="phrase-reference" name="phrase reference" refuri="http://example.com">
    +             phrase reference

I think this is probably something trivial, but I'm not sure what's
going on yet.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David A. <da...@bo...> - 2004-01-17 04:24:16
David Abrahams <da...@bo...> writes:

>> The current parsing is correct and must remain. The double-backquotes
>> before "TeX" must not be interpreted as markup. Only an inline
>> literal end-string should be searched for upon encountering an inline
>> literal start-string.
>
> Well, OK, it's easy to special-case literal markup. Imagine it wasn't
> about literal text:
>
>     *emphasized *emphasis* etcetera*
>
> The question is: do we end up with nested emphasis, or do we go out
> of our way to disable nested recognition of the same inline markup.
> That is, do we parse it as:
>
>     ((*)(emphasized *emphasis* etcetera)(*))
>
> or
>
>     ((*)(emphasized ((*)(emphasis)(*)) etcetera)(*))
>
> ??
>
>> So (b) is no good because backslashes must be left alone; they must
>> appear in the output. (a) is too general; for instance, nested inline
>> markup has to allow interpreted text within interpreted text::
>>
>>     :role1:`interpreted :role2:`text``
>
> I could *easily* do (b) for everything other than literal text.
> That's what falls most naturally out of the code.

Actually I already have it working that way. I obviously didn't cover
all cases, though. I need to do something for your "role" example.
Should be a quick fix.

Could you perhaps help me out by writing up a few nesting tests which
you think should pass? I'm liable to be overlooking more things...

Thanks,

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David G. <go...@py...> - 2004-01-17 15:49:05
Attachments:
test_nested_inline_markup.py
Here's the promised attachment.

-- David Goodger
From: David G. <go...@py...> - 2004-01-17 15:47:17
[Sorry for the dupes. I recently restricted the docutils-users and
-develop lists to member-posting-only (once I determined that that did
*not* mean non-member posts are discarded, only held for approval;
MailMan's docs aren't the clearest on the subject). I approved the
post this morning, but David A. subscribed and re-posted last night.]

David Abrahams wrote:
> Well, OK, it's easy to special-case literal markup. Imagine it
> wasn't about literal text:
>
>     *emphasized *emphasis* etcetera*
>
> The question is: do we end up with nested emphasis, or do we go out
> of our way to disable nested recognition of the same inline markup.

I think we should allow nested emphasis (or anything else, *except*
inline literals), and leave it up to the docs and the author to avoid
potentially meaningless examples. Not that this case is really
meaningless: I've seen such examples in printed text before, where a
long passage of italics (e.g. a character's inner thoughts) contains a
bit of emphasis, which is displayed in roman (un-italicized). In
other words, the display of emphasis could be thought of as a toggle
switch: apply it twice and it turns off.

> That is, do we parse it as:
>
>     ((*)(emphasized *emphasis* etcetera)(*))
>
> or
>
>     ((*)(emphasized ((*)(emphasis)(*)) etcetera)(*))
>
> ??

The latter.

>> So (b) is no good because backslashes must be left alone; they must
>> appear in the output. (a) is too general; for instance, nested
>> inline markup has to allow interpreted text within interpreted
>> text::
>>
>>     :role1:`interpreted :role2:`text``
>
> I could *easily* do (b) for everything other than literal text.
> That's what falls most naturally out of the code.

I think that's correct. Watch out for cases like this though:

    **strong with a wildcard a.* inside**

That shouldn't cause any error (the "*" is not in start-string
context).

> Please let me know; I'd love to have something I consider to be a
> good first draft tonight.

Sorry, went to bed after my reply last night (got the flu) and didn't
see your message until now.

> Could you perhaps help me out by writing up a few nesting tests
> which you think should pass? I'm liable to be overlooking more
> things...

I've attached some unit tests in test_nested_inline_markup.py, with
some interesting edge cases and relatively complex examples. I
*think* the "expected output" is correct, but I may have made mistakes
(can't test it yet ;-). There are probably other edge cases which
I've missed, but we'll find them once we have code to exercise. See
the top of the file for installation instructions.

Please also run test/test_parsers/test_rst/test_inline_markup.py and
report the results. I expect some tests to fail, but I'd like to see
which ones and how.

Thanks!

--
David Goodger  http://python.net/~goodger
For hire: http://python.net/~goodger/cv
From: David A. <da...@bo...> - 2004-01-17 18:52:19
David Goodger <go...@py...> writes:

> I think we should allow nested emphasis (or anything else, *except*
> inline literals), and leave it up to the docs and the author to avoid
> potentially meaningless examples. Not that this case is really
> meaningless: I've seen such examples in printed text before, where a
> long passage of italics (e.g. a character's inner thoughts) contains a
> bit of emphasis, which is displayed in roman (un-italicized). In
> other words, the display of emphasis could be thought of as a toggle
> switch: apply it twice and it turns off.

Good thinking.

Here's another question: the way I've coded it, once problematic text
is found, I stop trying to recursively find nested markup. Would you
like it to warn about the problem as it does now, and then just
continue to parse it for inline markup?

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David G. <go...@py...> - 2004-01-17 19:11:15
David Abrahams wrote:
> Here's another question: the way I've coded it, once problematic
> text is found, I stop trying to recursively find nested markup.
> Would you like it to warn about the problem as it does now, and then
> just continue to parse it for inline markup?

Can you show examples of what your code does now? Here is some input:

    *emph **strong *prob ``literal``, end of strong**, end of emph*

Ideally, I'd like this to parse to:

    <paragraph>
        <emphasis>
            emph
            <strong>
                strong
                <problematic ...>
                    *
                prob
                <literal>
                    literal
                , end of strong
            , end of emph

I don't know if that's possible or not though. How can the parser
know that the end-emphasis matches the first start-emphasis and not
the second?

You've probably picked up on this already, but the rules for
start-string and end-string may have to change to allow nested inline
markup. For example, in "*emph **strong***" the end-strings are
adjacent. Or do we have to disambiguate with "\ " (as in
"*emph **strong**\ *")?

--
David Goodger  http://python.net/~goodger
For hire: http://python.net/~goodger/cv
From: David A. <da...@bo...> - 2004-01-17 22:06:41
David Goodger <go...@py...> writes:

> David Abrahams wrote:
> > Here's another question: the way I've coded it, once problematic
> > text is found, I stop trying to recursively find nested markup.
> > Would you like it to warn about the problem as it does now, and then
> > just continue to parse it for inline markup?
>
> Can you show examples of what your code does now? Here is some input:
>
>     *emph **strong *prob ``literal``, end of strong**, end of emph*
>
> Ideally, I'd like this to parse to:
>
>     <paragraph>
>         <emphasis>
>             emph
>             <strong>
>                 strong
>                 <problematic ...>
>                     *
>                 prob
>                 <literal>
>                     literal
>                 , end of strong
>             , end of emph

    <document source="<stdin>">
        <paragraph>
            <emphasis>
                emph
                <strong>
                    strong
                    <problematic id="id2" refid="id1">
                        *
                    prob
                    <literal>
                        literal
                    , end of strong
                , end of emph
        <system_message backrefs="id2" id="id1" level="2" line="1" source="<stdin>" type="WARNING">
            <paragraph>
                Inline emphasis start-string without end-string.

> I don't know if that's possible or not though. How can the parser
> know that the end-emphasis matches the first start-emphasis and not
> the second?

It can't. I guess that's the interpretation which results in the
fewest errors, but I think we could probably construct cases where the
other interpretation would be more sensible:

    *emph *prob **strong ``literal``, end of strong**, end of emph*

It happens to work because we always match from outer to inner right
now, but I am beginning to realize that the algorithm's going to have
to change to handle other cases, and it's going to get tough if you
want to retain the behavior you're seeing above. The problem arises
with situations like:

    *emphasis *within emphasis* and such*

or, worse:

    :emphasis:`foo :strong:`bar` baz`

which right now are ending up as:

    <document source="<stdin>">
        <paragraph>
            <emphasis>
                emphasis
            <problematic id="id2" refid="id1">
                *
            within emphasis and such*
        <system_message backrefs="id2" id="id1" level="2" line="1" source="<stdin>" type="WARNING">
            <paragraph>
                Inline emphasis start-string without end-string.

    <document source="<stdin>">
        <paragraph>
            <emphasis>
                foo :strong:
            <problematic id="id2" refid="id1">
                `
            bar baz`
        <system_message backrefs="id2" id="id1" level="2" line="1" source="<stdin>" type="WARNING">
            <paragraph>
                Inline interpreted text or phrase reference start-string without end-string.

The problem is that we're greedy in searching for end-strings (call
this problem 1).

> You've probably picked up on this already, but the rules for
> start-string and end-string may have to change to allow nested inline
> markup. For example, in "*emph **strong***" the end-strings are
> adjacent. Or do we have to disambiguate with "\ " (as in
> "*emph **strong**\ *")?

Right now, the rule for matching emphasis end-strings is that you can
match the last in an odd-length string of stars. That's kind of a
hack, and it clearly doesn't allow

    *emphasis within *emphasis**

(problem 2) to work, either. Of course you can disambiguate the
latter as

    *emphasis within *emphasis*\ *

So I don't consider that to be serious.

I think I understand roughly what the algorithm for solving problem 1
must be:

    def parse(remaining):
        search for start string
        if found:
            parse2(start, remaining)

    def parse2(start, remaining):
        children = []
        messages = []
        while 1:
            search *simultaneously* for end(start) and for all starts
            if another_start is found:
                children += text(remaining[:position(another_start)])
                n, m, remaining = parse2(another_start, remaining)
                children += n
                messages += m
            elif end(start) is found:
                children += text(remaining[:position(end(start))])
                return [some_node(..., *children)], messages
            else:
                error

This algorithm would address problem 2 as well.

Now, if you want the current behavior for your example in the
beginning of this message, we'll have exactly the problem you
anticipated, unless we bend over backwards to avoid it. I can think
of ways to do that, but none of them are natural or pretty or
efficient. IMO it's better to live with the fact that when there's
problematic text, the (mis)interpretation of the text you get may not
be the one you'd prefer.

BTW, even to get this far I had to do some semi-massive refactorings
and renamings. It was just too hairy in there; I was losing track of
what things meant. I hope you don't find my changes too odious. I'll
send you a current diff for states.py so you can look at it, though it
clearly will take some substantial work from where the code is now to
get the above algorithm implemented, and I'm not sure I have
time... :-(

But knowing me, I won't be able to resist solving the problem ;-) so
I'll spend the time I don't have :-|.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
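The algorithm outlined above can be made concrete. The following is a minimal runnable rendering in Python under loudly simplified assumptions: only `*` (emphasis) and `**` (strong) delimiters, no backslash escapes or inline literals, unmatched start-strings raised as errors rather than turned into `problematic` nodes, and the "problem 2" ambiguity side-stepped (a tie between an end-string and an equally long nested start is resolved as the end-string) rather than solved. It is an illustration of the approach, not the docutils parser.

```python
# Sketch of the parse/parse2 idea: at each position, search
# simultaneously for the current end-string and for all start-strings,
# recursing when a nested start comes first.
STARTS = ("**", "*")  # longest first, so "**" is preferred over "*"

def parse_inline(text):
    """Parse into a list of plain strings and (delimiter, children) tuples."""
    nodes, _, _ = _parse_children(text, 0, end_delim=None)
    return nodes

def _find_start(text, pos):
    """Find the earliest start-string at or after pos (longest wins ties)."""
    best = None
    for delim in STARTS:
        i = text.find(delim, pos)
        if i != -1 and (best is None or i < best[0]):
            best = (i, delim)
    return best

def _parse_children(text, pos, end_delim):
    """Search simultaneously for the end-string and for nested starts."""
    children = []
    text_start = pos
    while pos < len(text):
        end = text.find(end_delim, pos) if end_delim else -1
        nested = _find_start(text, pos)
        nested_first = nested is not None and (
            end == -1 or nested[0] < end or
            (nested[0] == end and len(nested[1]) > len(end_delim)))
        if nested_first:
            start_pos, delim = nested
            if start_pos > text_start:
                children.append(text[text_start:start_pos])
            inner, pos, matched = _parse_children(
                text, start_pos + len(delim), delim)
            if not matched:
                raise ValueError(
                    "Inline %r start-string without end-string." % delim)
            children.append((delim, inner))
            text_start = pos
        elif end != -1:
            if end > text_start:
                children.append(text[text_start:end])
            return children, end + len(end_delim), True
        else:
            break
    if text_start < len(text):
        children.append(text[text_start:])
    return children, len(text), end_delim is None
```

For example, `parse_inline("*emph **strong** etc*")` yields `[("*", ["emph ", ("**", ["strong"]), " etc"])]`, i.e. strong nested inside emphasis.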
From: Mark N. <no...@so...> - 2004-01-19 16:43:23
David Abrahams wrote:
>
> David Goodger <go...@py...> writes:
>
> > David Abrahams wrote:
> > > Here's another question: the way I've coded it, once problematic
> > > text is found, I stop trying to recursively find nested markup.
> > > Would you like it to warn about the problem as it does now, and then
> > > just continue to parse it for inline markup?
> >
> > Can you show examples of what your code does now? Here is some input:
> >
> >     *emph **strong *prob ``literal``, end of strong**, end of emph*
> >
> > Ideally, I'd like this to parse to:
> >
> >     <paragraph>
> >         <emphasis>
> >             emph
> >             <strong>
> >                 strong
> >                 <problematic ...>
> >                     *
> >                 prob
> >                 <literal>
> >                     literal
> >                 , end of strong
> >             , end of emph
>
> It can't.

It can. The Perl parser gets what David G. wanted.

> I guess that's the interpretation which results in the fewest
> errors, but I think we could probably construct cases where the
> other interpretation would be more sensible:
>
>     *emph *prob **strong ``literal``, end of strong**, end of emph*

However, this one did trip up my Perl parser :-(. I'll have to see
what's going on.

> It happens to work because we always match from outer to inner right
> now, but I am beginning to realize that the algorithm's going to have
> to change to handle other cases, and it's going to get tough if you
> want to retain the behavior you're seeing above. The problem arises
> with situations like:
>
>     *emphasis *within emphasis* and such*
>
> or, worse:
>
>     :emphasis:`foo :strong:`bar` baz`

The Perl parser handled both of these correctly. The bottom line is
that you can't be greedy in matching end strings.

--Mark
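The greediness point above can be seen with a two-line experiment in plain `re` (unrelated to either parser's actual code): the greedy pattern runs to the last possible end-string, swallowing the nested markup, while the non-greedy one stops at the first candidate.

```python
import re

# Greedy vs. non-greedy end-string matching on the example above.
text = "*emphasis *within emphasis* and such*"

greedy = re.match(r"\*(.*)\*", text).group(1)
# greedy == "emphasis *within emphasis* and such" -- swallows nested markup

lazy = re.match(r"\*(.*?)\*", text).group(1)
# lazy == "emphasis " -- stops at the first candidate end-string
```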
From: David A. <da...@bo...> - 2004-01-19 18:51:51
Mark Nodine <no...@so...> writes:

> David Abrahams wrote:
>>
>> David Goodger <go...@py...> writes:
>>
>> > David Abrahams wrote:
>> > > Here's another question: the way I've coded it, once problematic
>> > > text is found, I stop trying to recursively find nested markup.
>> > > Would you like it to warn about the problem as it does now, and then
>> > > just continue to parse it for inline markup?
>> >
>> > Can you show examples of what your code does now? Here is some input:
>> >
>> >     *emph **strong *prob ``literal``, end of strong**, end of emph*
>> >
>> > Ideally, I'd like this to parse to:
>> >
>> >     <paragraph>
>> >         <emphasis>
>> >             emph
>> >             <strong>
>> >                 strong
>> >                 <problematic ...>
>> >                     *
>> >                 prob
>> >                 <literal>
>> >                     literal
>> >                 , end of strong
>> >             , end of emph
>>
>> It can't.
>
> It can. The Perl parser gets what David G. wanted.
>
>> I guess that's the interpretation which results in the fewest
>> errors, but I think we could probably construct cases where the
>> other interpretation would be more sensible:
>>
>>     *emph *prob **strong ``literal``, end of strong**, end of emph*
>
> However, this one did trip up my Perl parser :-(. I'll have to see
> what's going on.

It's not clear there's a "right answer". The algorithm I outlined in
private mail to you and David gets:

    (*)emph ((*)prob ((**)strong ((``)literal(``)), end of strong(**)), end of emph(*))
    ^^^
    unmatched

If you want:

    ((*)emph (*)prob ((**)strong ((``)literal(``)), end of strong(**)), end of emph(*))
             ^^^
             unmatched

You need a non-deterministic parser and a rule which prefers matching
earlier start strings over later ones (or you have to parse it
backwards ;->). Non-deterministic parsers are possible (I've built
them), but to do that would be more Perlish than Pythonic, IMO. It's
just a case of giving in to the temptation to guess.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
From: David G. <go...@py...> - 2004-01-19 19:27:48
|
David Abrahams wrote:
>>> *emph *prob **strong ``literal``, end of strong**, end of emph*
> 
> It's not clear there's a "right answer".  The algorithm I outlined
> in private mail to you and David gets:
> 
>     (*)emph ((*)prob ((**)strong ((``)literal(``)),
>     ^^^-------unmatched
>               end of strong(**)), end of emph(*))
> 
> If you want:
> 
>     ((*)emph (*)prob ((**)strong ((``)literal(``)),
>              ^^^-----unmatched
>               end of strong(**)), end of emph(*))
> 
> you need a non-deterministic parser and a rule which values matching
> earlier start strings more than later ones (or you have to parse it
> backwards ;->).  Non-deterministic parsers are possible (I've built
> them), but to do that would be more Perlish than Pythonic, IMO.
> It's just a case of giving in to the temptation to guess.

It's not worth the effort.  The earlier parse (first * unmatched) is
best IMO.  As you said, it doesn't really matter which bit of markup
gets flagged as problematic, as long as something does, and
consistently.

-- 
David Goodger                    http://python.net/~goodger
For hire: http://python.net/~goodger/cv
From: Mark N. <no...@so...> - 2004-01-19 23:40:42
|
Now that I'm making nested inline the default, I'm having trouble
figuring out what to do with one of the inline markup tests,
namely::

    `embedded URI with too much whitespace < http://example.com/ long/path
    /and /whitespace >`__

    `embedded URI with too much whitespace at end <http://example.com/
    long/path /and /whitespace >`__

    `embedded URI with no preceding whitespace<http://example.com>`__

    `escaped URI \<http://example.com>`__

In each of these cases, the malformed embedded URI parses recursively
as implicit markup, resulting in a reference within a reference.
Should we

(a) allow this embedding to occur?

(b) make a special case for not parsing implicit markup within a
    reference?

(c) something else?

    --Mark
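The well-formedness rule at issue can be condensed into a single pattern: whitespace is required between the reference text and the `<URI>` part, and no whitespace may appear inside the angle brackets. A hypothetical helper (not the actual docutils code) that returns `None` for the URI in exactly the malformed cases above:

```python
import re

# `text <uri>`__  -- whitespace required before '<', none inside it.
EMBEDDED_URI = re.compile(r"^(?P<text>\S.*?)\s+<(?P<uri>\S+)>$")

def split_embedded(ref_text):
    """Split the inside of a `...`__ reference into (text, uri).

    Returns (ref_text, None) when the embedded URI is malformed --
    which is exactly when a recursive parser would instead see
    implicit inline markup and build a reference within a reference.
    """
    m = EMBEDDED_URI.match(ref_text)
    if m:
        return m.group("text"), m.group("uri")
    return ref_text, None
```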
From: David G. <go...@py...> - 2004-01-20 05:12:49
|
Mark Nodine wrote:
> Now that I'm making nested inline the default, I'm having trouble
> figuring out what to do with one of the inline markup tests,
> namely::
...
> In each of these cases, the malformed embedded URI parses
> recursively as implicit markup, resulting in a reference within a
> reference.  Should we
> 
> (a) allow this embedding to occur?

Perhaps.  The URL being visible is an indication to the author that
there's a problem.

> (b) make a special case for not parsing implicit markup within a
>     reference?

Perhaps.  There are pros and cons to each approach.

How about: parse it, but complain if a reference is created inside
another reference.  That may be best.

-- 
David Goodger                    http://python.net/~goodger
For hire: http://python.net/~goodger/cv
From: Mark N. <no...@so...> - 2004-01-20 16:10:30
|
David Goodger wrote:
> 
> Mark Nodine wrote:
> > Now that I'm making nested inline the default, I'm having trouble
> > figuring out what to do with one of the inline markup tests,
> > namely::
> ...
> > In each of these cases, the malformed embedded URI parses
> > recursively as implicit markup, resulting in a reference within a
> > reference.  Should we
> > 
> > (a) allow this embedding to occur?
> 
> Perhaps.  The URL being visible is an indication to the author that
> there's a problem.
> 
> > (b) make a special case for not parsing implicit markup within a
> > reference?
> 
> Perhaps.  There are pros and cons to each approach.
> 
> How about: parse it, but complain if a reference is created inside
> another reference.  That may be best.

Sounds reasonable.  Level 2 (Warning) message?

    --Mark
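For context, docutils system messages carry numeric severity levels (defined on `docutils.utils.Reporter`), so a "Level 2" message is indeed a warning:

```python
# docutils.utils.Reporter severity levels (level 0 is debug output).
SEVERITIES = {0: "DEBUG", 1: "INFO", 2: "WARNING", 3: "ERROR", 4: "SEVERE"}
```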
From: Mark N. <no...@so...> - 2004-01-20 17:25:27
|
David Goodger wrote:
> 
> Mark Nodine wrote:
> > Now that I'm making nested inline the default, I'm having trouble
> > figuring out what to do with one of the inline markup tests,
> > namely::
> ...
> > In each of these cases, the malformed embedded URI parses
> > recursively as implicit markup, resulting in a reference within a
> > reference.
> 
> How about: parse it, but complain if a reference is created inside
> another reference.  That may be best.

Is it worth checking all the way up the parse tree for another
reference, so we'd still catch a reference inside an emphasis inside
a reference?

    --Mark
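Walking the ancestry is cheap, so checking all the way up costs little. A sketch of the check, assuming docutils-style nodes with `parent` and `tagname` attributes (the `Node` stub here is only for illustration):

```python
class Node:
    """Minimal stand-in for a docutils node (illustration only)."""
    def __init__(self, tagname, parent=None):
        self.tagname = tagname
        self.parent = parent

def inside_reference(node):
    """True if any ancestor, not just the immediate parent, is a
    reference -- so a reference inside an emphasis inside a reference
    is still caught."""
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.tagname == "reference":
            return True
        ancestor = ancestor.parent
    return False

# reference > emphasis > reference: the inner reference should be flagged.
outer = Node("reference")
emph = Node("emphasis", parent=outer)
inner = Node("reference", parent=emph)
```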
From: Aahz <aa...@py...> - 2004-01-18 19:27:43
|
On Fri, Jan 16, 2004, David Goodger wrote:
> [David Abrahams]
>> 
>> 2. ``literal ``TeX quotes'' & \\backslash``
>> 
>>    This one is currently parsed as though tokenized this way:
>> 
>>        <``><literal ``TeX quotes'' & \\backslash><``>
>> 
>>    But my code tokenizes inside markup and so sees:
>> 
>>        <``><literal ><``>TeX quotes'' & \\backslash><``>
>> 
>>    Again, I see two choices:
>> 
>>    a. Turn off recognition of an inline markup start string within
>>       regions already using that markup.
>> 
>>    b. Make that an error and force the user to write
>> 
>>           ``literal \``TeX quotes'' & \\backslash``
>> 
>>    In both cases (a) is more backward-compatible but (b) is more
>>    consistent, and, dare I say, Pythonic.
> 
> Here I disagree.  The spec says:
> 
>     No markup interpretation (including backslash-escape
>     interpretation) is done within inline literals.
> 
> The current parsing is correct and must remain.  The double-backquotes
> before "TeX" must not be interpreted as markup.  Only an inline
> literal end-string should be searched for upon encountering an inline
> literal start-string.
> 
> So (b) is no good because backslashes must be left alone; they must
> appear in the output.  (a) is too general; for instance, nested inline
> markup has to allow interpreted text within interpreted text::
> 
>     :role1:`interpreted :role2:`text``

In other words, if I want to have an inline parsed-literal (say, for
rendering variables in italic), I need to do either::

    ``if ``\ *``foo``*\ ``:``

or::

    :parsed-literal:`if *foo* :`

Correct?
-- 
Aahz (aa...@py...)           <*>         http://www.pythoncraft.com/

A: No.
Q: Is top-posting okay?
From: David G. <go...@py...> - 2004-01-18 19:47:18
|
Aahz wrote:
> In other words, if I want to have an inline parsed-literal (say, for
> rendering variables in italic), I need to do either::
> 
>     ``if ``\ *``foo``*\ ``:``
> 
> or::
> 
>     :parsed-literal:`if *foo* :`
> 
> Correct?

Yes, correct (once it's all implemented, of course).

-- 
David Goodger
From: David A. <da...@bo...> - 2004-01-19 00:13:42
|
David Goodger <go...@py...> writes:

> Aahz wrote:
>> In other words, if I want to have an inline parsed-literal (say, for
>> rendering variables in italic), I need to do either::
>> 
>>     ``if ``\ *``foo``*\ ``:``
>> 
>> or::
>> 
>>     :parsed-literal:`if *foo* :`
>> 
>> Correct?
> 
> Yes, correct (once it's all implemented, of course).

I want to suggest again that all literals written with :literal:`...`
be parsed.  I'll remind you that there's no way around it when the
suffix syntax (`...`:literal:) is used.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com