[Docutils-develop] Re: nested inline markup

Mark Nodine <no...@so...> writes:

> David Abrahams wrote:
>> 
>> David Goodger <go...@py...> writes:
>> 
>> > David Abrahams wrote:
>> >  > Here's another question: the way I've coded it, once problematic
>> >  > text is found, I stop trying to recursively find nested markup.
>> >  > Would you like it to warn about the problem as it does now, and then
>> >  > just continue to parse it for inline markup?
>> >
>> > Can you show examples of what your code does now?  Here is some input:
>> >
>> >      *emph **strong *prob ``literal``, end of strong**, end of emph*
>> >
>> > Ideally, I'd like this to parse to:
>> >
>> >      <paragraph>
>> >          <emphasis>
>> >              emph
>> >              <strong>
>> >                  strong
>> >                  <problematic ...>
>> >                      *
>> >                  prob
>> >                  <literal>
>> >                      literal
>> >                  , end of strong
>> >              , end of emph
>>
>> It can't. 
>
> It can.  The Perl parser gets what David G. wanted.
>
>> I guess that's the
>> interpretation which results in the fewest errors, but I think we could
>> probably construct cases where the other interpretation would be more
>> sensible:
>> 
>>       *emph *prob **strong ``literal``, end of strong**, end of emph*
>
> However, this one did trip up my Perl parser :-(.  I'll have to see
> what's going on.

It's not clear there's a "right answer".  The algorithm I outlined in
private mail to you and David gets:

   (*)emph ((*)prob ((**)strong ((``)literal(``)), end of strong(**)), end of emph(*))
   ^^^     
   ^^^-------unmatched        
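For comparison, here is a minimal Python sketch of one deterministic, left-to-right strategy. It is not the algorithm from the private mail, which isn't reproduced here, and it is not Docutils' real parser. The names `match_inline`, `is_start` and `is_end`, and the simplified start/end-string rules, are assumptions made for illustration. Open start-strings go on a stack. An end-string closes the nearest matching opener and flags anything opened after it as unmatched. Openers still on the stack at the end are also unmatched. Literals are skipped as opaque text.

```python
import string

# Assumed delimiter set for this sketch; longest first so "**" is not read as two "*".
DELIMS = ("``", "**", "*")
# Simplified assumption: an end-string must be followed by whitespace, end of text,
# or one of these punctuation characters.
END_FOLLOW = set(string.whitespace) | set(",.;:!?)'\"")


def is_start(text, i, delim):
    """Start-string: at start of text or after whitespace, followed by non-whitespace."""
    j = i + len(delim)
    return (i == 0 or text[i - 1].isspace()) and j < len(text) and not text[j].isspace()


def is_end(text, i, delim):
    """End-string: after non-whitespace, followed by END_FOLLOW or end of text."""
    j = i + len(delim)
    return i > 0 and not text[i - 1].isspace() and (j == len(text) or text[j] in END_FOLLOW)


def match_inline(text):
    """Greedy left-to-right matching with a stack of open start-strings.

    Returns (matched, unmatched): matched holds (delim, open_pos, close_pos),
    unmatched holds (delim, pos) for start-strings that never closed.
    """
    stack, matched, unmatched = [], [], []
    i = 0
    while i < len(text):
        delim = next((d for d in DELIMS if text.startswith(d, i)), None)
        if delim is None:
            i += 1
            continue
        if delim != "``" and is_end(text, i, delim) and any(d == delim for d, _ in stack):
            # Close the nearest matching opener; anything opened after it is unmatched.
            while stack[-1][0] != delim:
                unmatched.append(stack.pop())
            _, start = stack.pop()
            matched.append((delim, start, i))
        elif is_start(text, i, delim):
            if delim == "``":
                # Literal text is opaque: jump to the next valid "``" end-string.
                k = text.find("``", i + 2)
                while k != -1 and not is_end(text, k, "``"):
                    k = text.find("``", k + 1)
                if k != -1:
                    matched.append((delim, i, k))
                    i = k + 2
                    continue
                unmatched.append((delim, i))
            else:
                stack.append((delim, i))
        i += len(delim)
    unmatched.extend(stack)  # openers left over at the end never closed
    return matched, sorted(unmatched, key=lambda t: t[1])
```

On the second example this reports the `*` at offset 0 as unmatched, which matches the bracketing above. On David G.'s example it flags the `*` before "prob" instead, which gives the tree he asked for.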

If you want:

   ((*)emph (*)prob ((**)strong ((``)literal(``)), end of strong(**)), end of emph(*))
            ^^^     
unmatched---^^^

You need a non-deterministic parser and a rule that gives matching an
earlier (outer) start-string priority over matching a later one (or you
have to parse it backwards ;->).  Non-deterministic parsers are possible
(I've built them), but using one here would be more Perlish than
Pythonic, IMO.  It's just a case of giving in to the temptation to guess.
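Here is a brute-force sketch of that non-deterministic alternative. It is hypothetical throughout. It enumerates every properly nested pairing, allowing any delimiter to be treated as problematic text. It keeps the pairings with the fewest unmatched delimiters and breaks ties by preferring the parse whose matched openers come earliest. That tie-break is only one reading of the rule above. Tokenization is assumed to be done already: the `(delim, offset, kind)` tuples in `SECOND_EXAMPLE` are hand-derived from the second example. The literal is left out because it cannot contain nested markup.

```python
# Hypothetical token format: (delimiter, character offset, "start" or "end").
# These are the non-literal delimiters of the second example.
SECOND_EXAMPLE = [
    ("*", 0, "start"),
    ("*", 6, "start"),
    ("**", 12, "start"),
    ("**", 47, "end"),
    ("*", 62, "end"),
]


def all_parses(tokens):
    """Yield (matched, unmatched) for every properly nested pairing.

    matched is a list of (delim, open_pos, close_pos); unmatched is a list
    of offsets whose delimiters are treated as problematic text.
    """
    def walk(i, stack, matched, unmatched):
        if i == len(tokens):
            # Openers still on the stack never found a partner.
            yield matched, unmatched + [pos for _, pos in stack]
            return
        delim, pos, kind = tokens[i]
        if kind == "start":
            # Choice 1: open a span here.
            yield from walk(i + 1, stack + [(delim, pos)], matched, unmatched)
        elif stack and stack[-1][0] == delim:
            # Choice 1: close the innermost open span.
            yield from walk(i + 1, stack[:-1],
                            matched + [(delim, stack[-1][1], pos)], unmatched)
        # Choice 2 (always available): treat this delimiter as problematic.
        yield from walk(i + 1, stack, matched, unmatched + [pos])

    yield from walk(0, [], [], [])


def preferred_parse(tokens):
    """Fewest unmatched delimiters first; ties go to the parse whose
    matched openers come earliest (an assumed reading of the rule)."""
    return min(all_parses(tokens),
               key=lambda p: (len(p[1]), sorted(o for _, o, _ in p[0])))


print(preferred_parse(SECOND_EXAMPLE))
# ([('**', 12, 47), ('*', 0, 62)], [6])
```

With these tokens, the sketch leaves the `*` before "prob" unmatched, which is the second bracketing above. The search is exponential in the number of delimiters, so it only illustrates the idea.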

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com