Thread: Re: [Docutils-users] Removing certain elements from reSTX markup

Re: [Docutils-users] Removing certain elements from reSTX markup

From: Morten W. P. <mo...@ni...> - 2003-03-05 15:55:32

> Morten W. Petersen wrote:
>> I'm skimming through the "reStructuredText Markup Specification" and
>> I'm wondering how to remove certain elements from the markup, such as
>> inline literals.  Any ideas?
> 
> What do you mean?  Remove marked-up text from a document, or remove
> functionality from the parser?

Remove functionality from the parser, make the parser ignore certain
elements (without using the :: markup).

Regards,

Morten W. Petersen

Technologies: Zope, Linux, Python, HTML, CSS, PHP
    Homepage: http://www.nidelven-it.no
Phone number: (+47) 45 44 00 69

Re: [Docutils-users] Removing certain elements from reSTX markup

From: Morten W. P. <mo...@ni...> - 2003-03-05 16:31:06

> First, why?  What's your use case?  The simplest solution is, just 
> don't use that markup in your documents.  If it's important, make it
> a policy decision in your organization.

I'm trying to find a mix of simplified RST that can be used straight
away, and at most have 4-5 different markups (easily taught), for
example emphasis, bullet lists, blockquotes and simple tables.

At the same time, I want to remove the ability to mess things up
(``'' is turned into some sort of hyperlink) so that users can
do whatever they want when not using the agreed upon markups.

> Second, what should the parser do with the markup it ignores?

Don't touch it, 'pass it on' in other words.

> Third, which type of markup do you want to suppress, inline markup or
> block-level markup?

Both, I guess.

> The parser (implemented in module docutils.parsers.rst.states) deals 
> with the two separately and differently.  Block-level markup is
> recognized via an ordered dispatch table; remove the entry for the
> markup you don't want, and it won't be recognized.  Inline 
> markup (class Inliner) is recognized with a large regular expression
> (Inliner.patterns.initial) that's built from a data structure 
> (Inliner.parts), plus a table for standalone/implicit markup (URLs
> etc.; Inliner.implicit_dispatch).  Alter the data structure, rebuild
> and reinstall the regexp, and go from there.  Of course, you should
> work on subclasses or instances so as not to step on toes.

OK, I'll have a look at that.  Again, useful info.  :)

> The parser is not set up for this to be really easy to do, because it 
> hasn't been needed yet.

If there are more people out there like me, maybe it would be an idea
to refactor a bit to make parts of docutils more like a 'markup
parsing framework'?  Make it easier to mix-n-match different
markups, which could lead to diverse markup 'dialects' of STX;
diverse enough to be used by common people without screwing
up, and diverse enough for the syntactic programmer.

> > (without using the :: markup).
> 
> What does this mean?

Something like "making the parser ignore markup without having to put it
in literal blocks".

Regards,

Morten W. Petersen

Re: [Docutils-users] Removing certain elements from reSTX markup

From: David G. <go...@py...> - 2003-03-05 19:04:16

Morten W. Petersen wrote:
>> First, why?  What's your use case?  The simplest solution is, just
>> don't use that markup in your documents.  If it's important, make it
>> a policy decision in your organization.
> 
> I'm trying to find a mix of simplified RST that can be used straight
> away, and at most have 4-5 different markups (easily taught), for
> example emphasis, bullet lists, blockquotes and simple tables.

What do you do when the user needs one more construct, that isn't included
in your simplified set?  Personally, I'd rather begin educating with a small
number of core constructs, but using the full parser.  Simultaneously, give
references to the full docs for those who are interested in going further.

We don't teach programming languages or natural languages using limited
subsets.  We begin with simple concepts and build from there.

From PEP 287, Questions & Answers, #2:

   Is reStructuredText *too* rich?

   For specific applications or individuals, perhaps.  In general, no.

   Since the very beginning, whenever a docstring markup syntax has
   been proposed on the Doc-SIG_, someone has complained about the
   lack of support for some construct or other.  The reply was often
   something like, "These are docstrings we're talking about, and
   docstrings shouldn't have complex markup."  The problem is that a
   construct that seems superfluous to one person may be absolutely
   essential to another.

   reStructuredText takes the opposite approach: it provides a rich
   set of implicit markup constructs (plus a generic extension
   mechanism for explicit markup), allowing for all kinds of
   documents.  If the set of constructs is too rich for a particular
   application, the unused constructs can either be removed from the
   parser (via application-specific overrides) or simply omitted by
   convention.

I'd emphasize the final "or simply omitted by convention" as preferable.

> At the same time, I want to remove the ability to mess things up
> ... so that users can
> do whatever they want when not using the agreed upon markups.
>
>> Second, what should the parser do with the markup it ignores?
> 
> Don't touch it, 'pass it on' in other words.

Please consider carefully: would you really be doing your users a service
with this approach?

I think back to when I learned Japanese.  The class spent the first week
learning hiragana, the basic Japanese syllable characters (similar to an
alphabet), so we wouldn't get hooked on using "roma-ji" (roman letters, A-Z)
as a crutch.  When I lived in Japan, I found that people who had learned
Japanese with roma-ji reached a point -- learning written language -- beyond
which it was very difficult to progress, whereas those who learned with
hiragana had a much easier time.

Of course, reStructuredText is a much simpler language, but I believe the
same principles apply.

> (``'' is turned into some sort of hyperlink)

It's an error that's turned into a "problematic" element, with a link to the
diagnostic explanation.  It's an error because of unbalanced
double-backquotes.

>> The parser is not set up for this to be really easy to do, because it
>> hasn't been needed yet.
> 
> If there are more people out there like me, maybe it would be an idea
> to refactor a bit to make parts of docutils more like a 'markup
> parsing framework'?  Make it easier to mix-n-match different
> markups, which could lead to diverse markup 'dialects' of STX;
> diverse enough to be used by common people without screwing
> up, and diverse enough for the syntactic programmer.

I don't think encouraging dialects is a good idea.  It introduces
incompatibilities.  This use case doesn't sufficiently justify such changes
to me.  'Course, patches are always welcome.

-- David Goodger    http://starship.python.net/~goodger

Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv

Re: [Docutils-users] Removing certain elements from reSTX markup

From: Morten W. P. <mo...@ni...> - 2003-03-06 16:01:53

> What do you do when the user needs one more construct, that isn't 
> included in your simplified set?  Personally, I'd rather begin 
> educating with a small number of core constructs, but using the 
> full parser.  Simultaneously, give references to the full docs
> for those who are interested in going further.
> 
> We don't teach programming languages or natural languages using 
> limited subsets.  We begin with simple concepts and build from there.
> 
> From PEP 287, Questions & Answers, #2:
> 
>    Is reStructuredText *too* rich?
> 
>    For specific applications or individuals, perhaps.  In general, no.

Exactly.  Although I appreciate your advice, I'd like to try
out a basic markup and see if it's enough.  The idea isn't
to teach them a basic set and later teach them the whole thing;
the idea is to find a basic set that's easy to teach, and
powerful enough for most (90%+) of the things they need
to mark up.

> I'd emphasize the final "or simply omitted by convention" as 
> preferable.

And what happens when the user does something that's an error
according to RST but not part of the markup he/she has learned?

> > Don't touch it, 'pass it on' in other words.
> 
> Please consider carefully: would you really be doing your users a 
> service with this approach?

I believe so, yes.  :)

> I think back to when I learned Japanese.
[...]
> Of course, reStructuredText is a much simpler language, but I believe 
> the same principles apply.

Yes, the principles might apply if the intention were to teach them
everything, starting with a basic markup.  That isn't the intention.

>> If there are more people out there like me, maybe it would be an idea
>> to refactor a bit to make parts of docutils more like a 'markup
>> parsing framework'?  Make it easier to mix-n-match different
>> markups, which could lead to diverse markup 'dialects' of STX;
>> diverse enough to be used by common people without screwing
>> up, and diverse enough for the syntactic programmer.
> 
> I don't think encouraging dialects is a good idea.  It introduces
> incompatibilities.  This use case doesn't sufficiently justify such 
> changes to me.  'Course, patches are always welcome.

I'd like to play around with it;  if there's time, I will.  :)

Regards,

Morten W. Petersen

Re: [Docutils-users] Removing certain elements from reSTX markup

From: David G. <go...@py...> - 2003-03-05 16:16:22

Morten W. Petersen wrote:
>> What do you mean?  Remove marked-up text from a document, or remove
>> functionality from the parser?
> 
> Remove functionality from the parser, make the parser ignore certain
> elements

First, why?  What's your use case?  The simplest solution is, just don't use
that markup in your documents.  If it's important, make it a policy decision
in your organization.

Second, what should the parser do with the markup it ignores?

Third, which type of markup do you want to suppress, inline markup or
block-level markup?

The parser (implemented in module docutils.parsers.rst.states) deals with
the two separately and differently.  Block-level markup is recognized via an
ordered dispatch table; remove the entry for the markup you don't want, and
it won't be recognized.  Inline markup (class Inliner) is recognized with a
large regular expression (Inliner.patterns.initial) that's built from a data
structure (Inliner.parts), plus a table for standalone/implicit markup (URLs
etc.; Inliner.implicit_dispatch).  Alter the data structure, rebuild and
reinstall the regexp, and go from there.  Of course, you should work on
subclasses or instances so as not to step on toes.
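
As a rough illustration of the two mechanisms described above, here is a toy
sketch in Python.  It does not import or modify docutils: the names
BLOCK_DISPATCH, INLINE_PARTS, classify_line, build_inline_pattern and
parse_inline are hypothetical stand-ins invented for this example, and the
real structures in docutils.parsers.rst.states are considerably more
involved.  It only shows the general idea: drop an entry from an ordered
dispatch table, or alter a parts list and rebuild the regexp, always working
on copies.

```python
import re

# Toy model only: every name below is hypothetical, NOT a docutils internal.

# Block-level markup: an ordered dispatch table of (name, pattern) pairs.
# Order matters, because the first matching entry wins.
BLOCK_DISPATCH = [
    ("bullet", re.compile(r"[-*+] +(?P<text>.+)")),
    ("enumerated", re.compile(r"\d+\. +(?P<text>.+)")),
    ("paragraph", re.compile(r"(?P<text>.+)")),  # catch-all, must stay last
]


def classify_line(line, dispatch):
    """Return (construct name, text) for the first entry that matches."""
    for name, pattern in dispatch:
        match = pattern.match(line)
        if match:
            return name, match.group("text")
    return "blank", ""


# Inline markup: a list of (name, delimiter) parts from which one large
# regular expression is built.  "**" is listed before "*" so it is tried first.
INLINE_PARTS = [
    ("strong", "**"),
    ("literal", "``"),
    ("emphasis", "*"),
]


def build_inline_pattern(parts):
    """Build a single alternation regexp from the parts list."""
    alternatives = [
        "(?P<%s>%s(?P<%s_text>.+?)%s)"
        % (name, re.escape(delim), name, re.escape(delim))
        for name, delim in parts
    ]
    return re.compile("|".join(alternatives))


def parse_inline(text, parts):
    """Split text into (kind, text) tokens; unrecognized text passes through."""
    if not parts:
        return [("text", text)]
    pattern = build_inline_pattern(parts)
    tokens, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        # Find which named alternative actually matched.
        kind = next(name for name, _ in parts if match.group(name) is not None)
        tokens.append((kind, match.group(kind + "_text")))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


# Work on copies so the shared defaults stay untouched.
simple_blocks = [entry for entry in BLOCK_DISPATCH if entry[0] != "enumerated"]
simple_inline = [part for part in INLINE_PARTS if part[0] != "literal"]

print(classify_line("1. first", BLOCK_DISPATCH))
print(classify_line("1. first", simple_blocks))
print(parse_inline("use ``x`` and *y*", INLINE_PARTS))
print(parse_inline("use ``x`` and *y*", simple_inline))
```

With the "enumerated" entry removed, the line "1. first" falls through to the
catch-all paragraph entry; with the "literal" part removed and the regexp
rebuilt, the double backquotes are passed on as plain text while emphasis is
still recognized.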

The parser is not set up for this to be really easy to do, because it hasn't
been needed yet.

> (without using the :: markup).

What does this mean?

-- David Goodger    http://starship.python.net/~goodger