Docutils: Documentation Utilities / Feature Requests / #88 Unify Docutils CLI tools into `docutils-cli`

engelbert gruber - 2022-01-16

working on the commandline means typing less by using completion

if i type rstTABTAB the list of all writers shows up
if there is only docutils-cli i have to read the documentation.
if there happens to be a new writer with rst.... i will be notified by the completion result,
if there is only docutils-cli i have to read the documentation.

of course new readers won't show up for rstTABTAB
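
To make the discovery argument concrete, here is a minimal sketch of what the shell does on rstTABTAB: it lists executables on the search path whose names start with the typed prefix. The function names and the throwaway directory are hypothetical and only serve the illustration; real shell completion has more rules than this.

```python
import os
import stat
import tempfile


def commands_with_prefix(prefix, path_dirs):
    """Return sorted names of executable files in path_dirs starting with prefix."""
    found = set()
    for directory in path_dirs:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # skip missing or unreadable search-path entries
        for name in entries:
            full = os.path.join(directory, name)
            if (name.startswith(prefix) and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.add(name)
    return sorted(found)


def demo():
    """Build a throwaway bin directory and complete 'rst' against it."""
    with tempfile.TemporaryDirectory() as bindir:
        for name in ("rst2html", "rst2latex", "docutils"):
            path = os.path.join(bindir, name)
            with open(path, "w") as f:
                f.write("#!/bin/sh\n")
            os.chmod(path, stat.S_IRWXU)  # mark the stub as executable
        return commands_with_prefix("rst", [bindir])
```

In this sketch a generic `docutils` command is found only by its own prefix, so the available writers behind it stay invisible to plain prefix completion.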

- Adam Turner - 2022-01-16
  
  of course new readers won't show up for rstTABTAB
  
  I think this partly speaks to the issue -- the tab completion functionality only works "by accident", and doesn't support readers/parsers.
  
  It seems it might be possible to add custom bash autocomplete rules (https://caliban.org/bash/#completion) -- would this be an acceptable workaround?
  
  A
  

Günter Milde - 2022-01-17

While I am in favour of revising and updating Docutils' command line
entry points, I don't think we should drop the number down to one.

I'll briefly re-outline my argument to (eventually) drop the rst*
front-end tools, and only export docutils-cli (or python -m docutils).

I think a single front-end tool significantly simplifies a lot of
things

Can you elaborate a bit on what would become a lot simpler?

the docutils-cli wrapper is not complex, which gives it
significant points in favour in my book.

The generic front-end is one order of magnitude more complex because of
the two-stage command line parsing with the set of valid tags depending
on the components selected.
Even help output depends on the "component" tags. Due to the open nature
(allowing for plug-in components), a man page will always need to refer
to external documentation, while, e.g, man rst2html lists all available
command line options (at least on Debian).
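
The two-stage parsing can be sketched as follows. This is an illustration with argparse, not the actual Docutils front end (which builds on optparse); the option table `WRITER_OPTIONS` and the function `parse_two_stage` are hypothetical names made up for the example.

```python
import argparse

# Hypothetical per-writer options; real Docutils writers define many more.
WRITER_OPTIONS = {
    "html5": ["--stylesheet"],
    "latex": ["--documentclass"],
}


def parse_two_stage(argv):
    """Parse argv twice: first for the component, then for its options."""
    # Stage 1: only look for --writer and ignore everything else for now.
    stage1 = argparse.ArgumentParser(add_help=False)
    stage1.add_argument("--writer", choices=sorted(WRITER_OPTIONS),
                        default="html5")
    selected, _remaining = stage1.parse_known_args(argv)

    # Stage 2: build the full parser; valid options depend on the writer.
    stage2 = argparse.ArgumentParser(parents=[stage1])
    for option in WRITER_OPTIONS[selected.writer]:
        stage2.add_argument(option)
    stage2.add_argument("source")
    return stage2.parse_args(argv)
```

Because the second parser is only built after the writer is known, `--help` output differs per writer, and an option such as `--documentclass` is rejected unless the matching writer was selected.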

Most usage of Docutils today is programmatic, and not via the command
line tools

We need to care for "command line users" if their number is
non-negligible -- independent of the number of users depending on the
programmatic interface.

The number of users/projects using Docutils via the command line interface
cannot be estimated by looking at Python projects.
Unfortunately, it is rather hard to find out how many non-Python projects
use "rst2html.py" in their Makefile or another form of build tool chain.

The first answer to "Explain Python entry points?"
even cites Docutils as

... a great example of entry-point use: it will install something like a
half-dozen useful commands for converting Python documentation to other
formats.

(even if Docutils currently does not use the "console-scripts mechanism" to
provide cli entry points).

...

We cannot know how many people would be affected with local random
scripts, but it is a two-second change.

While the actual re-typing (or drag-and-drop) of the command may be that
fast, this is not the case for the complete task of finding out and
approaching the right spot where to apply the change in a complex build
chain.

Many users will also run with old or pinned versions of Docutils, and
part of updating is seeing the changelog.

A hard learned lesson from Docutils releases is to never underestimate
the number of users/project managers that don't read the changelog (nor
the announcements in the RELEASE-NOTES) yet depend on a stable Docutils
for a stable system.

I am pro change for instances where the current
naming is unfortunate or may stand in the way.

buildhtml.py is too generic, it may stand in the way.
Debian calls it rst-buildhtml. I could imagine docutils-buildhtml or
leaving it in the tools for individual installing.

docutils-cli.py is too long. This name was selected because naming
the file for the generic front end tool "docutils.py" is misleading.
With "entry points" it is possible to use docutils as front-end command
without the need for a file "docutils.py".

python3 -m docutils currently results in the error:
'docutils' is a package and cannot be directly executed
It could be made more helpful, we know, a user typing python -m ... wants
to execute a command line tool (or just wants to know more about docutils).
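
For reference, `python -m <package>` runs the package's `__main__.py` if one exists. The sketch below demonstrates the mechanism on a throwaway package; `demo_pkg` and its `main()` are hypothetical stand-ins and not Docutils code.

```python
import os
import subprocess
import sys
import tempfile

# Content of the hypothetical demo_pkg/__main__.py
MAIN_SOURCE = '''\
import sys

def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    print("demo_pkg front end called with", argv)

if __name__ == "__main__":
    main()
'''


def run_package_as_module(args):
    """Create a throwaway package with __main__.py and run it via -m."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "__main__.py"), "w") as f:
            f.write(MAIN_SOURCE)
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg", *args],
            cwd=root, env=dict(os.environ, PYTHONPATH=root),
            capture_output=True, text=True, check=True)
        return result.stdout.strip()
```

Without the `__main__.py` file, the same call fails with the "is a package and cannot be directly executed" error quoted above.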

rst2 is established as the start of Docutils' front-end names for
conversion from reStructuredText to something. I would like to keep this
prefix as "ours". (After all, Docutils is the reference implementation of the
rST format.)

Ease of discovery is important. TAB completion is a powerful means here.
Additional parser or readers may add their own entry points, cf.
https://github.com/executablebooks/MyST-Parser/issues/347#issuecomment-1003717830

Rarely used and diagnostic tools may not need automatic installation into
the binary PATH. Here, it may help to diagnose which tools are installed by
pip docutils vs. OS-specific package managers.

Debian installs the following 13:

rst2html
rst2html4
rst2html5
rst2latex
rst2man
rst2odt
rst2odt_prepstyles
rst2pseudoxml
rst2s5
rst2xetex
rst2xml
rst-buildhtml
rstpep2html

Dropping the .py from rst2*.py commands may be considered.

+1 shorter and more command-like names
-1 backwards incompatible, an unknown number of users need to change their scripts.

- Adam Turner - 2022-01-20
  
  Can you elaborate a bit on what would become a lot simpler?
  
  Currently, deep in Docutils' internals, everything that takes a settings_spec or uses self.settings more or less assumes it is running as a command line programme. However, a lot of usage (programmatic, through Sphinx or other methods) relies entirely on the default values. By moving to a single front end I would argue it is not only a cleaner user story, but it might enable refactoring to move the CLI usages of Docutils to a higher level.
  
  Currently we need to do awful things to subclass and patch either optparse.OptionParser or argparse.ArgumentParser. This is really unusual, and for developers coming from a more "normal" command line application, it can take a while to understand this part of the internals of Docutils.
  
  I didn't go into detail intentionally so as not to spark a debate about these parts, but I do think (eventually) simplifying these interactions can lead to a cleaner codebase.
  
  the two-stage command line parsing
  
  I don't think you can get away from this though without a combinatorial explosion of readers, writers, and parsers. Say we have two useful CLI readers (standalone/pep), three parsers (rst/recommonmark/myst), and 6 useful writers (html5/html4/latex/xetex/man/xml) that is 36 distinct front-end tools we should be providing.
  
  a man page will always need to refer to external documentation
  
  I will admit ignorance on how man pages work. docutils-cli --writer xetex --help, though, will always give the correct help output. This is also the version we should be promoting, not least as it works cross platform (if my patch with entrypoints is merged!).
  
  We need to care for "command line users" if their number is non-negligible
  
  Of course -- sorry if my post came across as callous in any way towards frontend tool users. I suppose what I don't want is to be in a situation where we are not making real improvements based on hypothetical situations. It might be useful to find ways of proxying for CLI usage -- bugs filed recently with us/redistributors, usages in public archives ( https://grep.app or similar ), etc.
  
  rather hard to find out how many non-Python projects use "rst2html.py" in their Makefile or another form of build tool chain
  
  True. However by the above methods we can get an estimate, surely? There are a lot of people who commit random things to GitHub / GitLab / whatever!
  
  finding out and approaching the right spot where to apply the change
  
  This is why I proposed to go about it by emitting warnings during deprecation, before total removal. We also need to consider the support that this project offers -- if a downstream user has integrated Docutils into a complex tool chain and cannot maintain it, we shouldn't be responsible for that.
  
  never underestimate the number of users/project managers that don't read the changelog ... yet depend on a stable Docutils for a stable system.
  
  Fair enough -- though perhaps another route we could go down in the deprecation notices is to say "pin version XX". There is no best solution here -- all change will break someone's workflow (XKCD 1172!), but we should be working to make the upgrade path as easy as possible.
  
  buildhtml
  
  Ahh, I was under the impression that buildhtml was an internal tool for building the website. Would it be reasonable to formally retire it from public use, and recommend Sphinx as an alternative?
  
  use docutils as front-end command // python -m docutils
  
  +1
  
  Ease of discovery is important. TAB completion is a powerful means here
  
  Did you see my suggestion on using custom shell autocompletion functions? I believe that this would allow for tab completion with the reader/parser/writer flags.
  
  rst2 is established as the start of Docutils' front-end names for conversion from reStructuredText to something. I would like to keep this prefix as "ours".
  
  If we use what I proposed in one of my changesets to reimplement the rst2 commands in terms of docutils-cli, it would be entirely possible to deprecate the rst2 commands but just keep them forever. This would also mean that the simplifications I proposed at the top of this message wouldn't be blocked (I think).
  
  Concrete proposal:
  
  Promote docutils or python -m docutils where we currently reference rst2
  
  Reimplement the rst2* commands in terms of docutils-cli
  
  Try to implement <TAB> autocompletion for docutils-cli
  
  Use entrypoints for everything (but also keep .py aliases for a while)
  
  Deprecate rst2* commands, but with no removal date
  
  A
  
  - Günter Milde - 2022-01-22
    
    Currently, deep in Docutils' internals, everything that takes a
    settings_spec or uses self.settings more or less assumes it is running
    as a command line programme. However, a lot of usage (programmatic,
    through Sphinx or other methods) relies entirely on the default values.
    
    Even with Sphinx, some features can only be customised from a
    docutils.conf configuration file.
    
    The settings_spec and document.settings are Docutils' abstraction from
    the different configuration ways (config-files/command line/programmatic).
    Using document.settings should be possible without too much thinking
    about the actual source of the setting value.
    
    An overview for programmatic use of the "settings" framework is given in
    https://docutils.sourceforge.io/docs/api/runtime-settings.html#runtime-settings-processing-from-applications
    (best read alongside pydoc3 -b output for the mentioned functions/classes).
    
    By moving to a single front end I would argue it is not only a cleaner
    user story, but it might enable refactoring to move the CLI usages of
    Docutils to a higher level.
    
    We already have the docutils.core.publish_* "convenience functions" as
    a high-level API for custom front-ends (both command-line and programmatic).
    
    "docutils-cli" is more complex because here we want the components to be
    configurable from the command line or config file. I am working on moving
    the complexity to a library function that can be re-used by other
    "script" entry-points in need of configurable components. This will become
    an extension or addition to docutils.core.publish_cmdline().
    (It may also become simpler once "optparse" is replaced with "argparse".)
    
    Currently we need to do awful things to subclass and patch either
    optparse.OptionParser or argparse.ArgumentParser. This is really
    unusual, and for developers coming from a more "normal" command line
    application, it can take a while to understand this part of the
    internals of Docutils.
    
    Yes, indeed. Docutils has an elaborate configuration framework which
    actually predates the "optparse" module. Later development of "Optik"
    into "optparse" and then "argparse" implemented some of the abstractions and
    enhancements offered by Docutils in a different way.
    
    But (in contrast to developers working on the optparse->argparse
    transition :) "normal" developers using the "docutils" package don't need
    to care about the details here. They can use the high-level API offered
    by SettingsSpec / settings and get the command line and config file
    processing for free (docutils.frontend is only the "workhorse",
    docutils.core is the high-level interface).
    I agree that there is room for improvement in this API, but I don't think
    getting rid of the simple front-ends in favour of one complex front end
    will be of much help in this quest.
    
    I didn't go into detail intentionally so as not to spark a debate about
    these parts, but I do think (eventually) simplifying these interactions
    can lead to a cleaner codebase.
    
    I suggest moving this thread of the discussion over to [bugs:#441].
    
    the two-stage command line parsing
    
    I don't think you can get away from this though without a combinatorial
    explosion of readers, writers, and parsers.
    Say we have two useful CLI readers (standalone/pep), three parsers
    (rst/recommonmark/myst), and 6 useful writers
    (html5/html4/latex/xetex/man/xml) that is 36 distinct front-end tools
    we should be providing.
    
    However, only some of the combinations will be of common interest.
    We should try to find the right balance -- IMO, both extremes are
    sub-optimal.
    
    Docutils will not include dedicated front-end tools for 3rd-party
    parsers/writers/... ("pycmark2..." shall be provided by "pycmark" etc).
    
    One idea is to have two packages at pypi: "docutils-core", say,
    without dedicated front-end tools (but supporting python -m docutils)
    and "docutils" providing a sensible set of front-end tools.
    
    Another idea is to auto-install a small default set (rst2html,
    rst2latex, ...) and keep a rich set in /tools so that every user may
    install (copy, symlink or write alias commands in ~/.bashrc or
    ~/.profile) the tools they want "by hand".
    
    rst2odt_prepstyles.py is a rarely used auxiliary script.
    I propose to move it to docutils/writers/odtwriter/
    (alongside the stylefile(s) it prepares).
    
    ...
    
    Ease of discovery is important. TAB completion is a powerful means here
    
    Did you see my suggestion on using custom shell autocompletion
    functions? I believe that this would allow for tab completion with the
    reader/parser/writer flags.
    
    That is a possibility. However, it only works with some shells (bash) so it
    is not for all users.
    
    rst2 is established as the start of Docutils' front-end names for
    conversion from reStructuredText to something. I would like to keep
    this prefix as "ours".
    
    If we use what I proposed in one of my changesets to reimplement the
    rst2 commands in terms of docutils-cli, it would be entirely
    possible to deprecate the rst2 commands but just keep them forever.
    This would also mean that the simplifications I proposed at the top of
    this message wouldn't be blocked (I think).
    
    I am just working on a way to disentangle frontend.OptionParser and
    frontend.ConfigParser but this is a topic for [bugs:#441].
    
    Related
    
    Feature Requests: #110
    
    Last edit: Günter Milde 2023-06-26
    
    - Adam Turner - 2022-01-24
      
      Even with Sphinx, some features can only be customised from a
      docutils.conf configuration file.
      
      The settings_spec and document.settings are Docutils' abstraction from
      the different configuration ways (config-files/command line/programmatic).
      Using document.settings should be possible without too much thinking
      about the actual source of the setting value.
      
      An overview for programmatic use of the "settings" framework is given in
      https://docutils.sourceforge.io/docs/api/runtime-settings.html#runtime-settings-processing-from-applications
      (best read alongside pydoc3 -b output for the mentioned functions/classes).
      
      Note I'm not proposing getting rid of the config, just loosening the direct relationship between the CLI-parsing part of Docutils and the settings/config part of Docutils.
      
      We already have the docutils.core.publish_* "convenience functions" as
      a high-level API for custom front-ends (both command-line and programmatic).
      
      Hmm, perhaps we are talking at cross purposes. I'm talking about utility functions such as "take some RST and turn it into docutils nodes" (from halfway down https://github.com/sphinx-doc/sphinx/issues/8039, ignore the emotive language).
      
      What the user probably wanted is docutils.core.publish_doctree(user_input_text).children, but it is pretty hard to know this without knowing the internals of Docutils. A function named get_nodes_from_rst (or suchlike) would be a useful helper.
      
      There is currently a great deal of usage of random internal bits of Docutils, I think partially because these "medium level" helpers don't exist (sorry if I wasn't clear in what I meant here in the post above).
      
      we want the components to be configurable from the command line or config file
      
      I would challenge this, I would find this very surprising behaviour if a config file (in one of at least three places, or controlled by an environment file) populated defaults to the components being used. Given it also adds a lot of complexity, I'm not sure it is worth keeping?
      
      implemented some of the abstractions and enhancements offered by Docutils in a different way.
      
      The main challenge I had here was that subclasses can filter settings_spec (through filter_settings_spec). I've never seen this implemented in the way Docutils does it before -- if settings_spec tuples were treated as immutable, then it would be much easier to e.g. construct the parser object first and then use parser.add_argument as "intended".
      
      simple front-ends in favour of one complex front end
      
      I'll try another analogy (why not!) . When I'm using ffmpeg, it is "simple" to me as the end user to know that if I want to use different input or output encodings, I just pass the relevant flag. All I need to learn is the name of the base command, and that I pass the codec I want to -c:a and -c:v. In this way it is "simpler" to remember and use as the number of commands goes up (and allows using aliases, which the per-format tools don't).
      
      The implementation might be somewhat more complex (although I would argue not much), but end-user simplicity is what counts.
      
      If you're not convinced I'll drop the issue for now, but I do think it would be good to at least unify the back-end implementations of the front-end tools.
      
      two packages at pypi
      
      I don't think this is a good idea -- it increases confusion as there are two packages, but the "core" maintains all the complexity of needing to parse CLI stuff. Maybe later, if the core (or CLI) become more distinct.
      
      I propose to move it to docutils/writers/odtwriter/
      
      +1
      
      Will reply on 441 for the 441 things.
      
      A
      

Günter Milde - 2022-05-12

Note I'm not proposing getting rid of the config, just loosening the
direct relationship between the CLI-parsing part of Docutils and the
settings/config part of Docutils.

We already have the docutils.core.publish_* "convenience functions" as
a high-level API for custom front-ends (both command-line and programmatic).

Hmm, perhaps we are talking at cross purposes. I'm talking about utility
functions such as "take some RST and turn it into docutils nodes"

I agree that it would be an improvement to implement config-file
processing without dependency on "optparse" or "argparse" (cf. [bugs:#441]).
Well documented utility functions are helpful, too.

However, getting rid of the rst* front-end tools does not simplify this task:
it does not matter if docutils.core.publish_cmdline() is called by one
or several command line front-end scripts.

end-user simplicity is what counts.

For end-user convenience, I see benefits in both, a generic, flexible CLI
and simple scripts for the common tasks (rst2html, rst2latex, ...).

Proposal:

Keep the "*.py" scripts in tools/ for backwards compatibility and as
examples for users wanting to create their own front-ends.

Use "entry points" [patches:#186] to install front-end scripts in
the binary PATH:

docutils: generic front end
(as "docutils-cli.py" is not installed in 0.18 [bugs:#447],
we can change to a shorter name already in 0.19).

rst2*: drop the .py suffix (after a transition period).
Eventually stop installing rarely used tools.

Related

Bugs: ~~#447~~
Feature Requests: #110
Patches: ~~#186~~

Last edit: Günter Milde 2022-12-01

- Adam Turner - 2022-05-20
  
  I agree with the sentiment here.
  
  rst2*: drop the .py suffix (after a transition period).
  Eventually stop installing rarely used tools.
  
  Sounds good, I believe that setting entry_points and scripts will install both, allowing for the transition period.
  
  A
  

Günter Milde - 2022-05-12

we want the components to be configurable from the command line or config file

I would challenge this, I would find this very surprising behaviour if
a config file (in one of at least three places, or controlled by an
environment file) populated defaults to the components being used.
Given it also adds a lot of complexity, I'm not sure it is worth
keeping?

For the user, the component settings (reader, parser, writer) are
handled similar to all other settings: the "factory default" can be
customized either in a configuration file or on the command line:

Pro
Simple command call with user preferences set in a config file,
no need to type --writer=myfavourite with every call.
Consistent handling of settings.

Con
Requires 2-stage parsing of the config files.
(The command line must be parsed twice either way.)
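
The layering described here (factory default, then config file, then command line) can be sketched like this. It is a simplified illustration and not the Docutils settings framework: the section name `demo`, the defaults table, and `resolve_components` are all hypothetical.

```python
import argparse
import configparser

# Hypothetical factory defaults for the three components.
FACTORY_DEFAULTS = {"reader": "standalone", "parser": "rst", "writer": "html5"}


def resolve_components(config_text, argv):
    """Layer settings: factory defaults < config file < command line."""
    settings = dict(FACTORY_DEFAULTS)

    # Config file layer; the section name "demo" is made up for this sketch.
    config = configparser.ConfigParser()
    config.read_string(config_text)
    if config.has_section("demo"):
        for key in settings:
            if config.has_option("demo", key):
                settings[key] = config.get("demo", key)

    # Command line layer: only options actually given override the rest.
    cli = argparse.ArgumentParser()
    for key in settings:
        cli.add_argument("--" + key, default=None)
    given = vars(cli.parse_args(argv))
    settings.update({k: v for k, v in given.items() if v is not None})
    return settings
```

With `writer = latex` in the config file, a plain call picks the LaTeX writer without `--writer` on every invocation, while an explicit command line option still wins.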

Last edit: Günter Milde 2022-05-12


Adam Turner - 2022-05-20

Requires 2-stage parsing of the config files.

I don't think two stage parsing is that much of a downside, and the implementation of this on patches#186 seems reasonable, so I withdraw my objection.

A


Günter Milde - 2022-12-02

Proposal

Provide entry point docutils (done in 0.19).

In Docutils 0.20, announce the move from rst2*.py scripts to rst2* entry points for 0.21 or later.

For the transition, we already provide docutils --writer=* as stable alternative to rst2*.py (documented in RELEASE-NOTES since Docutils 0.20).

Keep the "rst2*.py" scripts, "rstpep2html.py", and "buildhtml.py" in the tools/ directory of the repository and source package.

Rationale:

installing both, rst2html and rst2html.py in the binary path would complicate command line use,

in scripts, the increased verbosity of the stable command is no problem.

changing interactive usage patterns is easy

in Windows, it would be bad to first introduce the new "rst2*.py" entry points and then remove them again.

users installing from the source may install selected front-ends "manually", cf. docs/dev/repository.txt.

The attached patch provides a set of functions that are required for the rst2* entry points.
It could go into Docutils 0.20.

0001-New-functions-for-use-as-rst2-console_scripts-entry-.patch

Günter Milde - 2023-06-26

status: open --> open-fixed

Günter Milde - 2023-06-26

Commit [r9408] implemented the switch from installing rst2*.py scripts into the binary PATH to "console-scripts" entry point definitions.
"Console-scripts" entry points are now:
* docutils: the generic front end
* rst2* (without the .py extension): specific end-user applications.
The corresponding scripts in tools/ are kept in the repository and source distribution as examples for custom front-ends and possible use by distribution packagers.
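
To check which of these commands a given environment actually provides, the installed entry points can be listed from Python. This is a small sketch using the standard library's `importlib.metadata`; the helper name `console_scripts` is made up here, and the result depends entirely on what is installed.

```python
from importlib import metadata


def console_scripts(prefix=""):
    """Return sorted names of installed console-script entry points."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):      # Python 3.10 and later
        scripts = eps.select(group="console_scripts")
    else:                           # Python 3.8/3.9 return a plain dict
        scripts = eps.get("console_scripts", [])
    return sorted({ep.name for ep in scripts if ep.name.startswith(prefix)})
```

Calling `console_scripts("rst2")` in an environment with a recent Docutils should list the rst2* commands, and `console_scripts("docutils")` the generic front end; tools installed by an OS package manager without entry-point metadata will not show up.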

This should also clear the way to replace "setup.py" with a TOML config file [patches:#186].

Related

Commit: [r9408]
Patches: ~~#186~~


Günter Milde - 2024-04-09

Status: open-fixed --> closed-fixed
