
#88 Unify Docutils CLI tools into `docutils-cli`

Milestone: Default
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-04-09
Created: 2022-01-15
Creator: Adam Turner
Private: No

As noted at https://sourceforge.net/p/docutils/patches/186/?page=1#897a/547e/ef2d by @milde,

we should open a new ticket for the command line tool review

This is a tracker issue for this, and to allow discussion.

I'll briefly re-outline my argument to (eventually) drop the rst* front-end tools, and only export docutils-cli (or python -m docutils).

I think a single front-end tool significantly simplifies a lot of things -- the docutils-cli wrapper is not complex, which gives it significant points in favour in my book.

Most usage of Docutils today is programmatic, and not via the command line tools (see the table at the bottom of this post: it shows all the projects that have a full dependency on Docutils with over 500k downloads in the last month. Of those 8, none use the command line tools).

I also suspect (although the data does not exist) that most command line uses of the Docutils tools will be rst2html(5). This is already the default in docutils-cli, so it is a drop-in replacement.

...

My proposal isn't to remove them [the rst2 front-end tools] with no recourse, but to deprecate over a period of time, clearly marking identical drop-in commands at runtime to affected users. ... We cannot know how many people would be affected with local random scripts, but it is a two-second change.
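A minimal sketch of such a runtime notice, assuming a hypothetical helper named deprecated_front_end (it is not part of Docutils) and illustrative message wording:

```python
import warnings


def deprecated_front_end(old_name, writer_name):
    """Warn that an rst2* tool is deprecated and name its drop-in replacement.

    Hypothetical helper for illustration; the message text is an assumption.
    """
    replacement = f"docutils-cli --writer={writer_name}"
    warnings.warn(
        f"{old_name} is deprecated; use '{replacement}' instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )
    return replacement


# Python hides DeprecationWarning in many contexts by default,
# so a real tool would have to make sure users actually see it.
warnings.simplefilter("always", DeprecationWarning)
print(deprecated_front_end("rst2html5.py", "html5"))
```

A front-end script could call such a helper once at start-up and then carry on as before, so affected users see the replacement command without their build breaking.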

Many users will also run with old or pinned versions of Docutils, and part of updating is seeing the changelog. If Debian or other redistributors already make changes, they could decide to keep shell aliases from rst2* to the new docutils-cli based invocations.

(quotes taken from https://sourceforge.net/p/docutils/patches/186/#897a and https://sourceforge.net/p/docutils/patches/186/?page=1#897a/547e )

A

Related

Bugs: #447

Discussion

  • engelbert gruber

    working on the command line means typing less by using completion

    if i type rstTABTAB the list of all writers shows up
    if there is only docutils-cli i have to read the documentation.
    if there happens to be a new writer with rst.... i will be notified by the completion result,
    if there is only docutils-cli i have to read the documentation.

    of course new readers won't show up for rstTABTAB

     
    • Adam  Turner

      Adam Turner - 2022-01-16

      of course new readers won't show up for rstTABTAB

      I think this partly speaks to the issue -- the tab completion functionality only works "by accident", and doesn't support readers/parsers.

      It seems it might be possible to add custom bash autocomplete rules (https://caliban.org/bash/#completion) -- would this be an acceptable workaround?

      A

       
  • Günter Milde

    Günter Milde - 2022-01-17

    While I am in favour of revising and updating Docutils' command line
    entry points, I don't think we should drop the number down to one.

    I'll briefly re-outline my argument to (eventually) drop the rst*
    front-end tools, and only export docutils-cli (or python -m docutils).

    I think a single front-end tool significantly simplifies a lot of
    things

    Can you elaborate a bit on what would become a lot simpler?

    the docutils-cli wrapper is not complex, which gives it
    significant points in favour in my book.

    The generic front-end is one order of magnitude more complex because of
    the two-stage command line parsing with the set of valid tags depending
    on the components selected.
    Even help output depends on the "component" tags. Due to the open nature
    (allowing for plug-in components), a man page will always need to refer
    to external documentation, while, e.g, man rst2html lists all available
    command line options (at least on Debian).
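A rough sketch of the two-stage idea using only the standard library's argparse. This is not the Docutils implementation; the WRITER_OPTIONS table and the option names in it are hypothetical stand-ins for the options a component would contribute:

```python
import argparse

# Hypothetical per-writer options, for illustration only.
WRITER_OPTIONS = {
    "html5": ["--stylesheet"],
    "latex": ["--documentclass"],
}


def parse_two_stage(argv):
    """Parse argv twice: first pick the writer, then its specific options."""
    # Stage 1: look only for the component selection and ignore the rest.
    stage1 = argparse.ArgumentParser(add_help=False)
    stage1.add_argument("--writer", choices=sorted(WRITER_OPTIONS),
                        default="html5")
    selected, _unknown = stage1.parse_known_args(argv)

    # Stage 2: the set of valid options depends on the selected writer.
    stage2 = argparse.ArgumentParser(parents=[stage1])
    for option in WRITER_OPTIONS[selected.writer]:
        stage2.add_argument(option)
    stage2.add_argument("source", nargs="?")
    return stage2.parse_args(argv)


args = parse_two_stage(["--writer", "latex", "--documentclass", "book", "in.rst"])
print(args.writer, args.documentclass, args.source)
```

Because the second parser is only built after the first pass, the --help output also differs per writer, which is the behaviour described above.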

    Most usage of Docutils today is programmatic, and not via the command
    line tools

    We need to care for "command line users" if their number is
    non-negligible -- independent of the number of users depending on the
    programmatic interface.

    The number of users/projects using Docutils via the command line interface
    cannot be estimated by looking at Python projects.
    Unfortunately, it is rather hard to find out how many non-Python projects
    use "rst2html.py" in their Makefile or another form of build tool chain.

    The first answer to "Explain Python entry points?"
    even cites Docutils as

    ... a great example of entry-point use: it will install something like a
    half-dozen useful commands for converting Python documentation to other
    formats.

    (even if Docutils currently does not use the "console-scripts mechanism" to
    provide cli entry points).

    ...

    We cannot know how many people would be affected with local random
    scripts, but it is a two-second change.

    While the actual re-typing (or drag-and-drop) of the command may be that
    fast, this is not the case for the complete task of finding out and
    approaching the right spot where to apply the change in a complex build
    chain.

    Many users will also run with old or pinned versions of Docutils, and
    part of updating is seeing the changelog.

    A hard learned lesson from Docutils releases is to never underestimate
    the number of users/project managers that don't read the changelog (nor
    the announcements in the RELEASE-NOTES) yet depend on a stable Docutils
    for a stable system.


    I am pro change for instances where the current
    naming is unfortunate or may stand in the way.

    buildhtml.py is too generic, it may stand in the way.
    Debian calls it rst-buildhtml. I could imagine docutils-buildhtml or
    leaving it in the tools for individual installing.

    docutils-cli.py is too long. This name was selected because naming
    the file for the generic front end tool "docutils.py" is misleading.
    With "entry points" it is possible to use docutils as front-end command
    without the need for a file "docutils.py".

    python3 -m docutils currently results in the error:
    'docutils' is a package and cannot be directly executed
    It could be made more helpful: we know that a user typing python -m ... wants
    to execute a command line tool (or just wants to know more about docutils).
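The mechanism behind that error can be reproduced with a throwaway package: python -m <package> only works when the package contains a __main__.py. The sketch below uses a made-up package name (demo_pkg) and does not touch Docutils itself:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_module(with_main):
    """Build a throwaway package and run it with ``python -m demo_pkg``."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        if with_main:
            # __main__.py is the file that "python -m <package>" executes.
            (pkg / "__main__.py").write_text("print('hello from demo_pkg')\n")
        env = dict(os.environ, PYTHONPATH=tmp)
        return subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp, env=env, capture_output=True, text=True,
        )


without_main = run_as_module(with_main=False)
with_main = run_as_module(with_main=True)
print(without_main.stderr.strip())  # the "cannot be directly executed" error
print(with_main.stdout.strip())     # hello from demo_pkg
```

Under this assumption, adding a __main__.py that calls the generic front end would be enough to make python -m docutils behave like the command line tool.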

    rst2 is established as the start of Docutils' front-end names for
    conversion from reStructuredText to something. I would like to keep this
    prefix as "ours". (After all, Docutils is the reference implementation of the
    rST format.)

    Ease of discovery is important. TAB completion is a powerful means here.
    Additional parser or readers may add their own entry points, cf.
    https://github.com/executablebooks/MyST-Parser/issues/347#issuecomment-1003717830

    Rarely used and diagnostic tools may not need automatic installation into
    the binary PATH. Here, it may help to diagnose which tools are installed by
    pip docutils vs. OS-specific package managers.

    Debian installs the following 13:

    rst2html
    rst2html4
    rst2html5
    rst2latex
    rst2man
    rst2odt
    rst2odt_prepstyles
    rst2pseudoxml
    rst2s5
    rst2xetex
    rst2xml
    rst-buildhtml
    rstpep2html

    Dropping the .py from rst2*.py commands may be considered.

    +1 shorter and more command-like names
    -1 backwards incompatible, an unknown number of users need to change their scripts.

     
    • Adam  Turner

      Adam Turner - 2022-01-20

      Can you elaborate a bit on what would become a lot simpler?

      Currently, deep in Docutils' internals, everywhere that takes a settings_spec or uses self.settings sort of assumes it is working as a command line programme. However, a lot of usage (programmatic, through Sphinx or other methods) entirely uses the default values for things. By moving to a single front end I would argue it is not only a cleaner user story, but it might enable refactoring to move the CLI usages of Docutils to a higher level.

      Currently we need to do awful things to subclass and patch either optparse.OptionParser or argparse.ArgumentParser. This is really unusual, and for developers coming from a more "normal" command line application, it can take a while to understand this part of the internals of Docutils.

      I didn't go into detail intentionally so as not to spark a debate about these parts, but I do think (eventually) simplifying these interactions can lead to a cleaner codebase.

      the two-stage command line parsing

      I don't think you can get away from this though without a combinatorial explosion of readers, writers, and parsers. Say we have two useful CLI readers (standalone/pep), three parsers (rst/recommonmark/myst), and 6 useful writers (html5/html4/latex/xetex/man/xml) that is 36 distinct front-end tools we should be providing.
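The count can be checked by enumerating every reader/parser/writer combination named above (the component names are the ones listed in this post, used here as plain strings):

```python
from itertools import product

readers = ["standalone", "pep"]
parsers = ["rst", "recommonmark", "myst"]
writers = ["html5", "html4", "latex", "xetex", "man", "xml"]

# One dedicated front-end tool per (reader, parser, writer) triple.
combinations = list(product(readers, parsers, writers))
print(len(combinations))  # 2 * 3 * 6 = 36
```

Every additional parser would add another 12 tools under this scheme, whereas a generic front end only gains one more accepted value for its parser option.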

      a man page will always need to refer to external documentation

      I will admit ignorance on how man pages work. docutils-cli --writer xetex --help, though, will always give the correct help output. This is also the version we should be promoting, not least as it works cross platform (if my patch with entrypoints is merged!).

      We need to care for "command line users" if their number is non-negligible

      Of course -- sorry if my post came across as callous in any way towards frontend tool users. I suppose what I don't want is to be in a situation where we are not making real improvements based on hypothetical situations. It might be useful to find ways of proxying for CLI usage -- bugs filed recently with us/redistributors, usages in public archives ( https://grep.app or similar ), etc.

      rather hard to find out how many non-Python projects use "rst2html.py" in their Makefile or another form of build tool chain

      True. However by the above methods we can get an estimate, surely? There are a lot of people who commit random things to GitHub / GitLab / whatever!

      finding out and approaching the right spot where to apply the change

      This is why I proposed to go about it by emitting warnings during deprecation, before total removal. We also need to consider the support that this project offers -- if a downstream user has integrated Docutils into a complex tool chain and cannot maintain it, we shouldn't be responsible for that.

      never underestimate the number of users/project managers that don't read the changelog ... yet depend on a stable Docutils for a stable system.

      Fair enough -- though perhaps another route we could go down in the deprecation notices is to say "pin version XX". There is no best solution here -- all change will break someone's workflow (XKCD 1172!), but we should be working to make the upgrade path as easy as possible.

      buildhtml

      Ahh, I was under the impression that buildhtml was an internal tool for building the website. Would it be reasonable to formally retire it from public use, and recommend Sphinx as an alternative?

      use docutils as front-end command // python -m docutils

      +1

      Ease of discovery is important. TAB completion is a powerful means here

      Did you see my suggestion on using custom shell autocompletion functions? I believe that this would allow for tab completion with the reader/parser/writer flags.

      rst2 is established as the start of Docutils' front-end names for conversion from reStructuredText to something. I would like to keep this prefix as "ours".

      If we use what I proposed in one of my changesets to reimplement the rst2 commands in terms of docutils-cli, it would be entirely possible to deprecate the rst2 commands but just keep them forever. This would also mean that the simplifications I proposed at the top of this message wouldn't be blocked (I think).


      Concrete proposal:

      • Promote docutils or python -m docutils where we currently reference rst2
      • Reimplement the rst2* commands in terms of docutils-cli
      • Try to implement <TAB> autocompletion for docutils-cli
      • Use entrypoints for everything (but also keep .py aliases for a while)
      • Deprecate rst2* commands, but with no removal date

      A

       
      • Günter Milde

        Günter Milde - 2022-01-22

        Currently, deep in Docutils' internals, everywhere that takes a
        settings_spec or uses self.settings sort of assumes it is working as a
        command line programme. However, a lot of usage (programmatic, through
        Sphinx or other methods) entirely uses the default values for things.

        Even with Sphinx, some features can only be customised from a
        docutils.conf configuration file.

        The settings_spec and document.settings are Docutils' abstraction over
        the different ways of configuration (config files/command line/programmatic).
        Using document.settings should be possible without too much thinking
        about the actual source of the setting value.

        An overview for programmatic use of the "settings" framework is given in
        https://docutils.sourceforge.io/docs/api/runtime-settings.html#runtime-settings-processing-from-applications
        (best read alongside pydoc3 -b output for the mentioned functions/classes).

        By moving to a single front end I would argue it is not only a cleaner
        user story, but it might enable refactoring to move the CLI usages of
        Docutils to a higher level.

        We already have the docutils.core.publish_* "convenience functions" as
        a high-level API for custom front-ends (both command-line and programmatic).

        "docutils-cli" is more complex because here we want the components to be
        configurable from the command line or config file. I am working on moving
        the complexity to a library function that can be re-used by other
        "script" entry-points in need of configurable components. This will become
        an extension or addition to docutils.core.publish_cmdline().
        (It may also become simpler once "optparse" is replaced with "argparse".)

        Currently we need to do awful things to subclass and patch either
        optparse.OptionParser or argparse.ArgumentParser. This is really
        unusual, and for developers coming from a more "normal" command line
        application, it can take a while to understand this part of the
        internals of Docutils.

        Yes, indeed. Docutils has an elaborate configuration framework which
        actually predates the "optparse" module. Later development of "Optik"
        into "optparse" and then "argparse" implemented some of the abstractions and
        enhancements offered by Docutils in a different way.

        But (in contrast to developers working on the optparse->argparse
        transition :) "normal" developers using the "docutils" package don't need
        to care about the details here. They can use the high-level API offered
        by SettingsSpec / settings and get the command line and config file
        processing for free (docutils.frontend is only the "workhorse",
        docutils.core is the high-level interface).
        I agree that there is room for improvement in this API, but I don't think
        getting rid of the simple front-ends in favour of one complex front end
        will be of much help in this quest.

        I didn't go into detail intentionally so as not to spark a debate about
        these parts, but I do think (eventually) simplifying these interactions
        can lead to a cleaner codebase.

        I suggest moving this thread of the discussion over to [bugs:#441].

        the two-stage command line parsing

        I don't think you can get away from this though without a combinatorial
        explosion of readers, writers, and parsers.
        Say we have two useful CLI readers (standalone/pep), three parsers
        (rst/recommonmark/myst), and 6 useful writers
        (html5/html4/latex/xetex/man/xml) that is 36 distinct front-end tools
        we should be providing.

        However, only some of the combinations will be of common interest.
        We should try to find the right balance -- IMO, both extremes are
        sub-optimal.

        • Docutils will not include dedicated front-end tools for 3rd-party
          parsers/writers/... ("pycmark2..." shall be provided by "pycmark" etc).

        • One idea is to have two packages at pypi: "docutils-core", say,
          without dedicated front-end tools (but supporting python -m docutils)
          and "docutils" providing a sensible set of front-end tools.

        • Another idea is to auto-install a small default set (rst2html,
          rst2latex, ...) and keep a rich set in /tools so that every user may
          install (copy, symlink or write alias commands in ~/.bashrc or
          ~/.profile) the tools they want "by hand".

        • rst2odt_prepstyles.py is a rarely used auxiliary script.
          I propose to move it to docutils/writers/odtwriter/
          (alongside the stylefile(s) it prepares).

        ...

        Ease of discovery is important. TAB completion is a powerful means here

        Did you see my suggestion on using custom shell autocompletion
        functions? I believe that this would allow for tab completion with the
        reader/parser/writer flags.

        That is a possibility. However, it only works with some shells (bash) so it
        is not for all users.

        rst2 is established as the start of Docutils' front-end names for
        conversion from reStructuredText to something. I would like to keep
        this prefix as "ours".

        If we use what I proposed in one of my changesets to reimplement the
        rst2 commands in terms of docutils-cli, it would be entirely
        possible to deprecate the rst2 commands but just keep them forever.
        This would also mean that the simplifications I proposed at the top of
        this message wouldn't be blocked (I think).

        I am just working on a way to disentangle frontend.OptionParser and
        frontend.ConfigParser but this is a topic for [bugs:#441].

         

        Related

        Feature Requests: #110


        Last edit: Günter Milde 2023-06-26
        • Adam  Turner

          Adam Turner - 2022-01-24

          Even with Sphinx, some features can only be customised from a
          docutils.conf configuration file.

          The settings_spec and document.settings are Docutils' abstraction over
          the different ways of configuration (config files/command line/programmatic).
          Using document.settings should be possible without too much thinking
          about the actual source of the setting value.

          An overview for programmatic use of the "settings" framework is given in
          https://docutils.sourceforge.io/docs/api/runtime-settings.html#runtime-settings-processing-from-applications
          (best read alongside pydoc3 -b output for the mentioned functions/classes).

          Note I'm not proposing getting rid of the config, just loosening the direct relationship between the CLI-parsing part of Docutils and the settings/config part of Docutils.

          We already have the docutils.core.publish_* "convenience functions" as
          a high-level API for custom front-ends (both command-line and programmatic).

          Hmm, perhaps we are talking at cross purposes. I'm talking about utility functions such as "take some RST and turn it into docutils nodes" (from halfway down https://github.com/sphinx-doc/sphinx/issues/8039, ignore the emotive language).

          What the user probably wanted is docutils.core.publish_doctree(user_input_text).children, but it is pretty hard to know this without knowing the internals of Docutils. A function named get_nodes_from_rst (or suchlike) would be a useful helper.
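A minimal sketch of such a helper. The name get_nodes_from_rst comes from the suggestion above and is hypothetical; only docutils.core.publish_doctree is an existing Docutils function:

```python
def get_nodes_from_rst(text):
    """Parse reStructuredText and return the top-level Docutils nodes.

    Hypothetical helper: not an existing Docutils function.
    """
    # Imported lazily so the sketch can be loaded without Docutils installed.
    from docutils.core import publish_doctree

    return publish_doctree(text).children
```

With Docutils installed, get_nodes_from_rst("Hello *world*") should return a list holding a single paragraph node. Note that publish_doctree still runs the standard transforms, so the result is not always a one-to-one image of the source text.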

          There is currently a great deal of usage of random internal bits of Docutils, I think partly because these "medium level" helpers don't exist (sorry if I wasn't clear in what I meant here in the post above).

          we want the components to be configurable from the command line or config file

          I would challenge this: I would find it very surprising behaviour if a config file (in one of at least three places, or controlled by an environment file) populated defaults for the components being used. Given it also adds a lot of complexity, I'm not sure it is worth keeping?

          implemented some of the abstractions and enhancements offered by Docutils in a different way.

          The main challenge I had here was that subclasses can filter settings_spec (through filter_settings_spec). I've never seen this implemented in the way Docutils does it before -- if settings_spec tuples were treated as immutable, then it would be much easier to e.g. construct the parser object first and then use parser.add_argument as "intended".

          simple front-ends in favour of one complex front end

          I'll try another analogy (why not!). When I'm using ffmpeg, it is "simple" to me as the end user to know that if I want to use different input or output encodings, I just pass the relevant flag. All I need to learn is the name of the base command, and that I pass the codec I want to -c:a and -c:v. In this way it is "simpler" to remember and use as the number of commands goes up (and allows using aliases, which the per-format tools don't).

          The implementation might be somewhat more complex (although I would argue not much), but end-user simplicity is what counts.

          If you're not convinced I'll drop the issue for now, but I do think it would be good to at least unify the back-end implementations of the front-end tools.

          two packages at pypi

          I don't think this is a good idea -- it increases confusion as there are two packages, but the "core" maintains all the complexity of needing to parse CLI stuff. Maybe later, if the core (or CLI) becomes more distinct.

          I propose to move it to docutils/writers/odtwriter/

          +1

          Will reply on 441 for the 441 things.

          A

           
  • Günter Milde

    Günter Milde - 2022-05-12

    Note I'm not proposing getting rid of the config, just loosening the
    direct relationship between the CLI-parsing part of Docutils and the
    settings/config part of Docutils.

    We already have the docutils.core.publish_* "convenience functions" as
    a high-level API for custom front-ends (both command-line and programmatic).

    Hmm, perhaps we are talking at cross purposes. I'm talking about utility
    functions such as "take some RST and turn it into docutils nodes"

    I agree that it would be an improvement to implement config-file
    processing without dependency on "optparse" or "argparse" (cf. [bugs:#441]).
    Well documented utility functions are helpful, too.

    However, getting rid of the rst* front-end tools does not simplify this task:
    it does not matter if docutils.core.publish_cmdline() is called by one
    or several command line front-end scripts.

    end-user simplicity is what counts.

    For end-user convenience, I see benefits in both, a generic, flexible CLI
    and simple scripts for the common tasks (rst2html, rst2latex, ...).

    Proposal:

    Keep the "*.py" scripts in tools/ for backwards compatibility and as
    examples for users wanting to create their own front-ends.

    Use "entry points" [patches:#186] to install front-end scripts in
    the binary PATH:

    docutils: generic front end
    (as "docutils-cli.py" is not installed in 0.18 [bugs:#447],
    we can change to a shorter name already in 0.19).

    rst2*: drop the .py suffix (after a transition period).
    Eventually stop installing rarely used tools.

     

    Related

    Bugs: #447
    Feature Requests: #110
    Patches: #186


    Last edit: Günter Milde 2022-12-01
    • Adam  Turner

      Adam Turner - 2022-05-20

      I agree with the sentiment here.

      rst2*: drop the .py suffix (after a transition period).
      Eventually stop installing rarely used tools.

      Sounds good, I believe that setting entry_points and scripts will install both, allowing for the transition period.

      A

       
  • Günter Milde

    Günter Milde - 2022-05-12

    we want the components to be configurable from the command line or config file

    I would challenge this: I would find it very surprising behaviour if
    a config file (in one of at least three places, or controlled by an
    environment file) populated defaults for the components being used.
    Given it also adds a lot of complexity, I'm not sure it is worth
    keeping?

    For the user, the component settings (reader, parser, writer) are
    handled similarly to all other settings: the "factory default" can be
    customized either in a configuration file or on the command line:

    Pro
    Simple command call with user preferences set in a config file,
    no need to type --writer=myfavourite with every call.
    Consistent handling of settings.

    Con
    Requires 2-stage parsing of the config files.
    (The command line must be parsed twice either way.)
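The precedence described here (factory default, then config file, then command line) can be sketched with the standard library's configparser. The section name "components", the FACTORY_DEFAULTS values and the function name are assumptions for illustration, not the actual Docutils configuration layout:

```python
import configparser

# Assumed factory defaults; only the html5 writer default is mentioned
# in this thread, the other two values are placeholders.
FACTORY_DEFAULTS = {"reader": "standalone", "parser": "rst", "writer": "html5"}


def resolve_components(config_text, cli_overrides):
    """Merge component settings: defaults < config file < command line."""
    config = configparser.ConfigParser()
    config.read_string(config_text)

    settings = dict(FACTORY_DEFAULTS)
    if config.has_section("components"):  # hypothetical section name
        settings.update(config["components"])
    # Only options that were really given on the command line override.
    settings.update({k: v for k, v in cli_overrides.items() if v is not None})
    return settings


config_text = "[components]\nwriter = latex\n"
print(resolve_components(config_text, {"parser": "myst", "writer": None}))
```

The resolved components then decide which further settings are valid, which is why the config files need a second pass once the components are known.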

     

    Last edit: Günter Milde 2022-05-12
  • Adam  Turner

    Adam Turner - 2022-05-20

    Requires 2-stage parsing of the config files.

    I don't think two stage parsing is that much of a downside, and the implementation of this on patches#186 seems reasonable, so I withdraw my objection.

    A

     
  • Günter Milde

    Günter Milde - 2022-12-02

    Proposal

    • Provide entry point docutils (done in 0.19).
    • In Docutils 0.20, announce the move from rst2*.py scripts to rst2* entry points for 0.21 or later.
    • For the transition, we already provide docutils --writer=* as stable alternative to rst2*.py (documented in RELEASE-NOTES since Docutils 0.20).
    • Keep the "rst2*.py" scripts, "rstpep2html.py", and "buildhtml.py" in the tools/ directory of the repository and source package.

    Rationale:

    • installing both, rst2html and rst2html.py in the binary path would complicate command line use,
    • in scripts, the increased verbosity of the stable command is no problem.
    • changing interactive usage patterns is easy
    • in Windows, it would be bad to first introduce the new "rst2*.py" entry points and then remove them again.
    • users installing from the source may install selected front-ends "manually", cf. docs/dev/repository.txt.

    The attached patch provides a set of functions that are required for the rst2* entry points.
    It could go into Docutils 0.20.

     
  • Günter Milde

    Günter Milde - 2023-06-26
    • status: open --> open-fixed
     
  • Günter Milde

    Günter Milde - 2023-06-26

    Commit [r9408] implemented the switch from installing rst2*.py scripts into the binary PATH to "console-scripts" entry point definitions.
    "Console-scripts" entry points are now:
    • docutils: the generic front end
    • rst2* (without the .py extension): specific end-user applications.
    The corresponding scripts in tools/ are kept in the repository and source distribution as examples for custom front-ends and possible use by distribution packagers.
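To see which of these entry points a given Python environment actually provides, the installed "console_scripts" group can be inspected with importlib.metadata. This is a sketch: the helper name is made up, and it matches by name only, so an rst2* command from another package would be listed as well:

```python
from importlib.metadata import entry_points


def docutils_console_scripts():
    """List installed console-script names that look like Docutils front ends."""
    eps = entry_points()
    if hasattr(eps, "select"):   # Python 3.10 and later
        scripts = eps.select(group="console_scripts")
    else:                        # Python 3.8 and 3.9 return a plain dict
        scripts = eps.get("console_scripts", [])
    return sorted(
        ep.name for ep in scripts
        if ep.name == "docutils" or ep.name.startswith("rst2")
    )


print(docutils_console_scripts())
```

Tools installed by an OS package manager as plain scripts do not show up here, which can help to tell pip-installed front ends from distribution-provided ones.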

    This should also clear the way to replace "setup.py" with a TOML config file [patches:#186].

     

    Related

    Commit: [r9408]
    Patches: #186

  • Günter Milde

    Günter Milde - 2024-04-09
    • Status: open-fixed --> closed-fixed
     
