#427 Inconsistent sentence spacing in man pages using rst2man.py

Milestone: None
Status: closed-fixed
Owner: nobody
Updated: 2024-04-09
Created: 2021-10-13
Creator: jei23jkfd
Private: No

Operating system info

I'm running Docutils 0.18b2.dev r8848 with Python 3.9.

Description of bug

The roff language has a very subtle requirement. It enforces semantic line breaks. That is, any roff document must end a sentence with a line break.

For example, this is incorrect:

This is a sentence. This is another sentence.

We have to insert a line break after each sentence:

This is a sentence.
This is another sentence.

The semantic line breaks are used to add optional sentence spacing. It uses the line breaks to detect when a period (or question or exclamation mark) represents the end of a sentence and then adds an optional extra space when displaying it. The relevant documentation for groff(1) and mandoc(1) is at
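The rule described above can be approximated in a few lines of Python. This is a simplified model rather than groff's actual algorithm: it assumes a sentence ends only when `.`, `?`, or `!` (optionally followed by closing quotes or brackets) is the last thing on an input line. The names `ends_sentence` and `fill` are hypothetical helpers introduced for illustration.

```python
import re

# Simplified model: sentence punctuation, optionally followed by closing
# quotes or brackets, as the very last characters of an input line.
_END_OF_SENTENCE = re.compile(r"""[.?!]["')\]*]*$""")


def ends_sentence(line: str) -> bool:
    """Return True if this input line would be treated as ending a sentence."""
    return bool(_END_OF_SENTENCE.search(line))


def fill(text: str) -> str:
    """Join input lines the way a filling formatter might (model only).

    Two spaces follow a line that ends a sentence, one space follows any
    other line. Spacing inside a line is left untouched.
    """
    lines = [ln for ln in text.splitlines() if ln]
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if i < len(lines) - 1:
            out.append("  " if ends_sentence(line) else " ")
    return "".join(out)
```

Under this model, a period in the middle of an input line keeps its single space, while a period at the end of a line gains a second one, which is the inconsistency reported here.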

Minimal example

Here is a minimal example that shows how the reST man page writer has this bug:

###
mwe
###
a minimal example
#################
:Date: October 13, 2021
:Manual section: 1
:Manual group: Testing Docutils
:Version: mwe 0.1.0

Synopsis
========
| mwe [**-aq**] [**-b** *file*] [**\--long-long** *which*] *file \...*

Description
===========
To find the common attributes of a variety of objects, it is necessary
to begin, by surveying the *objects* themselves in the concrete. Let us
therefore advert successively to the various modes of action, and
arrangements of human affairs, which are classed, by universal or widely
spread opinion, as Just or as Unjust. The things well known to excite
the sentiments associated with those names, are of a very multifarious
character. I shall pass them rapidly in review, without studying any
particular arrangement.
The previous line will have been spaced with two spaces.

Options
=======
Its arguments are as follows:

-a                         Do all.
-q                         Be quiet.
-b file                    Do everything to *file*.
--long-meme which          Chooses the long named *which*.

Environment
===========
mwe is not affected by environment variables.

Exit status
===========
mwe exits 0 on success.

If you convert this with rst2man.py mwe.rst mwe.1 and view it with man ./mwe.1 you will notice the issue easily: some sentences in the DESCRIPTION section end with 1 space and some sentences end with 2 spaces.

Possible solutions

The most complete solution would be to automatically detect where sentences end in the input and add line breaks as required in the man page output. This is what Pandoc does. However, it requires keeping a list of abbreviations for every language so as to not have false positives: you don't want to add a line break when someone uses "e.g.".

Another solution would be to use the .ss macro to remove the extra space at the end of any sentences. Then Docutils wouldn't have to conform to roff syntax. However, this would not work with mandoc(1) as the developer of that program has decided not to support .ss. This is what Asciidoctor does.

Please let me know if you have any questions.

Discussion

  • Günter Milde

    Günter Milde - 2021-10-13

    A third solution/workaround is to comply with the troff line break rules
    after punctuation signs in the rST source. Use a new line to start a new
    sentence if you want double space after the full stop. Avoid a new line after punctuation signs.
    Docutils does not change the position of line breaks in text and inline
    elements (except in URIs where whitespace is trimmed).

    Another solution is configuring troff to use single spaces:

    The majority of style guides that use a Latin-derived alphabet as a
    language base now prescribe or recommend the use of a single space
    after the concluding punctuation of a sentence.

    -- https://en.wikipedia.org/wiki/Sentence_spacing_in_language_and_style_guides

    From the troff manual, I assume that .cflags 0 .?! should do the trick.

     

    Last edit: Günter Milde 2023-05-20
    • jei23jkfd

      jei23jkfd - 2021-10-15

      A third solution/workaround is to comply with the troff line break rules
      after punctuation signs in the rST source. Use a new line to start a new
      sentence if you want double space after the full stop. Otherwise, avoid a
      new line after punctuation signs.
      Docutils does not change the position of line breaks in text and inline
      elements (except in URIs where whitespace is trimmed).

      Yes, that is another solution.

      From the troff manual, I assume that .cflags 0 .?! should do the trick.

      The issue with this (and also with doing .ss \n[.ss] 0, which has the same visual effect) is that neither works with mandoc(1), which is the default man implementation on OpenBSD, FreeBSD, and NetBSD. Of course it does help on Linux and macOS, which is what most people are using.

      Would you accept a patch that adds .ss \n[.ss] 0 to the preamble of every man page?

       
      • Günter Milde

        Günter Milde - 2021-10-21

        Would you accept a patch that adds .ss \n[.ss] 0 to the preamble of every man page?

        If some post-processors fail, it would not be sensible to do this unconditionally. Maybe introduce a configuration setting "frenchspacing" or similar. This would also offer a place to document the behaviour regarding punctuation and line breaks.

        Another option would be a generic "preamble" setting similar to the "latex_preamble"
        (https://docutils.sourceforge.io/docs/user/config.html#latex-preamble) which could also be used for other configuring tasks.
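Until such a setting exists, the effect of a "preamble" could be approximated by post-processing the generated man page. The sketch below is an assumption-laden illustration, not a Docutils feature: `add_preamble` is a hypothetical helper, and it assumes the requests may safely be placed directly after the first `.TH` line. As discussed in this thread, mandoc(1) does not honour `.ss`.

```python
def add_preamble(man_source: str, requests: list[str]) -> str:
    """Insert raw roff request lines right after the first .TH line.

    If no .TH line is found, the requests are prepended instead.
    """
    lines = man_source.splitlines(keepends=True)
    block = [req + "\n" for req in requests]
    for i, line in enumerate(lines):
        if line.startswith(".TH"):
            if not line.endswith("\n"):
                # Keep the inserted request on its own line.
                lines[i] = line + "\n"
            return "".join(lines[: i + 1] + block + lines[i + 1:])
    return "".join(block + lines)
```

A call such as `add_preamble(text, [r".ss \n[.ss] 0"])` would add the request proposed above to an existing page.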

         
        • jei23jkfd

          jei23jkfd - 2021-10-29

          If some post-processors fail, it would not be sensible to do this unconditionally.

          To be clear, mandoc(1) doesn't fail, it just ignores the request.

           
  • engelbert gruber

    • labels: --> manpage-writer
     
  • engelbert gruber

    we consider every patch

    could you have a look at the sandbox/manpage-writer directory?
    i have some layout tests there. do they work on BSD flavours,
    is there a ps-output on BSDs?

     
  • engelbert gruber

    As the example description is typeset justified, several blanks are stretched by one blank.

           character.  I  shall  pass them rapidly in review, without studying any
           particular arrangement.  The previous line will have been  spaced  with
           two spaces.
    

    Experimenting I managed to get three blanks after the sentence end, which reduced one double blank in the following line

           character.   I  shall pass them rapidly in review, without studying any
           particular arrangement.  The previous line will have been  spaced  with
           two spaces.
    

    so most important is to not break post-processors, the layout is random anyway.
    Which might be the reason my man groff says nothing about sentences.
    man 7 groff although states, without giving a reason:

    In text paragraphs, it is advantageous to start each sentence at a line of its own.

    and

      newline
             In  text paragraphs, newlines mostly behave like space characters.
    
     
    • Günter Milde

      Günter Milde - 2023-04-20

      most important is to not break post-processors, the layout is random anyway.

      It is not completely random (even less in Postscript or PDF output where the increased spacing after sentences is clearly visible, I don't know whether/how it is preserved in HTML).

      I suggest documenting the issue of sentence spacing (recommending to follow the (g|t)roff input conventions) and then closing this bug report.

      Adding support for a "raw" directive and role or a "preamble" option may be valid feature requests.

       
  • Günter Milde

    Günter Milde - 2022-09-07
    • labels: manpage-writer --> manpage writer
     
  • G. Branden Robinson

    Hi there. I work on groff upstream; Engelbert invited me to comment.

    The most complete solution would be to automatically detect where sentences end in the input and add line breaks as required in the man page output.

    That's an AI-hard problem.

    This is what Pandoc does. However, it requires keeping a list of abbreviations for every language so as to not have false positives: you don't want to add a line break when someone uses "e.g.".

    ...for that reason, and even that can be defeated. Consider the following input.

    Many Linux distributions originate in the U.S. ls is the
    most commonly used command.
    

    As technical writing, the foregoing is poor, but it illustrates two points: sometimes sentences end with abbreviations, so you can't necessarily decide the opposite upon encountering one. And in Unix documentation, because the system is case-sensitive, we often start sentences with lowercase letters.

    Pattern matching is not powerful enough to decide sentence boundaries on conventionally written English prose.
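The failure mode described above can be reproduced with a deliberately naive splitter. The sketch below is illustrative only: `ABBREVIATIONS` and `naive_split` are hypothetical names, and the code is not how Pandoc, Docutils, or groff work.

```python
import re

# Hypothetical abbreviation list; a real one would be needed per language.
ABBREVIATIONS = {"e.g.", "i.e.", "U.S."}


def naive_split(text: str) -> list[str]:
    """Split after '.', '?' or '!' plus whitespace, unless the word
    before the break is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.?!]\s+", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1]
        if last_word in ABBREVIATIONS:
            continue  # assume the abbreviation does not end the sentence
        sentences.append(candidate)
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

With "U.S." in the list, the two-sentence example above comes back as a single sentence. Removing it from the list fixes that input but breaks every text where "U.S." appears mid-sentence.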

    Another solution would be to use the .ss macro to remove the extra space at the end of any sentences.

    I object to this proposal because the amount of inter-sentence space is a matter of taste/preference--that of the reader, not the man page author. The groff_man_style(7) page includes advice for customizing the man.local file to apply single-spacing after sentences. I ask that man(7) generators not override the user's preference in this matter.

    Then Docutils wouldn't have to conform to roff syntax. However, this would not work with mandoc(1) as the developer of that program has decided not to support .ss.

    From my conversations with mandoc(1)'s developer Ingo Schwarze, I believe he is of the opinion that man(7) (and mdoc(7)) should not permit much, if any, user customization along these lines. So with that tool you'll likely get whichever amount of inter-sentence spacing the OpenBSD project decides you should have.

    This is what Asciidoctor does.

    Would someone be so kind as to direct its developers to my comments here?

    From the troff manual, I assume that .cflags 0 .?! should do the trick.

    I object to this as well, for the same reasons as injection of ss requests, and it's likely to meet the same fate when rendered by mandoc(1).

    A third solution/workaround is to comply with the troff line break rules after punctuation signs in the rST source. Use a new line to start a new sentence if you want double space after the full stop.

    This is good advice, not because it's been a best troff composition practice for about 50 years, but because it's the only truly reliable way of indicating to the machine where the sentence breaks are.

    (Barring the injection of some sort of "start-sentence" and/or "end-sentence" tags, which I cannot imagine would survive first contact with ReStructured Text's design philosophy.)

    Avoid a new line after punctuation signs.

    In troff input you don't need this rule because following an end-of-sentence punctuation character with the \& dummy character escape sequence cancels detection of the end of a sentence. I'm pretty sure you don't want to lift this syntax into Docutils; I mention it for completeness and in case it is useful to you when generating man(7) documents as output.
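For a generator that emits man(7) output, the `\&` technique could look roughly like the sketch below. `cancel_sentence_end` is a hypothetical helper, not an existing Docutils function, and it assumes the caller already knows that the line does not end a sentence.

```python
def cancel_sentence_end(line: str) -> str:
    """Append the roff dummy character escape (backslash + ampersand) to a
    line that ends with '.', '?' or '!' so that the punctuation is not
    taken as the end of a sentence."""
    stripped = line.rstrip()
    if stripped.endswith((".", "?", "!")):
        return stripped + "\\&"
    return line
```

Deciding which lines need the escape is the hard part. The helper only covers the mechanical step of emitting it.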

    Which might be the reason my man groff says nothing about sentences.
    man 7 groff although states, without giving a reason: ... and ...

    The language quoted here (which I elided with ellipses) is from the groff 1.22.4 version of the groff(7) page; groff 1.23.0, released early last July, has groff(7) and roff(7) pages that have much improved their explanations of basic troff and groff syntax. (In my opinion, that is. I wrote much of it.)

    Regards,
    Branden

     

    Last edit: G. Branden Robinson 2024-04-10
  • Günter Milde

    Günter Milde - 2024-03-13

    Thanks, Branden, for the comprehensive explanation.
    IMO, we can close this issue after adding something along the lines of

       Comply with the `troff line break rules`__ after punctuation signs in
       the rST source.  Use a new line or two spaces to start a new sentence.
    
       Avoid linebreaks after punctuation signs that do not end a sentence.
       A linebreak can be escaped with a backslash.  In rST, escaped whitespace
       is removed, so precede the backslash by a space::
    
          We recommend the works of E. T. A. \
          Hoffman.
    
       __ https://www.gnu.org/software/groff/manual/groff.html#Sentences   
    

    to the Manpage Writer documentation.

     
    • engelbert gruber

      i added it to docs/user/manpage.txt

       
  • G. Branden Robinson

    Hi Günter & Engelbert,

    Your suggested language looks good to me. I don't claim a full understanding of how Python docutils converts rST to man(7), but from what I do grasp, the new advice in docs/user/manpage.txt looks like it should keep users out of trouble (that is, prevent surprise about where the sentences end). ☺

    Please consider me and the groff mailing list a resource for any further questions about *roff or man.

    Regards,
    Branden

     
  • Günter Milde

    Günter Milde - 2024-03-20

    Another solution would be to use the .ss macro to remove the extra space at the end of any sentences.

    I object to this proposal because the amount of inter-sentence space is a matter of taste/preference--that of the reader, not the man page author. The groff_man_style(7) page includes advice for customizing the man.local file to apply single-spacing after sentences. I ask that man(7) generators not override the user's preference in this matter.

    While this is reasonable for locales that have the tradition of extra large inter-sentence space,
    it makes life hard for document authors from locales where "frenchspacing" is the norm (and always was), like German. Writing a German text, I usually don't care about sentence spacing and any additional space after a full stop at the end of a line comes as an unwelcome surprise.

    "frenchspacing" with groff can be achieved by preceding the source with a "raw" directive:

    .. raw:: manpage
    
       .cflags 0 .?! 
    

    The raw text is inserted after the title and docinfo but this should suffice (unless there is a two-sentence subtitle).

    An option for "frenchspacing" would make this more obvious for authors that don't know about (or object to) the extra space after sentences. It would be, however, an additional feature, not a bugfix. Whether it would really be an improvement is open for discussion.

     

    Last edit: Günter Milde 2024-03-20
    • G. Branden Robinson

      Hi Günter,

      While [using the roff ss macro to assign an amount of inter-sentence space in man pages generated by docutils] is reasonable for locales that have the tradition of extra large inter-sentence space, it makes life hard for document authors from locales where "frenchspacing" is the norm (and always was), like German. Writing a German text, I usually don't care about sentence spacing and any additional space after a full stop at the end of a line comes as an unwelcome surprise.

      In groff, this is already taken care of for you. Its localization files for Czech, German, Spanish (forthcoming in 1.24), French, Italian, Russian (forthcoming in 1.24) and Swedish all use what TeX calls "frenchspacing".

      See:

      https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/cs.tmac?h=1.23.0#n158
      https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/de.tmac?h=1.23.0#n158
      https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/fr.tmac?h=1.23.0#n158

      ...and so forth.

      An option for "frenchspacing" would come in handy for authors that don't know about (or object to) the extra space after sentences. It would be, however, an additional feature, not a bugfix.

      I don't think this is necessary for the man(7) output of docutils. groff will produce the locally-correct amount of inter-sentence space regardless, and if the user has a different preference, the man.local mechanism for overriding it has been in place for over 30 years, since before groff 1.06 in 1992 (which is as far back as our Git repository has history).

      I do not want groff to impose English-specific typographical practices on people while denying the opportunity for configuration, and I do not think it does in this respect.

      Regards,
      Branden

       
      • Günter Milde

        Günter Milde - 2024-04-07

        Thank you for the explanation. We will close this bug as "fixed" (via the new documentation) once 0.21 is out.

         
  • Günter Milde

    Günter Milde - 2024-03-20
    • status: open --> pending-works-for-me
     
  • Günter Milde

    Günter Milde - 2024-04-09
    • status: pending-works-for-me --> closed-fixed
     
  • Günter Milde

    Günter Milde - 2024-04-09

    The issue/feature is documented in Docutils 0.21.

     
