I'm running Docutils 0.18b2.dev r8848 with Python 3.9.
The roff language has a very subtle requirement: it relies on semantic line breaks. That is, every sentence in a roff document must end with a line break.
For example, this is incorrect:

    This is a sentence. This is another sentence.

We have to insert a line break after each sentence:

    This is a sentence.
    This is another sentence.

These semantic line breaks are how roff adds optional extra sentence spacing: the formatter uses them to detect when a period (or question or exclamation mark) ends a sentence, and then adds an optional extra space after it when displaying the text. The relevant documentation for groff(1) and mandoc(1) is at
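To make the rule concrete, here is a minimal Python sketch that approximates how a roff formatter decides whether an input line ends a sentence. It assumes groff's default character classes (".", "?" and "!" end a sentence; closing characters such as ", ', ), ] and * are transparent), inspects only the ends of input lines, and skips control lines. The name ends_sentence is hypothetical, and this is not groff's or mandoc's actual implementation.

```python
# Sketch only: approximates groff's default end-of-sentence rule at the end
# of an input line. Not the actual groff or mandoc implementation.

SENTENCE_END = frozenset(".?!")
# Closing characters that may follow the punctuation without cancelling the
# sentence end. groff's defaults also include some special characters
# (daggers, closing quotes), which are not modelled here.
TRANSPARENT = frozenset("\"')]*")


def ends_sentence(line: str) -> bool:
    """Return True if roff would treat this input line as ending a sentence."""
    text = line.rstrip("\n")
    if text.startswith((".", "'")):
        return False  # control line (request or macro call), not running text
    i = len(text)
    while i > 0 and text[i - 1] in TRANSPARENT:
        i -= 1  # skip trailing ')', '"', and similar
    return i > 0 and text[i - 1] in SENTENCE_END


if __name__ == "__main__":
    sample = [
        "This is a sentence. This is another sentence.",  # only the final '.' is at a line end
        'He said "stop."',
        "Use a tool, e.g.\\&",  # the \& dummy character cancels detection
        "grep or sed.",
    ]
    for line in sample:
        print(f"{ends_sentence(line)!s:5}  {line}")
```

In this model the first period of "This is a sentence. This is another sentence." is never seen as a sentence end, which is exactly why the one-line example above gets only ordinary word spacing between its sentences.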
Here is a minimal example that demonstrates this bug in the reST man page writer:

    ###
    mwe
    ###

    a minimal example
    #################

    :Date: October 13, 2021
    :Manual section: 1
    :Manual group: Testing Docutils
    :Version: mwe 0.1.0

    Synopsis
    ========

    | mwe [**-aq**] [**-b** *file*] [**\--long-long** *which*] *file \...*

    Description
    ===========

    To find the common attributes of a variety of objects, it is necessary
    to begin, by surveying the *objects* themselves in the concrete. Let us
    therefore advert successively to the various modes of action, and
    arrangements of human affairs, which are classed, by universal or widely
    spread opinion, as Just or as Unjust. The things well known to excite
    the sentiments associated with those names, are of a very multifarious
    character. I shall pass them rapidly in review, without studying any
    particular arrangement.
    The previous line will have been spaced with two spaces.

    Options
    =======

    Its arguments are as follows:

    -a                 Do all.
    -q                 Be quiet.
    -b file            Do everything to *file*.
    --long-meme which  Chooses the long named *which*.

    Environment
    ===========

    mwe is not affected by environment variables.

    Exit status
    ===========

    mwe exits 0 on success.

If you convert this with rst2man.py mwe.rst mwe.1 and view it with man ./mwe.1, you will easily notice the issue: in the DESCRIPTION section, some sentences are followed by one space and others by two.
The most complete solution would be to automatically detect where sentences end in the input and add line breaks as required in the man page output. This is what Pandoc does. However, it requires keeping a list of abbreviations for every language so as to not have false positives: you don't want to add a line break when someone uses "e.g.".
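A rough Python sketch of the abbreviation-list approach, to show why it is fragile. This is not Pandoc's code: split_sentences and the five-entry abbreviation list are hypothetical, and a real list would have to be maintained per language. The second example shows a false negative, where a sentence that genuinely ends with an abbreviation is not split.

```python
# Hypothetical sketch of abbreviation-aware sentence splitting (not Pandoc's
# implementation). The list is a tiny, English-only sample.
from __future__ import annotations

ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "vs."}


def split_sentences(paragraph: str) -> list[str]:
    """Guess sentence boundaries so each sentence can go on its own roff line."""
    sentences, current = [], []
    for word in paragraph.split():
        current.append(word)
        core = word.rstrip("\"')]")  # ignore closing quotes and brackets
        key = core.lstrip("\"'([").lower()  # ignore opening ones too
        if core.endswith((".", "?", "!")) and key not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    print("\n".join(split_sentences("Use a tool, e.g. grep. It is fast.")))
    # A sentence that really ends with "etc." is not split (false negative):
    print("\n".join(split_sentences("Copy the files, etc. Then run it.")))
```

Joining the result with newlines gives one sentence per output line, which is what roff expects; the quality of the result depends entirely on the abbreviation list.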
Another solution would be to use the .ss request to remove the extra space at the end of every sentence. Then Docutils wouldn't have to conform to roff's line-break conventions. This is what Asciidoctor does. However, it would not work with mandoc(1), as the developer of that program has decided not to support .ss.
Please let me know if you have any questions.
A third solution/workaround is to comply with the troff line-break rules
after punctuation signs in the rST source. Start a new sentence on a new
line if you want double space after the full stop; otherwise, avoid a line break after punctuation signs.
Docutils does not change the position of line breaks in text and inline
elements (except in URIs where whitespace is trimmed).
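Following this workaround by hand is error-prone, so a small checker may help. The sketch below is hypothetical (not part of Docutils) and purely heuristic: lint flags sentence-ending punctuation that is followed by more text on the same line, and lines that end with one of a few sample abbreviations, which roff would treat as sentence ends. Expect both false positives and false negatives.

```python
# Hypothetical, heuristic checker for rST sources destined for the man page
# writer. Not part of Docutils.
from __future__ import annotations

import re

ABBREVIATIONS = ("e.g.", "i.e.", "etc.", "cf.", "vs.")  # tiny sample list
# A word ending in '.', '?' or '!', optional closing characters, whitespace,
# and then more text on the same line.
MID_LINE_STOP = re.compile(r"(\S*[.?!])[\"')\]]*\s+(?=\S)")


def lint(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for likely sentence-spacing problems."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        for match in MID_LINE_STOP.finditer(line):
            word = match.group(1)
            if not word.lower().endswith(ABBREVIATIONS):
                problems.append((lineno, f"new sentence after {word!r}? start a new line"))
        if line.lower().endswith(ABBREVIATIONS):
            problems.append((lineno, "line ends with an abbreviation; roff will add sentence space"))
    return problems


if __name__ == "__main__":
    source = (
        "This is a sentence. This is another sentence.\n"
        "Use a tool, e.g. grep.\n"
        "Files live in /usr, /opt, etc.\n"
        "and elsewhere.\n"
    )
    for lineno, message in lint(source):
        print(f"line {lineno}: {message}")
```

For the sample source, line 1 is reported for the mid-line sentence end and line 3 for the line-final "etc."; the "e.g." in the middle of line 2 is deliberately left alone.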
Another solution is configuring troff to use single spaces:
The majority of style guides that use a Latin-derived alphabet as a
language base now prescribe or recommend the use of a single space
after the concluding punctuation of a sentence.
-- https://en.wikipedia.org/wiki/Sentence_spacing_in_language_and_style_guides
From the troff manual, I assume that

    .cflags 0 .?!

should do the trick.
Last edit: Günter Milde 2023-05-20
Yes, that is another solution.
The issue with this (and also with doing .ss \n[.ss] 0, which has the same visual effect) is that neither works with mandoc(1). It is the default man implementation on OpenBSD, FreeBSD, and NetBSD. Of course, it does help on Linux and macOS, which is what most people are using.
Would you accept a patch that adds .ss \n[.ss] 0 to the preamble of every man page?
If some post-processors fail, it would not be sensible to do this unconditionally. Maybe introduce a configuration setting "frenchspacing" or similar. This would also offer a place to document the behaviour regarding punctuation and line breaks.
Another option would be a generic "preamble" setting similar to the "latex_preamble"
(https://docutils.sourceforge.io/docs/user/config.html#latex-preamble) which could also be used for other configuring tasks.
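To illustrate what an opt-in "frenchspacing" setting could do, here is a minimal Python sketch that post-processes generated man(7) source. Neither add_frenchspacing nor a "frenchspacing" option exists in Docutils; both are hypothetical. The request is placed after the .TH line, and since mandoc(1) ignores .ss, only groff output would change.

```python
# Hypothetical: neither this function nor a "frenchspacing" setting exists in
# Docutils; it only illustrates the proposal. mandoc(1) ignores .ss.
FRENCHSPACING_REQUEST = r".ss \n[.ss] 0"  # keep word space, no extra sentence space


def add_frenchspacing(man_source: str, enabled: bool = True) -> str:
    """Insert the .ss request after the .TH line of generated man(7) source."""
    if not enabled:
        return man_source  # default: leave the page untouched
    lines = man_source.splitlines()
    # Put the request right after .TH if there is one, otherwise at the top.
    pos = next((i + 1 for i, line in enumerate(lines) if line.startswith(".TH ")), 0)
    lines.insert(pos, FRENCHSPACING_REQUEST)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample page for demonstration only.
    page = '.TH "MWE" 1 "2021-10-13" "0.1.0" "Testing Docutils"\n.SH NAME\nmwe \\- a minimal example\n'
    print(add_frenchspacing(page), end="")
```

Keeping enabled off by default follows the point above: the page is only modified when the author asks for it, so post-processors that do not understand .ss are not affected unconditionally.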
To be clear, mandoc(1) doesn't fail, it just ignores the request.
We consider every patch.
Could you have a look at the sandbox/manpage-writer directory?
I have some layout tests there. Do they work on BSD flavours?
Is there PostScript output on BSDs?
Because the example description is typeset justified, several blanks are stretched by one blank.
Experimenting, I managed to get three blanks after the sentence end, which reduced one double blank in the following line.
So the most important thing is not to break post-processors; the layout is random anyway.
That might be the reason my man groff says nothing about sentences.
man 7 groff, however, states, without giving a reason:
and
It is not completely random (even less so in PostScript or PDF output, where the increased spacing after sentences is clearly visible; I don't know whether or how it is preserved in HTML).
I suggest documenting the issue of sentence spacing (recommending to follow the (g|t)roff input conventions) and then closing this bug report.
Adding support for a "raw" directive and role or a "preamble" option may be valid feature requests.
Hi there. I work on groff upstream; Engelbert invited me to comment.
That's an AI-hard problem.
...for that reason, and even that can be defeated. Consider the following input.
As technical writing, the foregoing is poor, but it illustrates two points: sometimes sentences end with abbreviations, so you can't necessarily decide the opposite upon encountering one. And in Unix documentation, because the system is case-sensitive, we often start sentences with lowercase letters.
Pattern matching is not powerful enough to decide sentence boundaries on conventionally written English prose.
I object to this proposal because the amount of inter-sentence space is a matter of taste/preference--that of the reader, not the man page author. The groff_man_style(7) page includes advice for customizing the man.local file to apply single-spacing after sentences. I ask that man(7) generators not override the user's preference in this matter.
From my conversations with mandoc(1)'s developer Ingo Schwarze, I believe he is of the opinion that man(7) (and mdoc(7)) should not permit much, if any, user customization along these lines. So with that tool you'll likely get whichever amount of inter-sentence spacing the OpenBSD project decides you should have.
Would someone be so kind as to direct its developers to my comments here?
I object to this as well, for the same reasons as injection of ss requests, and it's likely to meet the same fate when rendered by mandoc(1).
This is good advice, not because it's been a best troff composition practice for about 50 years, but because it's the only truly reliable way of indicating to the machine where the sentence breaks are.
(Barring the injection of some sort of "start-sentence" and/or "end-sentence" tags, which I cannot imagine would survive first contact with ReStructured Text's design philosophy.)
In troff input you don't need this rule, because following an end-of-sentence punctuation character with the \& dummy character escape sequence cancels detection of the end of a sentence. I'm pretty sure you don't want to lift this syntax into Docutils; I mention it for completeness and in case it is useful to you when generating man(7) documents as output.
The language quoted here (which I elided with ellipses) is from the groff 1.22.4 version of the groff(7) page; groff 1.23.0, released early last July, has groff(7) and roff(7) pages that have much improved their explanations of basic troff and groff syntax. (In my opinion, that is. I wrote much of it.)
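For a program that generates man(7) output, the \& dummy character can be applied mechanically. A minimal Python sketch, assuming a hypothetical roff_text_line helper and a made-up abbreviation list; this is an illustration, not how the Docutils manpage writer works. It also shows the other common use of \&: protecting a text line that starts with "." or "'" from being read as a control line.

```python
# Hypothetical helper for a man(7) generator, not Docutils code.
# The abbreviation list is a tiny, made-up sample.
ABBREVIATIONS = ("e.g.", "i.e.", "etc.", "cf.", "vs.")
DUMMY = "\\&"  # the roff dummy character escape, \&


def roff_text_line(line: str) -> str:
    """Protect one line of running text using the \\& dummy character."""
    # A leading '.' or "'" would otherwise make roff read a control line.
    if line.startswith((".", "'")):
        line = DUMMY + line
    # After a line-final abbreviation, \& cancels end-of-sentence detection.
    if line.endswith(ABBREVIATIONS):
        line += DUMMY
    return line


if __name__ == "__main__":
    for text in [".profile is read at login.", "Use a tool, e.g.", "grep or sed."]:
        print(roff_text_line(text))
```

The trade-off is the one described above: a sentence that genuinely ends in "etc." would lose its extra space, so an abbreviation list can only ever be a heuristic.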
Regards,
Branden
Last edit: G. Branden Robinson 2024-04-10
Thanks, Branden, for the comprehensive explanation.
IMO, we can close this issue after adding something along the lines of
to the Manpage Writer documentation.
I added it to docs/user/manpage.txt.
Hi Günter & Engelbert,
Your suggested language looks good to me. I don't claim a full understanding of how Python docutils converts rST to man(7), but from what I do grasp, the new advice in docs/user/manpage.txt looks like it should keep users out of trouble (that is, prevent surprise about where the sentences end). ☺
Please consider me and the groff mailing list a resource for any further questions about *roff or man.
Regards,
Branden
While this is reasonable for locales that have the tradition of extra-large inter-sentence space,
it makes life hard for document authors from locales where "frenchspacing" is (and always was) the norm, like German. Writing a German text, I usually don't care about sentence spacing, and any additional space after a full stop at the end of a line comes as an unwelcome surprise.
"frenchspacing" with groff can be achieved by preceding the source with a "raw" directive:
The raw text is inserted after the title and docinfo but this should suffice (unless there is a two-sentence subtitle).
An option for "frenchspacing" would make this more obvious for authors that don't know about (or object to) the extra space after sentences. It would be, however, an additional feature, not a bugfix. Whether it would really be an improvement is open for discussion.
Last edit: Günter Milde 2024-03-20
Hi Günter,
In groff, this is already taken care of for you. Its localization files for Czech, German, Spanish (forthcoming in 1.24), French, Italian, Russian (forthcoming in 1.24) and Swedish all use what TeX calls "frenchspacing".
See:
https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/cs.tmac?h=1.23.0#n158
https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/de.tmac?h=1.23.0#n158
https://git.savannah.gnu.org/cgit/groff.git/tree/tmac/fr.tmac?h=1.23.0#n158
...and so forth.
I don't think this is necessary for the man(7) output of docutils. groff will produce the locally-correct amount of inter-sentence space regardless, and if the user has a different preference, the man.local mechanism for overriding it has been in place for over 30 years, since before groff 1.06 in 1992 (which is as far back as our Git repository has history).
I do not want groff to impose English-specific typographical practices on people while denying the opportunity for configuration, and I do not think it does in this respect.
Regards,
Branden
Thank you for the explanation. We will close this bug as "fixed" (via the new documentation) once 0.21 is out.
The issue/feature is documented in Docutils 0.21.