From: G. B. R. <br...@us...> - 2024-03-09 19:14:27
Hi there.

I work on _groff_ upstream; Engelbert invited me to comment.

> The most complete solution would be to automatically detect where sentences end in the input and add line breaks as required in the man page output.

That's an AI-hard problem.

> This is what Pandoc does. However, it requires keeping a list of abbreviations for every language so as to not have false positives: you don't want to add a line break when someone uses "e.g.".

...for that reason, and even that can be defeated. Consider the following input.

~~~
Many Linux distributions originate in the U.S. ls is the most commonly used command.
~~~

As technical writing, the foregoing is poor, but it illustrates two points: sometimes sentences end with abbreviations, so you can't necessarily decide the opposite upon encountering one. And in Unix documentation, because the system is case-sensitive, we often start sentences with lowercase letters. Pattern matching is not powerful enough to decide sentence boundaries.

> Another solution would be to use the .ss macro to remove the extra space at the end of any sentences.

I object to this proposal because the amount of inter-sentence space is a matter of taste/preference--that of the _reader_, not the man page author. The _groff_man_style_(7) page includes advice for customizing the _man.local_ file to apply single-spacing after sentences. I ask that _man_(7) generators not override the user's preference in this matter.

> Then Docutils wouldn't have to conform to roff syntax. However, this would not work with mandoc(1) as the developer of that program has decided not to support .ss.

From my conversations with _mandoc_(1)'s developer Ingo Schwarze, I believe he is of the opinion that _man_(7) (and _mdoc_(7)) should not permit much, if any, user customization along these lines. So with that tool you'll likely get whichever amount of inter-sentence spacing the OpenBSD project decides you should have.

> This is what Asciidoctor does.

Would someone be so kind as to direct its developers to my comments here?

> From the troff manual, I assume that .cflags 0 .?! should do the trick.

I object to this as well, for the same reasons as injection of `ss` requests, and it's likely to meet the same fate when rendered by _mandoc_(1).

> A third solution/workaround is to comply with the troff line break rules after punctuation signs in the rST source. Use a new line to start a new sentence if you want double space after the full stop.

This is good advice, not because it's been a best _troff_ composition practice for about 50 years, but because it's the only truly reliable way of indicating to the machine where the sentence breaks are. (Barring the injection of some sort of "start-sentence" and/or "end-sentence" tags, which I cannot imagine would survive first contact with ReStructured Text's design philosophy.)

> Avoid a new line after punctuation signs.

In _troff_ input you don't need this rule because following an end-of-sentence punctuation character with the `\&` dummy character escape sequence cancels detection of the end of a sentence. I'm pretty sure you don't want to lift this syntax into Docutils; I mention it for completeness and in case it is useful to you when generating _man_(7) documents as output.

> Which might be the reason my man groff says nothing about sentences. man 7 groff although states, without giving a reason: ... and ...

The language quoted here (which I elided with ellipses) is from the _groff_ 1.22.4 version of the _groff_(7) page; _groff_ 1.23.0, released early last July, has _groff_(7) and _roff_(7) pages that have much improved their explanations of basic _troff_ and _groff_ syntax. (In my opinion, that is. I wrote much of it.)

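To make the abbreviation problem and the `\&` escape concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `ABBREVIATIONS`, `naive_split`, and `suppress_sentence_end` are hypothetical, and this is not how Pandoc, Docutils, or any _roff_ implementation actually works. It only shows why a pattern match plus an abbreviation list misses the boundary in the "U.S. ls" example above.

```python
import re

# Hypothetical abbreviation list; a real tool would need one per language.
ABBREVIATIONS = {"e.g.", "i.e.", "U.S."}


def naive_split(text):
    """Split after '.', '?' or '!' followed by whitespace,
    unless the word carrying the punctuation is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!]\s+", text):
        end = match.start() + 1          # keep the punctuation character
        last_word = text[start:end].split()[-1]
        if last_word in ABBREVIATIONS:
            continue                     # assume the sentence goes on
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def suppress_sentence_end(line):
    """Append the roff dummy character escape \\& after trailing
    end-of-sentence punctuation so the formatter does not treat the
    end of this input line as the end of a sentence."""
    stripped = line.rstrip()
    if stripped.endswith((".", "?", "!")):
        return stripped + "\\&"
    return stripped


example = ("Many Linux distributions originate in the U.S. "
           "ls is the most commonly used command.")
print(naive_split(example))              # one item: the boundary is missed
print(suppress_sentence_end("See e.g."))
```

With "U.S." in the list, the two sentences come back as one. Dropping it from the list would fix this input but would then split a phrase such as "the U.S. government" in the wrong place, which is the trade-off described above.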
Regards,
Branden

---

**[bugs:#427] Inconsistent sentence spacing in man pages using rst2man.py**

**Status:** open
**Labels:** manpage writer
**Created:** Wed Oct 13, 2021 07:00 PM UTC by jei23jkfd
**Last Updated:** Thu Apr 20, 2023 11:23 AM UTC
**Owner:** nobody

## Operating system info

I'm running Docutils 0.18b2.dev r8848 with Python 3.9.

## Description of bug

The roff language has a very subtle requirement. It enforces semantic line breaks. That is, any roff document must end a sentence with a line break. For example, this is incorrect:

~~~
This is a sentence. This is another sentence.
~~~

We have to insert a line break after each sentence:

~~~
This is a sentence.
This is another sentence.
~~~

The semantic line breaks are used to add optional sentence spacing. It uses the line breaks to detect when a period (or question or exclamation mark) represents the end of a sentence and then adds an optional extra space when displaying it.

The relevant documentation for groff(1) and mandoc(1) is at

- https://www.gnu.org/software/groff/manual/groff.html#Sentences
- https://mandoc.bsd.lv/man/roff.7.html#Sentence_Spacing

## Minimal example

Here is a minimal example that shows how the reST man page writer has this bug:

~~~rst
###
mwe
###

a minimal example
#################

:Date: October 13, 2021
:Manual section: 1
:Manual group: Testing Docutils
:Version: mwe 0.1.0

Synopsis
========

| mwe [**-aq**] [**-b** *file*] [**\--long-long** *which*] *file \...*

Description
===========

To find the common attributes of a variety of objects, it is necessary to begin, by surveying the *objects* themselves in the concrete. Let us therefore advert successively to the various modes of action, and arrangements of human affairs, which are classed, by universal or widely spread opinion, as Just or as Unjust. The things well known to excite the sentiments associated with those names, are of a very multifarious character.
I shall pass them rapidly in review, without studying any particular arrangement.
The previous line will have been spaced with two spaces.

Options
=======

Its arguments are as follows:

-a  Do all.
-q  Be quiet.
-b file  Do everything to *file*.
--long-meme which  Chooses the long named *which*.

Environment
===========

mwe is not affected by environment variables.

Exit status
===========

mwe exits 0 on success.
~~~

If you convert this with `rst2man.py mwe.rst mwe.1` and view it with `man ./mwe.1` you will notice the issue easily: some sentences in the DESCRIPTION section end with 1 space and some sentences end with 2 spaces.

## Possible solutions

The most complete solution would be to automatically detect where sentences end in the input and add line breaks as required in the man page output. This is what Pandoc does. However, it requires keeping a list of abbreviations for every language so as to not have false positives: you don't want to add a line break when someone uses "e.g.".

Another solution would be to use the `.ss` macro to remove the extra space at the end of any sentences. Then Docutils wouldn't have to conform to roff syntax. However, this would not work with mandoc(1) as the developer of that program has decided not to support `.ss`. This is what Asciidoctor does.

Please let me know if you have any questions.

---

Sent from sourceforge.net because doc...@li... is subscribed to https://sourceforge.net/p/docutils/bugs/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/docutils/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.