During the last few weeks I've spent some time
carefully studying reStructuredText and Python-Markdown.
I would prefer to write with reStructuredText,
but Markdown has one feature that is lacking in reST:
easy-to-type line-breaks in paragraphs.
I'm addicted to semantic line breaking.
I think in medium-length phrases that can fit in one line.
Probably this was heightened by working with video subtitles,
but I've always felt that prose can be understood more quickly
if the line-breaks fall between semantic units.
Even when I can't make semantic units fit neatly,
I still like to control where the lines break.
An extension for Python-Markdown, nl2br,
completely disables markdown's removal of line-breaks,
so I can simply write the way I usually do.
With another markdown extension, you type a backslash
to indicate that you want to keep the following line-break.
I'm aware of the methods available in reST:
starting every line of a block with |,
or typing something like |br| at the end of a line.
They're not really suitable to use with every line of every paragraph.
Perhaps the removal of line-breaks from paragraphs is baked so deeply into reST
that it's not practical to change the current behaviour?
Anyway, my question/feature-request is: can you make
either or both of the following methods available to users?
Preferably, set a document-wide parameter
that causes actual line-breaks in paragraphs to be kept.
If that's not practical, create a hard line-break
by typing a backslash immediately before a line-break
(with or without a space before the backslash).
I'd like to convert my reST docs to html5, using <br>\n for line breaks.
I don't understand why arbitrary line-breaks are the norm. In early email and code editors line length was limited, but I don't see why any form of markup or markdown expects people to insert them these days. When I don't want semantic breaks, I simply don't insert any breaks at all, and then a paragraph is a single unbroken line that gets wrapped at the margins. Why isn't this the norm?
Hope my point of view makes at least a little sense to you.
Not only in paragraphs, of course, but also in list-items and definitions and anywhere else where line-breaks are usually discarded.
Docutils preserves line-breaks and writes them (as \n) into the output document.
For HTML, a custom style-sheet with the CSS rule
p {white-space: break-spaces}
should suffice to get "semantic line breaks" as intended.
(Tested with the example below + a bullet list.)
IMHO.
These are not arbitrary line breaks; other people call them paragraphs and sentences.
Non-justified paragraphs only caught on because modern word processors are bad at aligning left and right.
Sorry, I cannot tell how deeply this is ingrained in docutils.
But your request sounds unusual. We have people referencing the Chicago Manual of Style ... now you have your own, which I consider cool, and which I think is structured text.
For a start, so you can use your style, I would use awk as a preprocessor to insert double line breaks
or add a line block character
cheers
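The preprocessor idea above can also be sketched in Python rather than awk. This is a minimal sketch, not a tested tool: the names to_line_blocks, is_plain_paragraph and SKIP_PREFIXES are made up for illustration, and the heuristics for recognising a plain paragraph are assumptions that will misjudge some constructs (enumerated lists and field lists, for example).

```python
import re

# Blocks starting with these markers are left alone (directives, comments,
# bullet lists, existing line blocks). Illustrative only, not a complete list.
SKIP_PREFIXES = ("..", "- ", "* ", "+ ", "|")


def is_plain_paragraph(lines):
    """Heuristically decide whether a block of lines is an ordinary paragraph."""
    if not lines:
        return False
    # Indented text is a literal block, block quote, or list continuation.
    if any(line[:1].isspace() for line in lines):
        return False
    if lines[0].startswith(SKIP_PREFIXES):
        return False
    # A line made only of punctuation is a section underline or a transition.
    if any(re.fullmatch(r"[^\w\s]{3,}", line.strip()) for line in lines):
        return False
    return True


def to_line_blocks(source):
    """Prefix every line of each plain paragraph with the line-block marker."""
    # The capturing group keeps the blank-line runs, at the odd indices.
    parts = re.split(r"(\n{2,})", source)
    for i in range(0, len(parts), 2):
        lines = parts[i].splitlines(keepends=True)
        if is_plain_paragraph(lines):
            parts[i] = "".join("| " + line for line in lines)
    return "".join(parts)


if __name__ == "__main__":
    import sys
    # Usage: python to_line_blocks.py < input.rst > output.rst
    sys.stdout.write(to_line_blocks(sys.stdin.read()))
```

The output can then be passed to the usual rst2html5 front end, so the source stays in the natural one-phrase-per-line style.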
Thanks for the awk tip.
I break lines on sentence breaks,
sometimes on phrase breaks within sentences,
and sometimes just to make line lengths close to equal.
I don't like to let line-breaking be semantically random,
as it is when allowed to wrap to whatever the page width is.
Have you seen https://bobheadxi.dev/semantic-line-breaks/ ?
It provides another reason to accept line breaks as intentional, and preserve them.
I'd forgotten about them. ;-)
As to why reST and Markdown remove line-breaks from paragraphs,
I guess they assume the breaks were inserted to conform to the editor's page width,
and remove them to let the paragraph soft-wrap to the width of any target.
I can't think of any other reason.
(off topic)
It's a guide for writing academic works for Chicago university.
In the USA, many people think it should be applied to all English writing,
but I believe that's a mistaken view, as it's not suitable for all other kinds of writing.
Last edit: jon 2022-01-02
I've just re-read it, and realized that he's talking about line breaks in the source.
He doesn't suggest line-breaking the rendered output in that way.
So it looks like I'm on my own with this ... :)
i think i use similar structure which results in hanging indents on longer sentences that would have no line break before the paper margin. mainly i make bullet lists without bullets.
i would suggest not using
but putting each line in a
On 2022-01-02, jon via Docutils-develop wrote in gmane.text.docutils.devel:
I agree with https://sembr.org/, that for the end-user (reader),
block paragraphs provide for a better reading experience.
I would not like to read a novel with "semantic line breaks".
Besides a custom writer to keep original line-breaks in HTML, you may also
consider "semantic empty lines" in your source:
Place an empty line after each sentence and a transition at the more
prominent breaks.
Use a custom style-sheet to remove the additional vertical space
between paragraphs and to style the "transition" element as
an empty block.
Then, this example would look similar to
two paragraphs with semantic line breaks after each sentence.
Last edit: Günter Milde 2023-12-12
Günter Milde wrote:
Neither would I. :)
Not everything I write has semantic breaks,
but in explanatory writing, I think they can help comprehension,
when combined with clear simple sentence structure.
A few lines I wrote yesterday provide a good example of how I use them,
as well as being a description of how I use them:
And of course, as you can see in my messages,
where a semantic break can be made at a reasonable line length,
I make it, rather than make a non-semantic break a few words later.
This is second nature to me now, not much extra thought is required.
I'm not too keen on the idea of "semantic empty lines",
as I hope to be able to write in the way that comes to me naturally,
without thinking about extra artifice to achieve the desired result.
Very short paragraphs are common in punchy journalistic writing,
but that style doesn't often suit my subject matter.
Thanks anyway. :)
I see your point.
OTOH, I don't think there is wide use. Docutils will not provide this feature "out of the box" but we are ready to help with a custom solution.
Note that the approach with a custom writer is different from "hard line breaks" as offered by Markdown. It treats preservation of source line breaks as a style decision, so you can use the same source for a document tailored to end users with variable screen/page width and a document for end users preferring hand-crafted line breaks.
Actually, you don't need a custom writer, just a custom CSS style sheet:
The handling of line break characters by the browser can be configured with CSS.
The white-space property is your friend: set in a custom stylesheet, it will tell the browser to break lines at the same place as in the rST source.
This can be extended with adapted spacing rules and a rule to "normally" wrap elements with the "wrap" class argument.
Last edit: Günter Milde 2022-05-05
Hi Jon,
Docutils preserves line breaks throughout the processing pipeline. You can see this if you view the source of generated HTML documents. To turn them into rendered newlines in html, you can use a custom writer that replaces \n characters with <br> elements, as the below sketch illustrates.
A
Last edit: Adam Turner 2022-01-02
It adds <br>\n in literal blocks (that get enclosed in <pre> tags), making extra blank lines where they're not wanted, as well as doing the right thing in paragraphs.
Would it be too big a job to add this functionality to docutils, such that it works only where it's needed, and it could be turned on/off (for general use) by an option named 'smbr'?
For use on an individual para (or group of paras) maybe a directive named 'smbr'?
Not in a hurry, so please don't feel pressured. (Am building a little app, and the documentation will come later.) Sorry I can't do it myself, but life isn't quite long enough to learn everything ;)
Last edit: jon 2022-01-03
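One way to get <br> only where it is wanted, without changing docutils itself, is to post-process the generated HTML. What follows is a rough sketch under that assumption, using only Python's standard html.parser module. The names BreakInserter and add_breaks are made up for illustration, and the sketch only touches text inside <p> elements, so list items or definitions rendered without a <p> are left as they are.

```python
from html.parser import HTMLParser


class BreakInserter(HTMLParser):
    """Re-emit HTML, adding <br> before newlines inside <p> but never inside <pre>."""

    def __init__(self):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.p_depth = 0
        self.pre_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "pre":
            self.pre_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_depth = max(0, self.p_depth - 1)
        elif tag == "pre":
            self.pre_depth = max(0, self.pre_depth - 1)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.p_depth and not self.pre_depth:
            data = data.replace("\n", "<br>\n")
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def add_breaks(html):
    """Return html with source newlines in paragraphs turned into <br> plus newline."""
    parser = BreakInserter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling add_breaks() on the HTML written by rst2html5 would leave <pre> blocks alone, which avoids the double-spaced literal blocks described above.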
Do you have a minimal reproducer of it not working? I used your introductory post locally to write the above & it worked for me!
A
This rst
produces this html
and of course, <br>\n within the <pre> block causes the final render to have double-spaced lines.
This markdown
when entered in ReText (which uses Python-Markdown)
with the nl2br extension active, produces this html
It puts <br>\n in the paragraph, but only \n in the <pre> block.
Last edit: jon 2022-01-03
This is a long shot, as I've never looked at this code before.
In the attached file I've written comments at lines 722 and 738 (search for "# XXX").
I have no idea how to do what I've suggested there,
but it looks like it might be the right place to do it.
I was playing with rst2html5,
and found it easier to look around in its few files
rather than in the whole docutils installation.
I'm in Australia, so I won't be around for the next 10 hours or so.
Wow, that's great. Thanks Adam.
I had been playing with reST in ReText with live preview,
and didn't think to look in the HTML code.
Got the Python code working already.
Last edit: jon 2022-01-02
I would note that myst-parser (Markdown to docutils parser), uses markdown-it which captures soft and hard break tokens (hard-breaks are denoted by a backslash, see: https://spec.commonmark.org/0.17/#hard-line-breaks, and you can test it out in https://markdown-it.github.io).
At present, it turns hard breaks into "raw" <br> for HTML and \\n for latex: https://github.com/executablebooks/MyST-Parser/blob/885651fc4e25dbac414c20cfe8fe3232ba4e833b/myst_parser/docutils_renderer.py#L403-L408, since I didn't see a better way.
It would be ideal though if there was a docutils node class for such hard breaks, with the actual translators handling this, or open to other solutions?
I agree with Chris here -- I've had to hack in manual line breaks via a custom node type a couple of times -- line blocks aren't suitable for example in headings where you want to break over a line.
\\ is of course the convention in TeX for a break at any point -- perhaps the same could be adopted for Docutils, instead of just a backslash at the end of the line?
Not providing for "hard line break elements" in the Docutils document model and reStructuredText syntax was a deliberate decision at the time these two were devised.
The original idea of preserving all source line breaks in HTML display can be implemented with the current document model with a custom HTML writer.
However, times have changed and there are new arguments and use-cases for hard line breaks.
Something like a <br> inline node or special handling of <inline class=line-break> inline nodes may be considered. The latter would not change the document model and could be implemented at the writer level or via stylesheet rules.
Whether to add a reStructuredText syntax for hard line breaks can be decided independently. The current rules require a non-whitespace character in a role defined with .. role:: line-break.
I would like to hear David Goodger on this before any decision.
On Tue, 12 Dec 2023 20:12:18 -0000
"Günter Milde" via Docutils-develop
docutils-develop@lists.sourceforge.net wrote:
One, IMHO, "good reason" has to do with working with revision
control systems. They generally display changes on a line-by-line
basis. This means that a "best practice" is to begin each sentence
on a new line. That way when reviewing diffs prior to a commit
it is very clear just which sentences changed and which did not.
Another "good reason" is that terminal/window widths vary, which means
that when lines are long the eye can no longer "track back" and easily
find the next line. This slows reading speed. It is one reason why
newspapers are written in columns, which have a short line-length.
So a relatively short fixed-length maximum line length in source
documents aids comprehension. (RST is supposed to be readable
after all.) See human-factors/reading comprehension research.
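
The revision-control point can be illustrated with Python's difflib. This is only a small sketch: the function name count_changed_lines and the sample sentences are made up for illustration, and a width of 30 is chosen just to keep the example short.

```python
import difflib
import textwrap


def count_changed_lines(old, new):
    """Return (removed, added) line counts, as a line-based diff would report them."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


# The same one-word edit, stored one sentence per line ...
old_sentences = ["Docutils keeps newlines.", "Browsers collapse them.", "CSS can change that."]
new_sentences = ["Docutils keeps newlines.", "Browsers usually collapse them.", "CSS can change that."]
per_sentence = count_changed_lines("\n".join(old_sentences), "\n".join(new_sentences))

# ... and stored as a paragraph re-wrapped to a fixed width after editing.
old_wrapped = textwrap.fill(" ".join(old_sentences), width=30)
new_wrapped = textwrap.fill(" ".join(new_sentences), width=30)
rewrapped = count_changed_lines(old_wrapped, new_wrapped)

print("one sentence per line:", per_sentence)
print("re-wrapped paragraph: ", rewrapped)
```

With one sentence per line, only the edited sentence shows up in the diff; in the re-wrapped paragraph the edit also shifts the words after it onto different lines, so more lines are reported as changed.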
Regards,
Karl kop@karlpinc.com
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein
Related
Feature Requests: #85
The original request is most easily handled by a custom style sheet (cf. comment).
The issue of a dedicated "hard" line break syntax and doctree element should be discussed separately.
See [#101] for the suggestion of a new "doctree" element for hard line breaks.
Related
Feature Requests: #101