Menu

#85 Enable easily-typed hard line-breaks in paragraphs.

Default
closed-works-for-me
nobody
linebreaks (1)
5
2023-12-12
2022-01-01
jon
No

During the last few weeks I've spent some time
carefully studying reStructuredText and Python-Markdown.

I would prefer to write with reStructuredText,
but Markdown has one feature that is lacking in reST:
easy-to-type line-breaks in paragraphs.

I'm addicted to semantic line breaking.
I think in medium-length phrases that can fit in one line.
Probably this was heightened by working with video subtitles,
but I've always felt that prose can be understood more quickly
if the line-breaks fall between semantic units.

Even when I can't make semantic units fit neatly,
I still like to control where the lines break.

An extension for Python-Markdown, nl2br,
completely disables markdown's removal of line-breaks,
so I can simply write the way I usually do.

With another markdown extension, you type a backslash
to indicate that you want to keep the following line-break.

I'm aware of the methods available in reST:
starting every line of a block with |,
or typing something like |br| at the end of a line.
They're not really suitable to use with every line of every paragraph.

Perhaps the removal of line-breaks from paragraphs is baked so deeply into reST
that it's not practical to change the current behaviour?

Anyway, my question/feature-request is: can you make
either or both of the following methods available to users?

  1. Preferably, set a document-wide parameter
    that causes actual line-breaks in paragraphs to be kept.

  2. If that's not practical, create a hard line-break
    by typing a backslash immediately before a line-break
    (with or without a space before the backslash).

I'd like to convert my reST docs to html5, using <br>\n for line breaks.

I don't understand why arbitrary line-breaks are the norm. In early email and code editors line length was limited, but I don't see why any form of markup or markdown expects people to insert them these days. When I don't want semantic breaks, I simply don't insert any breaks at all, and then a paragraph is a single unbroken line that gets wrapped at the margins. Why isn't this the norm?

Hope my point of view makes at least a little sense to you.

Related

Feature Requests: #85

Discussion

  • jon

    jon - 2022-01-01

    Not only in paragraphs, of course, but also in list-items and definitions and anywhere else where line-breaks are usually discarded.

     
    • Günter Milde

      Günter Milde - 2023-12-12

      Docutils preserves line-breaks and writes tham (as \n) into the output document.
      For HTML, a custom style-sheet with the CSS rule p {white-space: break-spaces} should suffice to get "semantic line breaks" as intended.
      (Tested with the example below + a bullet list.)

       
  • engelbert gruber

    IMHO.
    It is not arbitrary line breaks, other people call it paragraphs and sentences.
    The non-justified paragraphs only took on because modern word processors are bad at aligning left and right.
    Sorry I can not tell the depth this is ingrained in docutils.
    But your request sounds unusual, we have people referencing the chicago manual of style ... now you have your own, which I consider cool, which I think is structured Text

    for a start, so you can use your style, I would use awk as a preprocessor to insert double linebreaks
    or add a line block character

    cat 0001-Use-flit-and-pyproject.toml.patch | awk '{print $0 "\n"}' | rst2man.py > out.man
    

    cheers

     
    • jon

      jon - 2022-01-02

      Thanks for the awk tip.

      paragraphs and sentences

      I break lines on sentence breaks,
      sometimes on phrase breaks within sentences,
      and sometimes just to make line lengths close to equal.

      I don't like to let line-breaking be semantically random,
      as it is when allowed to wrap to whatever the page width is.

      Have you seen https://bobheadxi.dev/semantic-line-breaks/ ?
      It provides another reason to accept line breaks as intentional, and preserve them.

      justified paragraphs

      I'd forgotten about them. ;-)
      As to why reST and Markdown remove line-breaks from paragraphs,
      I guess they assume the breaks were inserted to conform to the editor's page width,
      and remove them to let the paragraph soft-wrap to the width of any target.
      I can't think of any other reason.

      chicago manual of style

      (off topic)
      It's a guide for writing academic works for Chicago university.
      In the USA, many people think it should be applied to all English writing,
      but I believe that's a mistaken view, as it's not suitable for all other kinds of writing.

       

      Last edit: jon 2022-01-02
      • jon

        jon - 2022-01-02

        Have you seen https://bobheadxi.dev/semantic-line-breaks/ ?
        It provides another reason to accept line breaks as intentional, and preserve them.

        I've just re-read it, and realized that he's talking about line breaks in the source.
        He doesn't suggest line-breaking the rendered output in that way.
        So it looks like I'm on my own with this ... :)

         
        • engelbert gruber

          i think i use similar structure which results in hanging indents on longer sentences that would have no line break before the paper margin. mainly i make bullet lists without bullets.
          i would suggest not using
          but putting each line in a

          ...

           
        • Günter Milde

          Günter Milde - 2022-01-02

          On 2022-01-02, jon via Docutils-develop wrote in gmane.text.docutils.devel:

          [-- Type: text/plain, Encoding: 7bit --]

          I've just re-read it, and realized that he's talking about line breaks
          in the source.
          He doesn't suggest line-breaking the rendered output in that way.

          I agree with https://sembr.org/, that for the end-user (reader),
          block paragraphs provide for a better reading experience.

          I would not like to read a novel with "semantic line breaks".


          Besides a custom writer to keep original line-breaks in HTML, you may also
          consider "semantic empty lines" in your source:

          Place an empty line after each sentence and a transition at the more
          prominent breaks.

          Use a custom style-sheet to remove the additional vertical space
          between paragraphs and to style the "transition" element as
          an empty block.

          Then, this example would look similar to
          two paragraphs with semantic line breaks after each sentence.

           

          Last edit: Günter Milde 2023-12-12
          • jon

            jon - 2022-01-02

            Günter Milde wrote:

            I would not like to read a novel with "semantic line breaks".

            Neither would I. :)

            Not everything I write has semantic breaks,
            but in explanatory writing, I think they can help comprehension,
            when combined with clear simple sentence structure.

            A few lines I wrote yesterday provide a good example of how I use them,
            as well as being a description of how I use them:

            I break lines on sentence breaks,
            sometimes on phrase breaks within sentences,
            and sometimes just to make line lengths close to equal.

            I don't like to let line-breaking be semantically random,
            as it is when allowed to wrap to whatever the page width is.

            And of course, as you can see in my messages,
            where a semantic break can be made at a reasonable line length,
            I make it, rather than make a non-semantic break a few words later.
            This is second nature to me now, not much extra thought is required.

            I'm not too keen on the idea of "semantic empty lines",
            as I hope to be able to write in the way that comes to me naturally,
            without thinking about extra artifice to achieve the desired result.

            Very short paragraphs are common in punchy journalistic writing,
            but that style doesn't often suit my subject matter.

            Thanks anyway. :)

             
            • Günter Milde

              Günter Milde - 2022-01-04

              I see your point.
              OTOH, I don't think there is wide use. Docutils will not provide this feature "out of the box" but we are ready to help with a custom solution.

              Note, that the approach with a custom writer is different from "hard line breaks" as offered by Markdown. It treats preservation of source line breaks as a style decision, so you can use the same source for a document tailored at end users with variable screen/page width and a document for end users preferring hand-crafted line breaks.

              Actually, you don't need a custom writer, just a custom CSS style sheet:

              • Docutils keeps line break characters when converting rST to HTML.
              • The handling of line break characters by the browser can be configured with CSS.
                The white-space property is your friend, e.g.

                p { white-space: pre-wrap; }
                

                in a custom stylesheet will tell the browser to break lines at the same place as in the rST source.

              This can be extended with adapted spacing rules and a rule to "normally" wrap elements with the "wrap" class argument.

               

              Last edit: Günter Milde 2022-05-05
  • Adam  Turner

    Adam Turner - 2022-01-02

    Hi Jon,

    Docutils preserves line breaks throughout the processing pipeline. You can see this if you view the source of generated HTML documents. To turn them into rendered newlines in html, you can use a custom writer that replaces \n characters with <br> elements, as the below sketch illustrates.

    from pathlib import Path
    
    from docutils.core import publish_string
    from docutils.writers import html5_polyglot
    
    
    class Writer(html5_polyglot.Writer):
        def __init__(self):
            super().__init__()
            self.translator_class = LineBreakTranslator
    
    
    class LineBreakTranslator(html5_polyglot.HTMLTranslator):
        def visit_Text(self, node):
            self.body.append("<br>\n".join(self.encode(node.astext()).splitlines()))
    
    
    def publish_preserving_line_breaks(path: Path):
        out_text = publish_string(
            path.read_text(encoding="utf-8"),
            writer=Writer(),
            settings_overrides={"output_encoding": "unicode"})
        path.with_suffix(".html").write_text(out_text, encoding="utf-8")
    
    
    if __name__ == '__main__':
        publish_preserving_line_breaks(Path("line-breaks.rst"))
    

    A

     

    Last edit: Adam Turner 2022-01-02
    • jon

      jon - 2022-01-03

      self.body.append("<br>\n".join(...)

      It adds <br>\n in literal blocks (that get enclosed in <pre> tags), making extra blank lines where they're not wanted, as well as doing the right thing in paragraphs.

      Would it be too big a job to add this functionality to docutils, such that it works only where it's needed, and it could be turned on/off (for general use) by an option named 'smbr'?
      For use on an individual para (or group of paras) maybe a directive named 'smbr'?

      Not in a hurry, so please don't feel pressured. (Am building a little app, and the documentation will come later.) Sorry I can't do it myself, but life isn't quite long enough to learn everything ;)

       

      Last edit: jon 2022-01-03
      • Adam  Turner

        Adam Turner - 2022-01-03

        Do you have a minimal reproducer of it not working? I used your introductory post locally to write the above & it worked for me!

        A

         
        • jon

          jon - 2022-01-03

          This rst

          this is a plain paragraph
          with four consecutive lines
          separated by line breaks
          without any blank line in between.
          
          ::
          
              this is a literal block
              with four consecutive lines
              separated by line breaks
              without any blank line in between.
          

          produces this html

          <p>this is a plain paragraph<br>
          with four consecutive lines<br>
          separated by line breaks<br>
          without any blank line in between.</p>
          <pre class="literal-block">this is a literal block<br>
          with four consecutive lines<br>
          separated by line breaks<br>
          without any blank line in between.</pre>
          

          and of course, <br>\n within the <pre> block
          causes the final render to have double-spaced lines.

           
          • jon

            jon - 2022-01-03

            This markdown

            this is a plain paragraph
            with four consecutive lines
            separated by line breaks
            without any blank line in between.
            
            ```
            this is a fenced code block
            with four consecutive lines
            separated by line breaks
            without any blank line in between.
            ```
            

            when entered in ReText (which uses Python-Markdown)
            with the nl2br extension active, produces this html

            <p>this is a plain paragraph<br>
            with four consecutive lines<br>
            separated by line breaks<br>
            without any blank line in between.</p>
            <pre><code>this is a fenced code block
            with four consecutive lines
            separated by line breaks
            without any blank line in between.
            </code></pre>
            

            It puts <br>\n in the paragraph, but only \n in the<pre> block.

             

            Last edit: jon 2022-01-03
      • jon

        jon - 2022-01-03

        This is a long shot, as I've never looked at this code before.
        In the attached file I've written comments at lines 722 and 738 (search for "# XXX").
        I have no idea how to do what I've suggested there,
        but it looks like it might be the right place to do it.

        I was playing with rst2html5,
        and found it easier to look around in its few files
        rather than in the whole docutils installation.

        I'm in Australia, so I won't be around for the next 10 hours or so.

         
  • jon

    jon - 2022-01-02

    Wow, that's great. Thanks Adam.

    I had been playing with reST in ReText with live preview,
    and didn't think to look in the HTML code.

    Got the Python code working already.

     

    Last edit: jon 2022-01-02
  • Chris Sewell

    Chris Sewell - 2022-01-02

    I would note that myst-parser (Markdown to docutils parser), uses markdown-it which captures soft and hard break tokens (hard-breaks are denoted by a backslash, see: https://spec.commonmark.org/0.17/#hard-line-breaks, and you can test it out in https://markdown-it.github.io).
    At present, it turns hard breaks into "raw" <br> for HTML and \\n for latex: https://github.com/executablebooks/MyST-Parser/blob/885651fc4e25dbac414c20cfe8fe3232ba4e833b/myst_parser/docutils_renderer.py#L403-L408, since I didn't see a better way
    It would be ideal though if there was a docutils node class for such hard breaks, with the actual translators handling this, or open to other solutions?

     
    • Adam  Turner

      Adam Turner - 2022-01-03

      It would be ideal though if there was a docutils node class for such hard breaks, with the actual translators handling this, or open to other solutions?

      I agree with Chris here -- I've had to hack in manual line breaks via a custom node type a couple of times -- line blocks aren't suitable for example in headings where you want to break over a line.

      \\ is of course the convention in TeX for a break at any point -- perhaps the same could be adopted for Docutils, instead of just a backslash at the end of the line?

       
    • Günter Milde

      Günter Milde - 2022-01-04

      Not providing for "hard line break elements" in the Docutils document model and reStructuredText syntax was a deliberate decision at the time these two were devised.
      The original idea of preserving all source line breaks in HTML display can be implemented with the current document model with a custom HTML writer.

      However, times have changed and there are new arguments and use-cases for hard line breaks.
      Something like a <br> inline node or special handling of <inline class=line-break> inline nodes may be considered. The latter would not change the document model and could be implemented at the writer level or via stylesheet rules.

      Whether to add a reStructuredText syntax for hard line breaks can be decided independently. The current rules require a non-white character in a role defined with
      .. role:: line-break.

      I would like to hear David Goodger on this before any decision.

       
  • Günter Milde

    Günter Milde - 2023-12-12
    • status: open --> closed-works-for-me
     
    • Karl O. Pinc

      Karl O. Pinc - 2023-12-12

      On Tue, 12 Dec 2023 20:12:18 -0000
      "Günter Milde" via Docutils-develop
      docutils-develop@lists.sourceforge.net wrote:

      [feature-requests:#85] Enable easily-typed hard line-breaks in
      paragraphs.

      Created: Sat Jan 01, 2022 08:21 AM UTC by jon
      Last Updated: Tue Dec 12, 2023 08:02 PM UTC

      I don't understand why arbitrary line-breaks are the norm. In early
      email and code editors line length was limited, but I don't see why
      any form of markup or markdown expects people to insert them these
      days. When I don't want semantic breaks, I simply don't insert any
      breaks at all, and then a paragraph is a single unbroken line that
      gets wrapped at the margins. Why isn't this the norm?

      One, IMHO, "good reason" has to do with working with revision
      control systems. They generally display changes on a line-by-line
      basis. This means that a "best practice" is to begin each sentence
      on a new line. That way when reviewing diffs prior to a commit
      it is very clear just which sentences changed and which did not.

      Another "good reason" is that terminal/window widths vary, which means
      that when lines are long the eye can no longer "track back" and easily
      find then next line. This slows reading speed. It is one reason why
      newspapers are written in columns, which have a short line-length.
      So a relatively short fixed-length maximum line length in source
      documents aids comprehension. (RST is supposed to be readable
      after all.) See human-factors/reading comprehension research.

      Regards,

      Karl kop@karlpinc.com
      Free Software: "You don't pay back, you pay forward."
      -- Robert A. Heinlein

       

      Related

      Feature Requests: #85

  • Günter Milde

    Günter Milde - 2023-12-12

    The original request is most easily handled by a custom style sheet (cf. comment.
    The issue of a dedicated "hard" line break syntax and doctree element should be discussed separately.

     
  • Günter Milde

    Günter Milde - 2023-12-12

    See [#101] for the suggestion of a new "doctree" element for hard line breaks.

     

    Related

    Feature Requests: #101


Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.