
#11 Handle nbsp-entity (\xA0) in Paragraph


A nbsp entity (code 160 = hex A0) should be handled like a space, except it does not allow a line break.

Hmm, does it make sense at all if the paragraph uses auto-hyphenation? There may be cases where it actually seems sensible. For example, the space in the middle of "Dr. Who" probably isn't a good place for a line break.
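
To illustrate the "like a space, but no line break" rule, here is a minimal Python sketch. It is not taken from the Paragraph code; `split_breakable` is a hypothetical helper name. It also shows a pitfall: in Python 3, `str.split()` without arguments treats nbsp as whitespace and would split "Dr. Who" apart.

```python
import re

NBSP = "\xa0"  # no-break space, code 160 = hex A0


def split_breakable(text):
    """Split text at ordinary whitespace only; nbsp stays inside its chunk.

    Hypothetical helper for illustration, not part of any library.
    """
    # The break characters are listed explicitly, because str.split()
    # and the regex class \s would both treat NBSP as whitespace.
    return [chunk for chunk in re.split(r"[ \t\r\n]+", text) if chunk]


text = "Ask Dr." + NBSP + "Who about it"
print(split_breakable(text))  # "Dr." and "Who" stay in one chunk
print(text.split())           # the default split separates them
```

Each chunk returned by `split_breakable` is a unit that a line breaker may place on one line or the next, but may not cut at the nbsp.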

There seem to be two possible places to handle this:
a) in the Paragraph
b) in the Hyphenator

I lean toward b), since then the Paragraph only has to treat nbsp like a letter, and only the BaseHyphenator has to deal with it (the BaseHyphenator should probably handle space and nbsp in the input anyway).
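
Option b) could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `break_points` and `show` are hypothetical names, not the BaseHyphenator API, and the only rule modelled is "break after an explicit hyphen between two letters". Because nbsp and the dot are not hyphens, they never produce a break point, which amounts to treating them like letters.

```python
NBSP = "\xa0"


def break_points(word):
    """Return indexes i where a break between word[:i] and word[i:] is allowed.

    Hypothetical rule set: only an explicit "-" with a letter on each
    side yields a break point. NBSP and "." never do.
    """
    points = []
    for i in range(2, len(word)):
        if word[i - 1] == "-" and word[i - 2].isalpha() and word[i].isalpha():
            points.append(i)
    return points


def show(word):
    """Render the word with "^" inserted at every allowed break point."""
    points = set(break_points(word))
    return "".join(("^" if i in points else "") + ch for i, ch in enumerate(word))


print(show("AAA" + NBSP + "BBB-CCC"))
print(show("Dr." + NBSP + "Who"))
```

With this split of responsibilities, the Paragraph never needs to know about nbsp: it only ever breaks at ordinary spaces or at the points the hyphenator reports.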


  • H. von Bargen

    H. von Bargen - 2008-12-21

    Hmm, thinking about it:
    If there is a phrase "Dr. Müller-Lüdenscheid", how should it be handled by a line-breaking algorithm?
    There are 5 possible hyphenation points:
    1 Dr.^Müller-Lüdenscheid
    2 Dr. Mül-^ler-Lüdenscheid
    3 Dr. Müller-^Lüdenscheid
    4 Dr. Müller-Lü-^denscheid
    5 Dr. Müller-Lüden-^scheid
    Since a nbsp is used, obviously 1 is not allowed.
    But what about the others?
    And what if the word in front of the nbsp is longer?

    Which of

  • H. von Bargen

    H. von Bargen - 2008-12-21

    Fixed in trunk. Basically, a nbsp is now treated like a letter. Adding a nbsp between two words AAA and BBB will now usually prevent a line break there, since "AAA BBB" is then an unknown word for the hyphenator. Only the BaseHyphenator hyphenation rules apply: for "AAA BBB-CCC", a hyphenation point is inserted only at the "-" character. Special care is taken to avoid adding hyphenation points after the dot in "Dr. Whoever" (see the change in

  • H. von Bargen

    H. von Bargen - 2008-12-21
    • status: open --> open-fixed
  • H. von Bargen

    H. von Bargen - 2008-12-21
    • status: open-fixed --> closed-fixed
