#360 Preserve whitespace (especially linebreaks) in text nodes.


My newlines were eaten!

Since there is a "literal-attributes" option that preserves whitespace in attribute values, there should also be an option to preserve whitespace in all nodes.

This already works for pre elements: line breaks there stay unchanged. But in other elements (like p, div…) the original whitespace is dropped and the text is re-wrapped.

I have an XSL template which converts text nodes inside html/body (directly there, not inside p elements) to paragraphs (p). Parts of such a text node that are separated by an empty line are converted into separate paragraphs (p). It works well, but sometimes I get an ugly, not well-formed document which cannot be transformed. So I repair it with tidy and then want to transform it using my XSL template, but now that does not work, because the newlines were eaten and I can't separate the original paragraphs.
Option --wrap 0 does not help (the lines are long, but the necessary line ends are still missing).
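
To make the problem concrete, here is a minimal Java sketch of the kind of splitting the XSL template relies on. It is only a stand-in for the template, not the template itself: the class name ParagraphSplitter and the whitespace-collapsing line that imitates the re-wrapping are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitter {
    // Splits text into paragraphs at empty lines (a line break, optional
    // whitespace, then another line break). Empty pieces are skipped.
    static List<String> split(String text) {
        List<String> paragraphs = new ArrayList<>();
        for (String part : text.split("\\n\\s*\\n")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String original = "First paragraph.\n\nSecond paragraph.";
        // Assumption: imitates re-wrapping by collapsing all whitespace runs
        // (including line breaks) into single spaces.
        String rewrapped = original.replaceAll("\\s+", " ");
        System.out.println(split(original).size());  // prints 2
        System.out.println(split(rewrapped).size()); // prints 1
    }
}
```

Once the empty line is gone, nothing is left to tell where one paragraph ended and the next began.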

I have found a workaround: add a special sign (some Unicode character not used in the document) before every line end, then do the tidying, and afterwards replace the special signs back with line ends:

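// mark every line end with a character that does not occur in the document
input = input.replaceAll("\\n", "◆\n");
// do tidying (the tidied result is in tidyTexy)
// markers that still have their line end: just remove the marker
tidyTexy = tidyTexy.replaceAll("◆\\n", "\n");
// markers whose line end was eaten: put the line end back
tidyTexy = tidyTexy.replaceAll("◆", "\n");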

Then it does what I want. But it is a dirty hack. I would very much appreciate an option to preserve the original whitespace in text nodes.
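
For anyone who wants to try the workaround, here is a self-contained sketch of it, under some assumptions: the class and method names (LineBreakMarker, protect, restore, roundTrip) are made up for illustration, and the tidy step is passed in as a function because the real tidy call is not shown here. The demo uses a fake tidy that just collapses whitespace.

```java
import java.util.function.UnaryOperator;

public class LineBreakMarker {
    // Assumption: U+25C6 (the diamond from the workaround) never occurs in the input.
    static final String MARKER = "\u25C6";

    // Puts the marker in front of every line end so its position survives tidying.
    static String protect(String input) {
        if (input.contains(MARKER)) {
            throw new IllegalArgumentException("marker already occurs in input");
        }
        return input.replace("\n", MARKER + "\n");
    }

    // Turns every marker back into a line end, whether or not the tidy step
    // kept the original line end right after it.
    static String restore(String tidied) {
        return tidied.replace(MARKER + "\n", "\n").replace(MARKER, "\n");
    }

    // Runs protect, then the supplied tidy step, then restore.
    static String roundTrip(String input, UnaryOperator<String> tidy) {
        return restore(tidy.apply(protect(input)));
    }

    public static void main(String[] args) {
        // Stand-in for the real tidy call (an assumption): collapse whitespace runs.
        UnaryOperator<String> fakeTidy = s -> s.replaceAll("\\s+", " ");
        String result = roundTrip("one\n\ntwo", fakeTidy);
        // Show line ends visibly as \n.
        System.out.println(result.replace("\n", "\\n")); // prints one\n \n two
    }
}
```

The line ends come back, but any spaces the tidy step put in their place stay, so whatever consumes the result should tolerate whitespace-only lines between paragraphs.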