Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped and hence W3c validation fails.
Example HTML 1 (generates valid XHTML)
"Text Inline content" -> "
Text Inline content
"Example HTML 2 (generates invalid XHTML)
"Inline content" -> "Inline content"
There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping but it has been commented out due to bug report 1403105 : java.lang.StackOverflowError in Tidy.parseDOM(). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack over flow error still happens if the HTML mentioned in report 1403105 is supplied. Anyway that this can be reinstated without causing the stack over flow?
Update: This is not an xhtml-specific problem. Incorrectly wrapped content also fails HTML 4.01 Strict validation. It seems a shame to lose this important functionality because of what seems to be quite an obsure bug (1403105).
Even with the mentioned code being re-instated, this only resolves wrapping of inline content within a blockquote, for example, and not at the top level within the body element.
For Example:
"ssss
"generates xhtml:
"ssss "
Note the initial does not get wrapped with a p element. If I place some text in front of it, however, it does get wrapped.
For Example:
"xxxx ssss
"generates xhtml:
"
xxxx ssss
"Adding code very similar to the TEXT_NODE encloseBodyText processing (about line 799 within ParserImpl.java) for an inline element (at about line 934) seems to result in inline content within the body being properly wrapped, though it hasn't had extensive testing and there may be a better way.
That is,