From: SourceForge.net <no...@so...> - 2011-11-02 16:51:25
Bugs item #3432258, was opened at 2011-11-02 12:56
Message generated for change (Comment added) made by helsom
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3432258&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Tidy functionality
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hel (helsom)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unwrapped inline content means invalid XHTML is generated

Initial Comment:
Using jtidy.parseDOM with setXHTML(true) and setEncloseBlockText(true) does not cause inline content to be properly wrapped, and hence W3C validation fails.

Example HTML 1 (generates valid XHTML):
"Text <em>Inline content</em>" -> "<p>Text <em>Inline content</em></p>"

Example HTML 2 (generates invalid XHTML):
"<em>Inline content</em>" -> "<em>Inline content</em>"

There is code within src/main/java/org/w3c/tidy/ParserImpl.java that performs this wrapping, but it has been commented out because of bug report 1403105 (java.lang.StackOverflowError in Tidy.parseDOM()). Uncommenting this block of code seems to produce correctly wrapped XHTML in most situations, but unfortunately the stack overflow error still happens if the HTML mentioned in report 1403105 is supplied. Is there any way this can be reinstated without causing the stack overflow?

----------------------------------------------------------------------

>Comment By: Hel (helsom)
Date: 2011-11-02 16:51

Message:
Adding code very similar to the TEXT_NODE encloseBodyText processing (about line 799 within ParserImpl.java) for an inline element (at about line 934) seems to result in inline content within the body being properly wrapped, though it hasn't had extensive testing and there may be a better way.
That is:

    if (node.type == Node.START_TAG || node.type == Node.START_END_TAG)
    {
        if ((node.tag.model & Dict.CM_INLINE) != 0)
        {
            if (lexer.configuration.encloseBodyText)
            {
                Node para;
                // push the inline start tag back so it is re-read inside the new <p>
                lexer.ungetToken();
                para = lexer.inferredTag("p");
                body.insertNodeAtEnd(para);
                // parse the inline content (starting with the pushed-back tag) into the <p>
                parseTag(lexer, para, mode);
                mode = Lexer.MIXED_CONTENT;
                continue;
            }
        }
        ...

----------------------------------------------------------------------

Comment By: Hel (helsom)
Date: 2011-11-02 15:33

Message:
Even with the mentioned code reinstated, this only resolves the wrapping of inline content within, for example, a blockquote, and not at the top level within the body element. For example:

"<em>ssss</em> <blockquote><em>Inline content</em></blockquote>"

generates the XHTML:

"<em>ssss</em> <blockquote> <p><em>Inline content</em></p> </blockquote>"

Note that the initial <em> is not wrapped in a p element. If I place some text in front of it, however, it does get wrapped. For example:

"xxxx <em>ssss</em> <blockquote><em>Inline content</em></blockquote>"

generates the XHTML:

"<p>xxxx <em>ssss</em></p> <blockquote> <p><em>Inline content</em></p> </blockquote>"

----------------------------------------------------------------------

Comment By: Hel (helsom)
Date: 2011-11-02 14:08

Message:
Update: this is not an XHTML-specific problem. Incorrectly wrapped content also fails HTML 4.01 Strict validation. It seems a shame to lose this important functionality because of what seems to be quite an obscure bug (1403105).

----------------------------------------------------------------------
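To illustrate the output this thread is asking for, here is a self-contained toy model in Java of the wrapping rule: consecutive text and inline elements that are direct children of body (or of a blockquote) are collected into an inferred <p>, and a leading inline element opens a paragraph exactly as leading text would. This is a hypothetical sketch, not JTidy code. The Node class, the INLINE set (a stand-in for the Dict.CM_INLINE test) and the WRAP_CHILDREN_OF set are simplifications invented here, and whitespace between elements is ignored. body and blockquote are used because HTML 4.01 Strict (and XHTML 1.0 Strict) allow only block-level children in both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of "enclose inline content in an inferred <p>"; NOT JTidy's ParserImpl. */
public class EncloseInlineSketch {

    /** Minimal tree node: a text node (tag == null) or an element with children. */
    static final class Node {
        final String tag;   // element name, or null for a text node
        final String text;  // text content, or null for an element
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node text(String t) { return new Node(null, t); }

        static Node elem(String tag, Node... kids) {
            Node n = new Node(tag, null);
            for (Node k : kids) {
                n.children.add(k);
            }
            return n;
        }

        boolean isText() { return tag == null; }
    }

    // Hypothetical stand-in for the (node.tag.model & Dict.CM_INLINE) != 0 test.
    static final Set<String> INLINE = Set.of("em", "strong", "span", "a", "b", "i", "code");

    // Hypothetical: containers whose direct text/inline children must be wrapped.
    static final Set<String> WRAP_CHILDREN_OF = Set.of("body", "blockquote");

    static boolean isInline(Node n) {
        return n.isText() || INLINE.contains(n.tag);
    }

    /** Returns a rewritten copy of the tree with inline runs wrapped in <p>. */
    static Node enclose(Node n) {
        if (n.isText()) {
            return n;
        }
        Node out = new Node(n.tag, null);
        boolean wrap = WRAP_CHILDREN_OF.contains(n.tag);
        Node para = null; // the inferred <p> currently collecting inline content
        for (Node child : n.children) {
            Node c = enclose(child);
            if (wrap && isInline(c)) {
                // Text and inline elements are treated alike.
                if (para == null) {
                    para = new Node("p", null);
                    out.children.add(para);
                }
                para.children.add(c);
            } else {
                para = null; // a block-level child closes the current paragraph
                out.children.add(c);
            }
        }
        return out;
    }

    static String serialize(Node n) {
        if (n.isText()) {
            return n.text;
        }
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(n.tag).append('>');
        for (Node c : n.children) {
            sb.append(serialize(c));
        }
        sb.append("</").append(n.tag).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Example HTML 2": an inline element alone in body.
        Node ex2 = Node.elem("body", Node.elem("em", Node.text("Inline content")));
        // prints <body><p><em>Inline content</em></p></body>
        System.out.println(serialize(enclose(ex2)));

        // The 15:33 example, without the whitespace between elements.
        Node ex3 = Node.elem("body",
                Node.elem("em", Node.text("ssss")),
                Node.elem("blockquote", Node.elem("em", Node.text("Inline content"))));
        // prints <body><p><em>ssss</em></p><blockquote><p><em>Inline content</em></p></blockquote></body>
        System.out.println(serialize(enclose(ex3)));
    }
}
```

The real parser works on a token stream (lexer.ungetToken() followed by parseTag into the inferred <p>) rather than rewriting a finished tree, so this sketch only shows the intended result, not how ParserImpl should produce it, and it says nothing about the cause of the StackOverflowError in report 1403105.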