From: SourceForge.net <no...@so...> - 2011-08-12 07:07:00
Bugs item #3390317, was opened at 2011-08-12 00:43
Message generated for change (Comment added) made by
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=113153&aid=3390317&group_id=13153

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Francis Crimmins ()
Assigned to: Nobody/Anonymous (nobody)
Summary: JTidy goes into infinite loop on specific input document

Initial Comment:
JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse() the stack trace ends in many calls to
Node.checkNodeIntegrity() and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938).

I've attached a copy of the input document which triggers the behaviour.

Hopefully it's not too difficult to fix :)

Many thanks,

- Francis.

----------------------------------------------------------------------

Comment By: https://www.google.com/accounts ()
Date: 2011-08-12 07:07

Message:
Sorry - there was a typo in the version I gave to Francis there... Make that...

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head></head>
<body>
<em>
<dl>
<p>
<dd>
</dd>
</p>
</dl>
</em>
</body>
</html>

(i.e. the extra opening html tag is not required, and the close html can be
present)

From a quick investigation the problem seems to be that the parser is
producing a cycle of br tags (with A followed by B and B followed by A)
below the dd tag. e.g.

[Node type=RootNode,element=null,content=
 [Node type=StartTag,element=html,content=
  [Node type=StartTag,element=head,content=
   [Node type=StartTag,element=title,content=null]],
  [Node type=StartTag,element=body,content=
   [Node type=TextNode,element=null,text="",content=null],
   [Node type=StartTag,element=dl,content=
    [Node type=StartTag,element=dd,content=
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     [Node type=StartTag,element=br,content=null],
     ...

Though not a proper fix, this patch will detect the cycle and throw a
RuntimeException (and will also limit the loop in toString to help see
what's happening as above).

Index: src/main/java/org/w3c/tidy/Node.java
===================================================================
--- src/main/java/org/w3c/tidy/Node.java (revision 1261)
+++ src/main/java/org/w3c/tidy/Node.java (working copy)
@@ -1311,7 +1311,11 @@
         for (child = this.content; child != null; child = child.next)
         {
-            if (child.parent != this || !child.checkNodeIntegrity())
+            if (this.next != null && this.next.next == this) {
+                throw new RuntimeException("Cycle detected - aborting");
+            }
+
+            if (child.parent != this || !child.checkNodeIntegrity())
             {
                 return false;
             }
@@ -1347,8 +1351,15 @@
         String s = "";
         Node n = this;
+        int loopLimit = 1024;
         while (n != null)
         {
+            if (loopLimit < 0) {
+                s += "...TRUNCATED...";
+                n = null;
+                break;
+            }
+            loopLimit--;
             s += "[Node type=";
             s += NODETYPE_STRING[n.type];
             s += ",element=";

----------------------------------------------------------------------

Comment By: Francis Crimmins ()
Date: 2011-08-12 06:07

Message:
And here's a more minimal document which exhibits the problem:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<html>
<head></head>
<body>
<em>
<dl>
<p>
<dd>
</dd>
</p>
</dl>
</em>
</body>

----------------------------------------------------------------------
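
The check in the patch above (this.next.next == this) only catches a cycle of exactly two siblings, A followed by B followed by A. As a rough sketch of how a diagnostic could catch sibling cycles of any length, the snippet below applies Floyd's tortoise-and-hare walk to a `next`-linked list. This is not JTidy code: the `Node` class here is a hypothetical stand-in that models only the sibling link, and `hasSiblingCycle` is an illustrative name, not an existing JTidy method.

```java
final class SiblingCycleCheck {

    /** Hypothetical stand-in for org.w3c.tidy.Node; only the sibling link is modelled. */
    static final class Node {
        final String element;
        Node next;

        Node(String element) {
            this.element = element;
        }
    }

    /**
     * Floyd's cycle detection: advance one reference by one sibling and
     * another by two. If the sibling chain loops, the fast reference
     * eventually lands on the same node as the slow one; if the chain
     * ends, the fast reference reaches null first.
     */
    static boolean hasSiblingCycle(Node first) {
        Node slow = first;
        Node fast = first;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node("br");
        Node b = new Node("br");
        Node c = new Node("br");
        a.next = b;
        b.next = c;
        System.out.println(hasSiblingCycle(a)); // prints "false"

        c.next = a; // close the loop: a -> b -> c -> a
        System.out.println(hasSiblingCycle(a)); // prints "true"
    }
}
```

Like the patch, this would only report the corrupted tree; it uses no extra memory and needs no arbitrary loop limit, but it does not address why the parser builds the cycle in the first place.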