#253 JTidy goes into infinite loop on specific input document

open
nobody
None
7
2012-10-08
2011-08-12
Anonymous
No

JTidy goes into infinite loop on specific input document:

http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php

When we call tidy.parse(), the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.

We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.
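
A hang like this can be confirmed without locking up the calling thread by running the parse on a daemon thread and waiting for a bounded time. The following is only a minimal sketch: the class name `ParseWatchdog` and the method `finishesWithin` are hypothetical helpers, not part of JTidy, and the timeout value is arbitrary.

```java
final class ParseWatchdog {
    /**
     * Runs the task on a daemon thread and reports whether it finished
     * within timeoutMillis (which must be greater than zero, because
     * Thread.join(0) would wait forever).
     */
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "parse-worker");
        // Daemon, so a worker stuck in a busy loop cannot keep the JVM alive.
        worker.setDaemon(true);
        worker.start();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and report "not finished".
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```

Assuming a configured `Tidy` instance named `tidy` and suitable `in` and `out` streams, a call such as `ParseWatchdog.finishesWithin(() -> tidy.parse(in, out), 5000)` would be expected to return false for the attached document. The stuck worker keeps spinning in the background, so this is a diagnostic aid rather than a fix.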

Hopefully it's not too difficult to fix :)

Many thanks,

Francis.

Discussion

  • Anonymous - 2011-08-12

    Copy of input which causes infinite loop

     
  • Anonymous - 2011-08-12

    And here's a more minimal document which exhibits the problem:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <html>
    <head></head>
    <body>








    </body>

     
  • Anonymous - 2011-08-12

    Sorry - there was a typo in the version I gave to Francis there... Make that...

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head></head>
    <body>








    </body>
    </html>

    (i.e. the extra opening html tag is not required, and the close html can be present)

    From a quick investigation, the problem seems to be that the parser is producing a cycle of br tags (with A followed by B and B followed by A) below the dd tag.

    e.g.

    [Node type=RootNode,element=null,content=
    [Node type=StartTag,element=html,content=
    [Node type=StartTag,element=head,content=
    [Node type=StartTag,element=title,content=null]
    ],
    [Node type=StartTag,element=body,content=
    [Node type=TextNode,element=null,text="",content=null]
    ,
    [Node type=StartTag,element=dl,content=
    [Node type=StartTag,element=dd,content=
    [Node type=StartTag,element=br,content=null]
    ,
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    [Node type=StartTag,element=br,content=null],
    ...

    Though not a proper fix, this patch will detect the cycle and throw a RuntimeException (and will also limit the loop in toString to help see what's happening as above).

    Index: src/main/java/org/w3c/tidy/Node.java
    --- src/main/java/org/w3c/tidy/Node.java (revision 1261)
    +++ src/main/java/org/w3c/tidy/Node.java (working copy)
    @@ -1311,7 +1311,11 @@
     
             for (child = this.content; child != null; child = child.next)
             {
    -            if (child.parent != this || !child.checkNodeIntegrity())
    +            if (this.next != null && this.next.next == this) {
    +                throw new RuntimeException("Cycle detected - aborting");
    +            }
    +
    +            if (child.parent != this || !child.checkNodeIntegrity())
                 {
                     return false;
                 }
    @@ -1347,8 +1351,15 @@
             String s = "";
             Node n = this;
     
    +        int loopLimit = 1024;
             while (n != null)
             {
    +            if (loopLimit < 0) {
    +                s += "...TRUNCATED...";
    +                n = null;
    +                break;
    +            }
    +            loopLimit--;
                 s += "[Node type=";
                 s += NODETYPE_STRING[n.type];
                 s += ",element=";
     
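
The check in the patch (`this.next.next == this`) only catches a two-node cycle of the A, B, A kind described in this comment. If longer cycles in the sibling chain are a concern, a more general check could use Floyd's tortoise-and-hare algorithm. This is a sketch only: the `Node` class below is a hypothetical stand-in with just a `next` link, not JTidy's `org.w3c.tidy.Node`, and `hasSiblingCycle` is an illustrative name.

```java
final class SiblingCycleCheck {
    /** Hypothetical stand-in for a parser node; only the sibling link matters here. */
    static final class Node {
        final String element;
        Node next;

        Node(String element) {
            this.element = element;
        }
    }

    /**
     * Floyd's tortoise-and-hare: returns true if following next from head
     * ever revisits a node, for a cycle of any length.
     */
    static boolean hasSiblingCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;      // advances one step
            fast = fast.next.next; // advances two steps
            if (slow == fast) {
                return true;       // the pointers can only meet inside a cycle
            }
        }
        return false;              // fast reached the end of the chain
    }
}
```

The check uses constant extra memory and terminates on cyclic input, so it could run before any traversal that assumes the sibling list ends. Like the patch, it only detects the corruption; it does not address why the parser links the br nodes into a cycle.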
    Last edit: Anonymous 2014-01-22
