#726 "Illegal" tag w/in a table produces broken/incorrect order


I have found an issue in which HTML Tidy is not only
incorrectly terminating a table but also results in an
incorrectly ordered document tree/output.

The attached sample contains a centered table with an
another <CENTER> tag appearing in the table between
table cell (illegal). This results in the table being
incorrectly terminated with the remaining cells of the
row and all subsequent rows inserted into a second table.

Furthermore, the insertion of this into the document
tree appears to be incorrectly handled resulting in the
text after the table being added before the table, then
the second table appears, and then the first table (ie

Visual observation of the input HTML with MSIE and
Firefox appear to show two different behaviors.

MSIE moves any content or non-table elements within a
table but outside of cells or captions to before the
table (as if it originally appeared before). As such,
the CENTER affects the text following the table.

FF seems to be more closer to the specifications and
appears to completely disregard any content non-table
elements that fall outside of cells or captions.

Either end result is acceptable but the current
termination and mis-ordering clearly is not.

NOTE: This may somehow be related to Issue #1316258
which I just reported as well.


  • Christopher M. Woods

    Logged In: YES

    Traced through code and it appears that the issue is in
    parser.c's ParseBlock():

    else if ( nodeHasCM(node, CM_TABLE) ||
    nodeHasCM(node, CM_ROW) )
    node = InferredTag(doc, TidyTag_TABLE);

    When the td tag is encountered while processing the exiled
    content (lexer's Exiled flag is set), this code is creating
    a new table, insert it into the document tree, and allowing
    processing to continue within the ParseBlock() function
    resulting in the mangled tree/output.

    It appears that the solo purpose of the 'exiled' flag in the
    lexer is to note when content has been moved out of the
    table. As such, it should be used here to determine which
    actions should take place:

    else if ( nodeHasCM(node, CM_TABLE|CM_ROW) )
    if (lexer->exiled)
    Encountered table related tag while
    exiled content - return so table
    processing can
    } else {
    Encountered table related tag (but
    not table)
    during normal processing. Insert
    the implied
    table element start tag.
    node = InferredTag(doc, TidyTag_TABLE);

    The modified code correctly returns execution to the
    ParseRow() functionality when the td tag is encountered in
    the exiled state but maintains existing functionality for
    all non-exiled conditions. This results in correctly
    clearing the lexer's exiled flag, ordering is properly
    maintained in the tree/output, and the table row isn't
    broken into two tables.

  • Arnaud Desitter

    Arnaud Desitter - 2005-11-08
    • status: open --> pending
  • Arnaud Desitter

    Arnaud Desitter - 2005-11-08
    • status: pending --> pending-fixed
  • Arnaud Desitter

    Arnaud Desitter - 2005-11-08

    Logged In: YES

    Fixed in CVS.

  • Christopher M. Woods

    Logged In: YES

    Closing issue. Thanks Arnaud!

  • Christopher M. Woods

    • status: pending-fixed --> closed-fixed
  • Arnaud Desitter

    Arnaud Desitter - 2006-01-10
    • priority: 5 --> 6
    • status: closed-fixed --> open-accepted
  • Arnaud Desitter

    Arnaud Desitter - 2006-01-10

    Logged In: YES

    The fix has to be disabled as it created an infinite loop
    (http://tidy.sf.net/issue/1316307). More work is necessary
    to make it work.

  • Christopher M. Woods

    Logged In: YES

    Hmm... I'll see what I can do to get the two to play nicely

  • Christopher M. Woods

    parser.c 1.162 + fix for issue #1398397

  • Christopher M. Woods

    Logged In: YES

    Okay here's my findings with regards to the patch for this
    issue and the infinite loop issue

    The patch for this issue is not the direct cause of the
    infiniate loop issue - rather it allows the infinite loop
    issue to manifest. This is an important distinction to make.

    The infinite loop issue is caused by the fact that the
    ParseList() function does not check for the lexer's exiled
    state when table tags are encountered like most other
    Parse*() function loops do.

    To fix the infinite loop issue:

    1) Re-enable the patch for this issue.
    2) Add the following snipet to the "not LI" block of

    /* special case </tr> etc. for stuff moved in front of
    table */
    else if ( lexer->exiled && (node->tag->model & CM_TABLE) )

    Here is the above using 'diff' - where "MyTidy" is my local
    modified source code and "TidyCVS" is the current snapshot
    from CVS (parser.c 1.162):

    diff .\MyTidy\src\parser.c .\TidyCVS\src\parser.c
    > #if 0
    > /* Diabled due to
    http://tidy.sf.net/issue/1398397 */
    > #endif
    < /* special case </tr> etc. for stuff moved in
    front of table */
    < else if ( lexer->exiled && (node->tag->model &
    CM_TABLE) )
    < {
    < return;
    < }

    I've also attached the entire modified version of parser.c
    1.162 with the above two changes.

    These changes were regression tested using alltest.cmd with
    no reported differences/failures. I also tested the file
    with 2 variants of the infinite looping test filing and both
    returned [correct] expected results.

    CC: #1398397

  • Arnaud Desitter

    Arnaud Desitter - 2006-02-10

    Logged In: YES


    While your patch will fix the known problem, it is likely
    that other inputs can trigger infinite loops.
    if ( lexer->exiled && (node->tag->model & CM_TABLE) )
    have probably to be added to most parsing routines. It was
    not needed before as the exiled mode was tested only for end
    Anyway, I think that your solution is incomplete. Somebody
    needs to sit down and work out the details.

  • Christopher M. Woods

    Logged In: YES

    I'd have to agree with you that the patch specifically
    addresses the #1398397 issue. That issue brings together
    two situations in the code:

    1) Movement of content from inside of a table to before the

    2) Detecting the point of return back to the original table.

    Disabling the check for the lexer's exiled state which the
    original fix added means that instead of returning to the
    previous table, the code will instead imply a new table and
    all of the table's content from that point forward will end
    up appearing in the output PRIOR to the start of the
    original table! IMHO, this is clearly not acceptable.

    I think it's better for the fix to remain in the active code
    and to address any issues that may arise in it's usage than
    to revert to the above logic flow.

  • Christopher M. Woods

    Logged In: YES

    Although this patch remains in the disabled state I have
    continued to use it with the additional ParseList changes
    previously outlined. I have come across a case that
    requires one additional tweak to the ParseList changes:

    The code needs to check "node->tag" before using
    else if ( lexer->exiled && node->tag && (node->tag->model &

    I did not dig into the case too much as I just don't have
    the time currently. I did however want to pass along this
    information in case anyone else runs into the issue.

  • Arnaud Desitter

    Arnaud Desitter - 2006-03-10

    Logged In: YES


    Please provide a test case (end do so before you loose it).
    Thank you.

  • Christopher M. Woods

    Logged In: YES

    Sorry Arnaud,

    I know that I'm not meeting up to the group's expectations
    for bug reports, test cases, and patch files. It's just
    been crazy for me on my end and I only have a couple minutes
    at a time. Unfortunately this will be another one of those

    I do have another "bug fix" for this issue. In the previous
    versions of the code fix, it appears that I was not checking
    all appropriate Content Models associated with the various
    table related tags. The code should read:

    else if ( lexer->exiled && node->tag &&
    nodeHasCM(node, CM_TABLE|CM_ROWGRP|CM_ROW) )

    Previously it was only checking against CM_TABLE and
    therefore TD/TH elements were not matching the check and
    therefore resulted in the infinite loop condition - which of
    course was why the fix was disabled in the first place.

    A test case that shows the error for this is:

    Cell Data

    With the change to include the additional content models in
    the test, the code now correctly moves the 'ul' to before
    the table and processes correctly.

  • Arnaud Desitter

    Arnaud Desitter - 2006-04-13
    • status: open-accepted --> pending-accepted
  • Arnaud Desitter

    Arnaud Desitter - 2006-04-13

    Logged In: YES

    Patch applied. Let's see if it survives in the open field.

  • SourceForge Robot

    • status: pending-accepted --> closed-accepted
  • SourceForge Robot

    Logged In: YES

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 30 days (the time period specified by
    the administrator of this Tracker).


Log in to post a comment.