#725 "Discarded" tag incorrectly terminates table row

closed-fixed
nobody
6
2005-11-21
2005-10-07
No

A parsing bug appears to terminate a table row
incorrectly, which then triggers a cascade of further
errors.

Using the attached sample HTML file, Tidy.exe
terminates the first table row after the first cell
when a disallowed tag is encountered, even though the
warning/error output states that the tag was discarded:

line 11 column 1 - Warning: discarding unexpected </center>

The termination then results in the following
additional warnings/errors:

line 12 column 17 - Warning: missing <tr>
line 16 column 9 - Warning: discarding unexpected </tr>
line 12 column 17 - Warning: <tr> isn't allowed in <tr> elements
line 23 column 9 - Warning: discarding unexpected </tr>

Looking at the output, not only was the first row
terminated, but all remaining cells of the first row
and all additional rows were placed into a single row.

Using the Tidy API, I saw exactly the same structure
when walking the document tree immediately after
parsing/loading, so this is a parsing issue rather than
a cleanup issue.
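
The check described above amounts to walking the parsed tree and counting rows and cells. A minimal sketch of such a walk follows. It is written against a hypothetical `Node` struct rather than Tidy's own node type so that it stands alone; `Node`, `count_children` and `find_first` are illustrative names and not part of the Tidy API. With libtidy, the same traversal would presumably go through `tidyGetChild()`, `tidyGetNext()` and `tidyNodeGetName()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed element; NOT Tidy's real node type. */
typedef struct Node {
    const char  *name;   /* element name, e.g. "tr"; NULL for text nodes */
    struct Node *child;  /* first child, or NULL */
    struct Node *next;   /* next sibling, or NULL */
} Node;

/* Count the direct children of `parent` whose element name is `name`. */
int count_children(const Node *parent, const char *name)
{
    int n = 0;
    assert(parent != NULL && name != NULL);
    for (const Node *c = parent->child; c != NULL; c = c->next)
        if (c->name != NULL && strcmp(c->name, name) == 0)
            n++;
    return n;
}

/* Depth-first search through `node`, its later siblings, and all of
   their descendants for the first element called `name`. */
const Node *find_first(const Node *node, const char *name)
{
    assert(name != NULL);
    for (; node != NULL; node = node->next) {
        if (node->name != NULL && strcmp(node->name, name) == 0)
            return node;
        const Node *hit = find_first(node->child, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

Calling `count_children(table, "tr")` and then `count_children(row, "td")` for each row, right after parsing, is enough to compare the row and cell counts with those in the source markup and confirm that cells have moved between rows.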

Discussion

  • Christopher M. Woods

    Logged In: YES
    user_id=576763

    I've traced this issue into the code [parser.c Rev 1.148].
    It seems that in ParseRow(), the call to DescendantOf()
    [line 2118] returns true for the closing center element,
    which terminates the row. (This is technically correct, since
    a center element was opened immediately before the table.)
    Execution then returns to ParseTableTag(), which determines
    that the closing center element is unexpected and should be
    discarded [line 2534].

    There is code in the ParseRow() function immediately after
    the call to DescendantOf() which would have discarded the
    tag if it had reached that point. Looking back in CVS, I
    see that in rev 1.53.2.9 the ancestry check was moved from
    after the discard checks to before to correct another issue
    (http://tidy.sf.net/bug/647900) so it's not a simple
    re-order fix.

    I do not think there are any non-table-related [block]
    elements that should terminate a row, with the exception of
    the closing tags for html and [most likely] body elements.
    As such, it *might* be sufficient to change line 2118 from:

    if ( DescendantOf(row, TagId(node)) )

    to:

    if ( nodeHasCM(node, CM_HTML|CM_TABLE) &&
         DescendantOf(row, TagId(node)) )

    This would allow all table-related ancestors, along with
    html/body, to terminate the row, and would leave any other
    closing tag to the discard checks.

     
  • Christopher M. Woods

    • summary: "Ignored" tag incorrectly terminates table row --> "Discarded" tag incorrectly terminates table row
     
  • Christopher M. Woods

    Logged In: YES
    user_id=576763

    Oops - my line numbers are off due to a local change I
    made to parser.c (for the nbsp in empty para issue). Sorry
    about that.

     
  • Christopher M. Woods

    Logged In: YES
    user_id=576763

    Correction to the code fix. I missed the case where the
    row/cell is terminated by the </table> tag itself.
    The corrected fix should be:

    if ( (nodeHasCM(node, CM_HTML|CM_TABLE) ||
          nodeIsTABLE(node)) &&
         DescendantOf(row, TagId(node)) )

     
  • Arnaud Desitter

    Arnaud Desitter - 2005-11-09
    • priority: 5 --> 6
     
  • Arnaud Desitter

    Arnaud Desitter - 2005-11-18
    • status: open --> pending-fixed
     
  • Arnaud Desitter

    Arnaud Desitter - 2005-11-18

    Logged In: YES
    user_id=566665

    Fixed in CVS. Thanks for the patch.

     
  • Christopher M. Woods

    Logged In: YES
    user_id=576763

    Closing Issue.

     
  • Christopher M. Woods

    • status: pending-fixed --> closed-fixed
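
The three versions of the row-termination check discussed in this thread can be compared in a small standalone model. This is only a sketch under stated assumptions, not the code in parser.c: the `TagId` values, the `tag_cm` table and the `ends_row_*` functions are hypothetical, the ancestor chain is a plain array, and the content-model flags are assigned so that table carries neither CM_HTML nor CM_TABLE, as the correction in the discussion implies. The flag test is assumed to succeed when any requested bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Content-model bits named after the flags quoted in the discussion.
   The numeric values are arbitrary for this sketch. */
enum { CM_HTML = 1, CM_BLOCK = 2, CM_TABLE = 4 };

/* Hypothetical tag ids; only the tags needed for the model. */
typedef enum {
    TAG_HTML, TAG_BODY, TAG_CENTER, TAG_TABLE, TAG_TBODY, TAG_TR, TAG_COUNT
} TagId;

/* ASSUMED flag assignments, chosen to match the behaviour described
   in the thread (table is treated as a plain block element). */
static const unsigned tag_cm[TAG_COUNT] = {
    [TAG_HTML]   = CM_HTML,
    [TAG_BODY]   = CM_HTML,
    [TAG_CENTER] = CM_BLOCK,
    [TAG_TABLE]  = CM_BLOCK,
    [TAG_TBODY]  = CM_TABLE,
    [TAG_TR]     = CM_TABLE
};

/* True if `tag` is among the open ancestors of the current row. */
bool descendant_of(const TagId *ancestors, size_t n, TagId tag)
{
    assert(ancestors != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ancestors[i] == tag)
            return true;
    return false;
}

/* Original check: any end tag matching an open ancestor ends the row. */
bool ends_row_original(const TagId *anc, size_t n, TagId end_tag)
{
    return descendant_of(anc, n, end_tag);
}

/* First proposal: only html/body or table-related end tags may end it. */
bool ends_row_first_fix(const TagId *anc, size_t n, TagId end_tag)
{
    return (tag_cm[end_tag] & (CM_HTML | CM_TABLE)) != 0
        && descendant_of(anc, n, end_tag);
}

/* Corrected proposal: additionally let </table> itself end the row. */
bool ends_row_corrected(const TagId *anc, size_t n, TagId end_tag)
{
    return ((tag_cm[end_tag] & (CM_HTML | CM_TABLE)) != 0
            || end_tag == TAG_TABLE)
        && descendant_of(anc, n, end_tag);
}
```

With the ancestor chain html, body, center, table, the original check ends the row on `</center>`, the first proposal ignores `</center>` but also ignores `</table>`, and the corrected proposal ignores `</center>` while still ending the row on `</table>` and `</body>`.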
     
