#126 Added TBODY elements don't get the XHTML namespace

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-10-31
Created: 2011-01-04
Private: No

If I pass the example input through the following code (using version 1.9.14):

import java.io.StringReader;
import org.apache.xerces.parsers.DOMParser;
import org.cyberneko.html.HTMLConfiguration;
import org.xml.sax.InputSource;

// html is a String holding the example input
HTMLConfiguration config = new HTMLConfiguration();
// See http://nekohtml.sourceforge.net/settings.html
config.setFeature("http://cyberneko.org/html/features/insert-namespaces", true);
config.setFeature("http://cyberneko.org/html/features/balance-tags/ignore-outside-content", true);
config.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
DOMParser parser = new DOMParser(config);
parser.parse(new InputSource(new StringReader(html)));
parser.getDocument().normalizeDocument();

A TBODY element is added, but it doesn't get a namespace (xmlns=""). I would expect it to get the default XHTML namespace instead.

Discussion

  • Eric Wout van der Steen

    Example input and output

     
  • qqilihq

    qqilihq - 2012-03-08

    This seems to be a general problem with inserted elements. Taking the attached web page as an example, a tr element is inserted by the parser, but it doesn't get the namespace. This makes all of our post-processing steps very difficult, as we rely on XPath queries.

    Any suggestions on how to fix this issue are appreciated.

     
  • qqilihq

    qqilihq - 2012-03-08

    Sorry, seems that I cannot add attachments. The mentioned web page is:
    http://slotmachinebasics.com/

    //xhtml:div[1]/xhtml:table[3]/xhtml:tr[1]/xhtml:td[2]/xhtml:blockquote[2]

    The tr element is not in the XHTML namespace, so our XPath fails, although Neko is configured to insert XHTML namespaces.
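
    To illustrate why a single element without a namespace breaks such a query, here is a minimal sketch using only the JDK's DOM and XPath APIs (no NekoHTML involved). The class name, the `count` helper, and the sample markup are made up for illustration; the markup only imitates the reported situation, with a `tr` in no namespace between XHTML-namespaced `table` and `td` elements.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of NekoHTML.
class NamespaceXPathDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Imitates the reported output: the tr element carries xmlns="".
    static final String SAMPLE =
        "<table xmlns='" + XHTML + "'>"
        + "<tr xmlns=''><td xmlns='" + XHTML + "'>x</td></tr>"
        + "</table>";

    // Parses xml namespace-aware and returns how many nodes the expression selects.
    static int count(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "xhtml" prefix used in the queries to the XHTML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) { return null; }
            @Override
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // xhtml:tr does not match the un-namespaced tr, so nothing is selected.
        System.out.println(count(SAMPLE, "//xhtml:table/xhtml:tr/xhtml:td"));
        // An unprefixed step matches elements in no namespace.
        System.out.println(count(SAMPLE, "//xhtml:table/tr/xhtml:td"));
    }
}
```

    The first query selects nothing because `xhtml:tr` only matches elements in the XHTML namespace; the second succeeds only because its `tr` step is left unprefixed, which is not something a general-purpose XPath can anticipate.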

     
  • qqilihq

    qqilihq - 2012-03-09

    With the following fix, the problem seems to be solved. As this was a quick patch, I'm not sure whether it is the correct solution, so I would appreciate some (developer) feedback.

    *** HTMLTagBalancer.old 2012-03-09 15:59:03.000000000 +0100
    --- HTMLTagBalancer.java 2012-03-09 15:51:29.000000000 +0100
    ***************
    *** 637,643 ****
      int depth = getParentDepth(element.parent, element.bounds);
      if (depth == -1) { // no parent found
          final String pname = modifyName(preferedParent.name, fNamesElems);
    !     final QName qname = new QName(null, pname, pname, null);
          if (fReportErrors) {
              String ename = elem.rawname;
              fErrorReporter.reportWarning("HTML2004", new Object[]{ename,pname});
    --- 637,643 ----
      int depth = getParentDepth(element.parent, element.bounds);
      if (depth == -1) { // no parent found
          final String pname = modifyName(preferedParent.name, fNamesElems);
    !     final QName qname = createQName(pname);
          if (fReportErrors) {
              String ename = elem.rawname;
              fErrorReporter.reportWarning("HTML2004", new Object[]{ename,pname});

     
  • Charles Yates

    Charles Yates - 2014-01-10

    I came up with the same solution as qqilihq independently. In the meantime, our workaround has been to use version 1.9.11, which doesn't have this problem. A patch with the fix and a unit test is attached.
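
    For setups that can neither patch the parser nor downgrade, another possible workaround (not proposed in this thread, and only a sketch) is to move un-namespaced elements into the XHTML namespace after parsing, using the standard DOM Level 3 `Document.renameNode` method. The class and method names below are hypothetical, and the sketch assumes that every element without a namespace is meant to be XHTML. It has only been reasoned through against the JDK's built-in DOM; whether the document implementation produced by NekoHTML supports `renameNode` is an assumption that would need checking.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical post-processing helper, not part of NekoHTML.
class XhtmlNamespaceFixer {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Moves every element without a namespace into the XHTML namespace.
    // Returns the number of elements that were renamed.
    static int fix(Document doc) {
        return fix(doc, doc.getDocumentElement());
    }

    private static int fix(Document doc, Node node) {
        if (node == null) {
            return 0; // empty document
        }
        int renamed = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() == null) {
            // renameNode may return a replacement node, so continue with its result.
            node = doc.renameNode(node, XHTML, node.getNodeName());
            renamed++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            renamed += fix(doc, child);
        }
        return renamed;
    }
}
```

    Calling `XhtmlNamespaceFixer.fix(parser.getDocument())` before running XPath queries would be the intended usage. It costs an extra tree walk per document, so fixing the parser remains the cleaner option.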

     
  • RBRi

    RBRi - 2017-10-31

    I have applied these fixes to the HtmlUnit fork.

     

    Last edit: RBRi 2017-10-31