#608 NodeWrapper improvement

v8.8
closed
nobody
5
2012-10-08
2006-10-05
John Baker
No

Hello,

NodeWrapper:

public boolean hasChildNodes() {
    // In Xerces, an attribute node has child text

nodes
if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
return false;
}
return node.hasChildNodes();
}

The node.hasChildNodes() call needs to be cached,
because it's a little slow. I guess the best way to do
this is to cache the entire method result:

private Boolean hasChildNodes;

public boolean hasChildNodes() {
    if (hasChildNodes == null) {
      if (node.getNodeType() == Node.ATTRIBUTE_NODE)
        hasChildNodes = new Boolean(false):
      else 
        hasChildNodes = 
          new Boolean(node.hasChildNodes());
    }
    return hasChildNodes.booleanValue();
}

John

Discussion

  • Michael Kay

    Michael Kay - 2006-10-06

    Logged In: YES
    user_id=251681

    John, I ask people (in bright yellow on the bug submission
    page) not to enter bugs directly into the bug register until
    they are confirmed as bugs: please raise issues first on the
    help list or forum, or in "support requests". The reason is
    to make it easier for people who are searching for known
    bugs, if these are clearly distinguished from all the false
    alarms.

    In your case you've raised a number of points which I would
    classify as suggestions for improvement, rather than bugs.
    Theya re useful comments and I will act on them, but I will
    probably delete them from the bug register.

     
  • Michael Kay

    Michael Kay - 2006-10-06

    Logged In: YES
    user_id=251681

    I'm reluctant to add any bytes to the size of the node
    wrapper, since memory usage can often be a problem when
    wrapping a large DOM. Before making such a change, I'd like
    to see evidence of performance benefit. Performance
    questions with the DOM are always difficult because
    different DOM implementations have different performance
    characteristics.

    It might be possible to avoid a size increase by overloading
    the "span" field which is currently used only for text
    nodes, but again, I would want to see evidence that there is
    a problem here that needs solving.

     
  • John Baker

    John Baker - 2006-10-06

    Logged In: YES
    user_id=1613906

    The evidence is that the DOM hasChildNodes method is slow -
    just like NodeList.getLength() - in fact, they probably
    share the code given they provide roughly the same information.

    I don't honestly think a couple bytes is going to hammer te
    memory usage - in any large app, there are going be many
    other areas where far more memory is wasted than a Boolean
    in the NodeWrapper. Indeed, you could reduce memory
    consumption by implementing the other mods I've suggested
    such as not creating those (slow) IntHashSets until they're
    required.

     
  • Michael Kay

    Michael Kay - 2006-10-06

    Logged In: YES
    user_id=251681

    Indeed, you could reduce memory consumption by implementing
    the other mods I've suggested such as not creating those
    (slow) IntHashSets until they're required.

    Sorry, but you suggested that change in two places: one in
    the Executable, which has a single instance in most
    applications, and one in NamespaceIterator, where the
    objects are highly transient. Neither would have any
    noticeable effect on total heap size. An extra object
    reference in NodeWrapper most certainly does affect memory
    usage, given that the number of NodeWrappers can sometimes
    be the same as the number of Nodes, and in the worst case
    can even be larger. Also, NodeWrappers are allocated very
    frequently, so adding an object allocation would be a high
    price to pay (you could avoid that easily, however, by using
    a byte to represent the three possible states rather than a
    Boolean).

    I'm happy to make the change if there's evidence to justify it.

     
  • Michael Kay

    Michael Kay - 2006-10-06

    Logged In: YES
    user_id=251681

    I wonder if the high cost you are seeing for getChildNodes()
    is because there is some lazy evaluation going on, and
    getChildNodes() is actually constructing the list of
    children? This might also account for NodeList.getLength().
    If that's the case, one would expect the first call to be
    expensive and subsequent calls to be cheap.

    Could you find out please what the implementation class is
    for the "Node" and "NodeList" objects in your application?

     
  • Michael Kay

    Michael Kay - 2006-10-06

    Logged In: YES
    user_id=251681

    I've taken a peek at the Xerces code, and it looks like this:

    public boolean hasChildNodes() {
    if (needsSyncChildren()) {
    synchronizeChildren();
    }
    return firstChild != null;
    }

    So my hunch was right: it's incurring a significant cost on
    the first call, but is essentially doing its own internal
    caching, so it's unlikely that caching the result in the
    caller will have any benefit at all. But once again, if you
    can produce measurements that show a benefit, then I'm
    interested.

     
  • John Baker

    John Baker - 2006-10-06

    Logged In: YES
    user_id=1613906

    Bizarre. Well, it was still appearing as a hot spot (I'll
    take some screenshots when I get a chance), and hence it
    would be nice to try and solve the problem. Perhaps there is
    no way to avoid making this call over and over again? Or
    perhaps there is a way?

     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks