Thread: [Htmlparser-developer] New class added to HTMLParser project

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

I've just commited a new class called NodeTreeWalker to CVS. It is
located in org.htmlparser.util

This class allows you to iterate through Node's in a tree or sub-tree
in either a depth-first or breadth-first order (and you may switch the
method being used during the search if desired). Think of it like a
NodeIterator for a tree of Nodes instead of a linear sequence of
Nodes.

You may pick any Node in a document to be the root Node. It also
supports limiting the depth that the search, particularly useful for
iterative breadth-first searches where you wish to inspect one level
of children at a time.

The breadth-first search code (along with previous code I've written
e.g. the next/previousSibling methods) originally came from a
tree-traversal program using HTMLParser that I wrote at the company I
work for (NetRank Ltd), which has had quite a bit of testing. I'm
hoping that there aren't too many horrible bugs in it :)

Limitations:
The only limitation is that the root of the tree must be a single
Node, so perhaps at some point we may wish to create something along
the lines of a DocumentHoldingNode within which the entire Document is
stored. That way, it could support traversing documents with multiple
root Nodes (e.g. docType and html).

Also, it might get stuck if someone deliberately mangles up a document
tree such that a Nodes parents/grandparents are also the same Node.
This could probably be checked for in this class, but would probably
be better suited to the actual get/set parent/children methods as this
issue would affect more than just this class.

Comments/code review welcomed.

Best wishes

Ian Macfarlane
NetRank Ltd

ps: For future versions of the class, I would like to also see
previousNode() functionality, but I do not expect to write this myself
for the time being. If anyone wanted to write this, please go ahead,
but make sure it's in the same style as the existing methods.

Thread: [Htmlparser-developer] New class added to HTMLParser project

htmlparser-developer