[Htmlparser-developer] New class added to HTMLParser project
Brought to you by:
derrickoswald
From: Ian M. <ian...@gm...> - 2006-02-13 15:18:18
|
I've just commited a new class called NodeTreeWalker to CVS. It is located in org.htmlparser.util This class allows you to iterate through Node's in a tree or sub-tree in either a depth-first or breadth-first order (and you may switch the method being used during the search if desired). Think of it like a NodeIterator for a tree of Nodes instead of a linear sequence of Nodes. You may pick any Node in a document to be the root Node. It also supports limiting the depth that the search, particularly useful for iterative breadth-first searches where you wish to inspect one level of children at a time. The breadth-first search code (along with previous code I've written e.g. the next/previousSibling methods) originally came from a tree-traversal program using HTMLParser that I wrote at the company I work for (NetRank Ltd), which has had quite a bit of testing. I'm hoping that there aren't too many horrible bugs in it :) Limitations: The only limitation is that the root of the tree must be a single Node, so perhaps at some point we may wish to create something along the lines of a DocumentHoldingNode within which the entire Document is stored. That way, it could support traversing documents with multiple root Nodes (e.g. docType and html). Also, it might get stuck if someone deliberately mangles up a document tree such that a Nodes parents/grandparents are also the same Node. This could probably be checked for in this class, but would probably be better suited to the actual get/set parent/children methods as this issue would affect more than just this class. Comments/code review welcomed. Best wishes Ian Macfarlane NetRank Ltd ps: For future versions of the class, I would like to also see previousNode() functionality, but I do not expect to write this myself for the time being. If anyone wanted to write this, please go ahead, but make sure it's in the same style as the existing methods. |