I need to locate an HTMLParser node that matches an XPath expression. A typical expression looks like this:
/Document[@title='My Web Page']
/Html[1]
/Body[1]
/Paragraph[1]
/Image[2]
As far as I can tell, HTMLParser has no concept of XPath. Does anyone have a suggestion for how to go about this?
I started working on something like this a while back, but didn't quite finish.
True, HTMLParser has no concept of XPath. So what I did was write an ANTLR grammar for a subset of XPath that HTMLParser's NodeFilters could support. My grammar included Abstract Syntax Tree (AST) generation that could be used by some utility classes to generate the NodeFilter structures. I then used ANTLR to generate a parser for my expression language, and wrote the aforementioned utility classes.
I got as far as that before other tasks took precedence, and I was unable to finish testing my creation. But it does sound like you wanted something similar.
-Dan
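For anyone who wants a concrete picture of the general idea in this thread, here is a minimal Java sketch. It is not Dan's ANTLR grammar, which was never posted, and it does not use HTMLParser's NodeFilter classes. SimpleNode, Step, and PathSketch are hypothetical names made up for illustration, and a plain regular expression stands in for a generated parser. It handles only positional steps such as Image[2]. Attribute predicates such as [@title='My Web Page'] are rejected, and a step without an index is treated as [1], which is a simplification rather than real XPath behaviour. The path is resolved against the children of the node you pass in, so the leading Document step from the question is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSketch {
    /** Hypothetical stand-in for a parsed HTML node; NOT HTMLParser's Node interface. */
    static class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();
        SimpleNode(String name) { this.name = name; }
        /** Appends a child and returns this node so calls can be chained. */
        SimpleNode add(SimpleNode child) { children.add(child); return this; }
    }

    /** One location step such as Image[2]: a node name plus a 1-based position. */
    static class Step {
        final String name;
        final int position;
        Step(String name, int position) { this.name = name; this.position = position; }
    }

    // A name, optionally followed by a numeric index in square brackets.
    private static final Pattern STEP = Pattern.compile("([A-Za-z]\\w*)(?:\\[(\\d+)\\])?");

    /** Splits "/Html[1]/Body[1]" into steps; a missing index is assumed to mean 1. */
    static List<Step> parse(String path) {
        List<Step> steps = new ArrayList<>();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue; // skip the empty piece before the leading slash
            Matcher m = STEP.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported step: " + part);
            }
            int position = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
            steps.add(new Step(m.group(1), position));
        }
        return steps;
    }

    /** Follows each step through the children of the current node; null if no match. */
    static SimpleNode find(SimpleNode root, String path) {
        SimpleNode current = root;
        for (Step step : parse(path)) {
            SimpleNode next = null;
            int seen = 0; // how many same-named children have been passed so far
            for (SimpleNode child : current.children) {
                if (child.name.equalsIgnoreCase(step.name) && ++seen == step.position) {
                    next = child;
                    break;
                }
            }
            if (next == null) return null;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        SimpleNode paragraph = new SimpleNode("Paragraph")
                .add(new SimpleNode("Image"))
                .add(new SimpleNode("Image"));
        SimpleNode document = new SimpleNode("Document")
                .add(new SimpleNode("Html").add(new SimpleNode("Body").add(paragraph)));
        SimpleNode hit = find(document, "/Html[1]/Body[1]/Paragraph[1]/Image[2]");
        System.out.println(hit == paragraph.children.get(1));
    }
}
```

In a real implementation, each Step would presumably be translated into a filter over HTMLParser nodes instead of walking a toy tree, and a grammar-based parser would replace the regular expression once predicates and axes are needed.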