The parent field points to the enclosing composite tag, and composite tags
are *not* returned by the lexer. The lexer produces a linear stream of
simple lexemes with no composite structure, which is why getParent()
gives you null for the nodes in your loop. You would need to use a parser.

For example, given <A href="yadda"><IMG src="baffa"></A>, the image tag
has the link tag as its parent only when the nodes are produced by the
parser (which returns this as one link node with one child). You could
use the same logic as in your code below, but you would need to dig
recursively into each node returned to do your checking. If the text is
always inside a table, you need only register the table scanner. That
means less digging, since all the other, non-table nodes would be simple
nodes (again with no children).
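As a rough sketch of that recursive digging: the class below uses a hypothetical `TreeNode` type as a stand-in for the composite nodes a parser would hand back. It is not the HTMLParser API; `TreeNode`, `find` and `parentText` are names made up for illustration, and the real calls for getting a node tree and its children depend on the HTMLParser version you use. It shows two things: a depth-first search that descends into every child, and a null check before touching the parent.

```java
import java.util.ArrayList;
import java.util.List;

public class FindEnclosing {

    // Hypothetical stand-in for a parser-produced node (NOT the HTMLParser API).
    static class TreeNode {
        final String text;
        TreeNode parent; // null for the root or for a node with no enclosing tag
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String text) {
            this.text = text;
        }

        // Attach a child and set its parent pointer, as a parser would.
        // Returns this node so a small tree can be built in one expression.
        TreeNode add(TreeNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Depth-first search: check this node, then recurse into every child,
    // since the match may be nested several levels deep (table > tr > td > p > b).
    static TreeNode find(TreeNode node, String needle) {
        if (node.text.indexOf(needle) != -1) {
            return node;
        }
        for (TreeNode child : node.children) {
            TreeNode hit = find(child, needle);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Null-safe parent lookup: a node with no enclosing composite tag has a
    // null parent, and calling getText() on that null is what throws.
    static String parentText(TreeNode node) {
        return node.parent == null ? "(no parent)" : node.parent.text;
    }

    public static void main(String[] args) {
        // Mirrors the question's fragment:
        // <table><tr><td><p class=tablehead><b>Closing Time</b></p></td></tr></table>
        TreeNode leaf = new TreeNode("Closing Time");
        TreeNode table = new TreeNode("table")
            .add(new TreeNode("tr")
                .add(new TreeNode("td")
                    .add(new TreeNode("p class=tablehead")
                        .add(new TreeNode("b").add(leaf)))));

        TreeNode hit = find(table, "Closing Time");
        if (hit != null) {
            System.out.println("parent of match = " + parentText(hit));
            // Walk up the enclosing tags until the root (parent == null).
            for (TreeNode n = hit.parent; n != null; n = n.parent) {
                System.out.println("ancestor: " + n.text);
            }
        }
    }
}
```

Run as-is, this prints the `b` tag as the parent of the match, then walks up through `p`, `td`, `tr` and `table`. A node taken from a flat lexer stream corresponds to a `TreeNode` that was never attached with `add()`, so `parentText` reports "(no parent)" instead of throwing.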
Derrick
du du wrote:
> Hello everyone:
>
> I'd like to locate a specific string in an HTML page and then process
> the information around it. The whole scenario is:
>
> <html> <head>...</head>
> <body><table>
> <tr><td><p class=tablehead><b>Closing Time</b> </p></td></tr>
> <tr>.....</tr>
> </table>
> </body></html>
> In fact, I can locate "Closing Time" as well as its lexer node, and so
> I expected to be able to reach its parent node or child nodes. But
> using aNode.getParent() always ends in a NullPointerException. Part of
> the code:
>
> ...
> Node aNode = lexer.nextNode();
> Node bNode;
> while (aNode != null) {
>     if (aNode.getText().indexOf("Closing Time") != -1) {
>         bNode = aNode.getParent();
>         System.out.println("current node=" + bNode.getText());
>     }
>     aNode = lexer.nextNode();
> }
> ...
>
> I'd really appreciate it if somebody could help.
>
> henry