Hi. I'm stuck. Here is an HTML snippet from a multi-part MIME email:
<html>
<body>
<p>
Hello
<footer><p>Footer Text</footer>
<p>
Again
</body>
</html>
I want to remove the 'footer' tag (and its text node), which could occur anywhere in the doc. So I try something like:
addNewFooter(Part part) {
    Page page = new Page(part.getInputStream(), encoding);
    Lexer lexer = new Lexer(page);
    Parser parser = new Parser(lexer);
    try {
        log.info("Stripping HTML footer");
        NodeList complete = parser.parse(null);
        NodeList stripped = complete.extractAllNodesThatMatch(new NotFilter(new TagNameFilter("footer")));
        // part.setContent(complete.toHtml(), "text/html; charset=" + encoding);
        log.info(complete.toHtml());
    } ....
This doesn't work and it isn't clear how I do it.
Next I want to add a new 'footer' tag just before the body close:
<html>
<body>
<p>
Hello
<p>
Again
<footer><p>Footer Text</footer>
</body>
</html>
I have no idea how to do this. I can see that you add nodes to node lists, but how exactly in this case ?
Do I create a node like this ..
TagNode footerTag = new TagNode();
footerTag.setTagName("footer");
TextNode footerText = new TextNode("Footer Text");
footerText.setParent(footerTag);
... and how do I stick it in the correct place in the tree (node list) ?
Thanks for any help you can offer.
Clive
removing:
The NotFilter will get you every node except the footer nodes, but the result is a flat list, not a tree.
I would filter for the footer nodes and remove them from their parent:
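NodeList footers = complete.extractAllNodesThatMatch (new TagNameFilter("footer"), true); // true = search nested nodes too
for (int i = 0; i < footers.size (); i++) {
    Node footer = footers.elementAt (i);
    footer.getParent ().getChildren ().remove (footer); // detach from the parent's child list
}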
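The remove-from-parent idea can be tried out without the parser library. What follows is only a minimal, self-contained sketch: the Node class and the tag, text, findAll, removeAll and sample helpers are hypothetical stand-ins made up for illustration, not htmlparser classes. It shows that the flat search result only tells you which nodes to act on, and that the tree changes only when a node is taken out of its parent's child list.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveFooterSketch {
    // Hypothetical stand-in for a parsed node; NOT an htmlparser class.
    static class Node {
        final String name;      // tag name, or the text itself for a text node
        final boolean isText;
        Node parent;            // null for a top-level node
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isText) {
            this.name = name;
            this.isText = isText;
        }

        // Append a child, record its parent, and return this for chaining.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        String toHtml() {
            if (isText) return name;
            StringBuilder sb = new StringBuilder("<").append(name).append(">");
            for (Node c : children) sb.append(c.toHtml());
            return sb.append("</").append(name).append(">").toString();
        }
    }

    static Node tag(String name) { return new Node(name, false); }
    static Node text(String s) { return new Node(s, true); }

    // Recursively collect every tag with the given name into a flat list.
    static List<Node> findAll(Node node, String tagName, List<Node> out) {
        if (!node.isText && node.name.equals(tagName)) out.add(node);
        for (Node c : node.children) findAll(c, tagName, out);
        return out;
    }

    // Detach each match from its parent's child list; returns how many were removed.
    static int removeAll(Node root, String tagName) {
        int removed = 0;
        for (Node match : findAll(root, tagName, new ArrayList<>())) {
            if (match.parent != null && match.parent.children.remove(match)) removed++;
        }
        return removed;
    }

    // A tree shaped like the snippet in the question.
    static Node sample() {
        return tag("html").add(tag("body")
            .add(tag("p").add(text("Hello")))
            .add(tag("footer").add(tag("p").add(text("Footer Text"))))
            .add(tag("p").add(text("Again"))));
    }

    public static void main(String[] args) {
        Node html = sample();
        removeAll(html, "footer");
        System.out.println(html.toHtml());
    }
}
```

The loop walks the separate result list while it edits the child lists, so nothing is modified while it is being iterated.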
adding:
The footerText needs to be added to the footerTag's children list:
footerTag.getChildren ().add (footerText);
Adding the footer just before the end of the <html> tag works the same way: a simple add() puts it at the end of the children list:
HtmlTag html;
... get the html tag somehow
html.getChildren ().add (my_new_footer);
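To make the "add to the children list" step concrete, here is a small self-contained sketch. It does not use the parser library: the Node class and the findFirst and appendFooter helpers are hypothetical names invented for this example. It assumes the footer should become the last child of the body tag, which is what makes it print just before the body close.

```java
import java.util.ArrayList;
import java.util.List;

class AppendFooterSketch {
    // Hypothetical stand-in for a parsed node; NOT an htmlparser class.
    static class Node {
        final String name;      // tag name, or the text itself for a text node
        final boolean isText;
        Node parent;            // null for a top-level node
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isText) {
            this.name = name;
            this.isText = isText;
        }

        // Append a child, record its parent, and return this for chaining.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        String toHtml() {
            if (isText) return name;
            StringBuilder sb = new StringBuilder("<").append(name).append(">");
            for (Node c : children) sb.append(c.toHtml());
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Depth-first search for the first tag with the given name; null if there is none.
    static Node findFirst(Node node, String tagName) {
        if (!node.isText && node.name.equals(tagName)) return node;
        for (Node c : node.children) {
            Node hit = findFirst(c, tagName);
            if (hit != null) return hit;
        }
        return null;
    }

    // Build <footer>text</footer> and append it as the last child of <body>.
    static boolean appendFooter(Node root, String footerText) {
        Node body = findFirst(root, "body");
        if (body == null) return false;
        Node footer = new Node("footer", false);
        footer.add(new Node(footerText, true)); // the text is a child of the footer tag
        body.add(footer);                       // last child, so it prints just before </body>
        return true;
    }

    // A tree shaped like the snippet in the question, with the old footer already gone.
    static Node sample() {
        return new Node("html", false).add(new Node("body", false)
            .add(new Node("p", false).add(new Node("Hello", true)))
            .add(new Node("p", false).add(new Node("Again", true))));
    }

    public static void main(String[] args) {
        Node html = sample();
        appendFooter(html, "Footer Text");
        System.out.println(html.toHtml());
    }
}
```

Note that add() in this sketch sets the child's parent as well as extending the child list, so a later remove-from-parent would still find the new footer.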
Ah. Got it. I thought extractAllNodesThatMatch() extracted a list unrelated to the main node list. It is basically just selecting which nodes to act on - maybe selectAllNodesThatMatch() would have been clearer, but what's in a name? Thanks for the help.
Nope. Not what I thought. Consider the following code:
try {
    Parser parser = new Parser("file:///clive.html");
    NodeList root = parser.parse(null);
    NodeList divs = root.extractAllNodesThatMatch(new NodeClassFilter(Div.class), true);
    System.out.println("found " + divs.size() + " div tags");
    for(int i = 0; i < divs.size(); i++) {
        TagNode div = (TagNode) divs.elementAt(i);
        String id = div.getAttribute("id");
        if(id != null && id.equals("__footer__")) {
            System.out.println("found footer: " + div);
            if(divs.remove(div)) {
                System.out.println("removed node");
            }
        }
    }
    System.out.println(root.toHtml());
} catch(ParserException e) {
    e.printStackTrace();
}
This removes the div node from the divs list, not from the root list; the two are obviously different lists.
All I want to do is print out the HTML (all of it) without the footer div.
If I try:
root.remove(div)
in place of:
divs.remove(div)
It doesn't find it (returns false) ....
How, exactly do I do this ?
Regards
Clive
You need:
div.getParent ().getChildren ().remove (div);
You need to go to the next node up the tree and remove the div node from its list of children.
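For anyone wondering why the first two attempts fail, here is a rough self-contained sketch of the three cases. It uses plain java.util lists and a hypothetical Node class rather than the parser's own types, and it assumes that the top-level list holds only the outermost tag while the search result is a separate list pointing at the same node objects.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveAttemptsSketch {
    // Hypothetical stand-in for a tag node; NOT an htmlparser class.
    static class Node {
        final String name;
        Node parent;            // null for a top-level node
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Append a child, record its parent, and return this for chaining.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name).append(">");
            for (Node c : children) sb.append(c.toHtml());
            return sb.append("</").append(name).append(">").toString();
        }
    }

    // Try the three removals in turn and report each result plus the tree afterwards.
    static String[] attempts() {
        Node div = new Node("div");
        Node html = new Node("html").add(new Node("body").add(new Node("p")).add(div));

        List<Node> root = new ArrayList<>();  // top-level list: holds only <html>
        root.add(html);
        List<Node> divs = new ArrayList<>();  // flat search result: same div object, separate list
        divs.add(div);

        String[] log = new String[3];
        // 1. The div is nested, so it is not an element of the top-level list.
        log[0] = "top-level list: " + root.remove(div) + " " + html.toHtml();
        // 2. Removing from the search result succeeds but leaves the tree alone.
        log[1] = "search result: " + divs.remove(div) + " " + html.toHtml();
        // 3. Removing from the parent's child list is what changes the tree.
        log[2] = "parent's children: " + div.parent.children.remove(div) + " " + html.toHtml();
        return log;
    }

    public static void main(String[] args) {
        for (String line : attempts()) System.out.println(line);
    }
}
```

Only the third call changes what toHtml() prints, because toHtml() walks the child lists and never looks at the search result.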
Great. Now I get it ! Thanks