What I am trying to do is parse HTML and change certain values in the HTML, such as image, form and link attributes. There doesn't appear to be a clear way to do this. Ideally, I'd like to parse the HTML into a Node and search the Node for the things I need to change then convert it back to a String.
Anybody doing this? Thanks.
Most tags have such functionality; for example the ImageTag has setImageURL(String url). After modifying it you use toHtml() to convert it back to a string.
Thanks for the reply. I'm actually doing that, but how do I get the entire HTML, not just the changed tag? I want to start with an entire HTML page, change aspects of it, and output the entire HTML page. When I run my code, nothing is output because there are no elements in the parser after calling the parse method (or so it appears).
Here is what I have:
Parser parser = new Parser();
parser.setInputHTML(responseBody);
// prefix all <img src=""> URLs
TagNameFilter imgFilter = new TagNameFilter("img");
NodeList imgNodeList = parser.parse(imgFilter);
for (int i = 0; i < imgNodeList.size(); i++) {
    System.out.println("Processing <img>");
    TagNode imgNode = (TagNode) imgNodeList.elementAt(i);
    String imgSrc = imgNode.getAttribute("src");
    imgNode.setAttribute("src", urlPrefix + imgSrc);
}
parser.reset();
// write the output
NodeIterator nodeItr = parser.elements();
while (nodeItr.hasMoreNodes()) {
    Node node = nodeItr.nextNode();
    System.out.println("Rendering - " + node.toHtml());
    response.getOutputStream().print(node.toHtml());
}
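Parsing with a filter gives you only the matching nodes, and after reset() the parser re-reads the input into new nodes, so the changes you made to the first set are not in what you print. You need to gather all the nodes first, then apply the filter to that list, process the matches, and print out the full list: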
NodeList all = parser.parse(null);
// true = also search nested nodes, not only the top-level ones
NodeList imgNodeList = all.extractAllNodesThatMatch(imgFilter, true);
// ... processing as above
System.out.println(all.toHtml());
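For anyone who wants to try the same pattern (parse the whole page once, change the matching nodes in place, then print the whole tree) without the HTML Parser jar on the classpath, here is a rough sketch using the JDK's built-in XML DOM instead. To be clear, this is not the HTML Parser API: it swaps in javax.xml, so it only handles well-formed XHTML-style input, and the class name PrefixImages, the method prefixImageSources and the example URL are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HTML Parser.
public class PrefixImages {

    // Parses well-formed markup once, prefixes every <img src>, and
    // returns the whole document (not just the changed tags) as a string.
    static String prefixImageSources(String xhtml, String urlPrefix) {
        try {
            // 1. Build the full tree a single time.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // 2. Find the matching nodes inside that same tree and edit them in place.
            NodeList images = doc.getElementsByTagName("img");
            for (int i = 0; i < images.getLength(); i++) {
                Element img = (Element) images.item(i);
                img.setAttribute("src", urlPrefix + img.getAttribute("src"));
            }

            // 3. Serialize the tree that was modified, without re-parsing the input.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Malformed input or a serializer problem ends up here.
            throw new IllegalStateException("could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>hi</p><img src=\"a.png\"/></body></html>";
        System.out.println(prefixImageSources(page, "http://example.com/"));
    }
}
```

The point is the same as in the snippet above: the nodes you modify and the nodes you print have to come from one and the same parse. Real-world pages are often not well-formed XML, so for those you would stay with HTML Parser and the NodeList approach.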