I need to compare a section of two XML files by using the standard Diff method. I'm only able to compare the complete documents and not a particular node of each. Any ideas on how to do this? Is it possible to exclude elements of the XML so they don't get compared?
Hi Carlos,
I faced a similar problem a while ago, and I solved it this way:
I put a properties file somewhere with entries like:
TagsToRemove0=xxx.xxx.xxx
TagsToRemove1=yyy.yyy.yyy
..
Then I read these values into an ArrayList of Strings:
// complete sections to remove from both XML documents before any comparison is done
int h = 0;
List<String> removeTags = new ArrayList<String>();
while (props.containsKey("TagsToRemove" + h)) {
    removeTags.add(props.getProperty("TagsToRemove" + h));
    h++;
}
myTagsToRemove = removeTags.toArray(new String[removeTags.size()]);
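For anyone who wants to try the property-reading step in isolation, here is a minimal self-contained sketch. The class name TagsToRemoveLoader and the inline property values are made up for illustration; only the TagsToRemoveN key pattern comes from the post above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class; not part of XMLUnit.
class TagsToRemoveLoader {

    /** Collects TagsToRemove0, TagsToRemove1, ... and stops at the first missing index. */
    static String[] load(Properties props) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; props.containsKey("TagsToRemove" + i); i++) {
            tags.add(props.getProperty("TagsToRemove" + i));
        }
        return tags.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Inline stand-in for the properties file; the tag names are invented.
        props.load(new StringReader("TagsToRemove0=timestamp\nTagsToRemove1=sessionId\n"));
        System.out.println(String.join(",", load(props)));
    }
}
```

Note that a gap in the numbering (say, TagsToRemove0 and TagsToRemove2 with no TagsToRemove1) ends the scan early, which matches the behaviour of the while loop above.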
Then I take the two XML Documents which I want to compare (as org.w3c.dom.Document objects), collect the nodes whose names match the entries in the properties file, and remove them from both Documents. Only then do I run the comparison:
public DetailedDiff testXMLIdentical(Document previous, Document current) {
    // remove sections of the XML documents according to myTagsToRemove
    for (int i = 0; i < myTagsToRemove.length; i++) {
        if ((myTagsToRemove[i] != null) && (!"".equals(myTagsToRemove[i]))) {
            // for the previous (control) document
            NodeList prevNodesToRemove = previous.getElementsByTagName(myTagsToRemove[i]);
            while (prevNodesToRemove.getLength() > 0) {
                // always remove the first item: the NodeList is live and shrinks with each removal
                Node n = prevNodesToRemove.item(0);
                // getNodeName() rather than getLocalName(), which returns null without a namespace-aware parser
                String nodeName = n.getNodeName();
                n.getParentNode().removeChild(n);
                log.debug("testXMLIdentical: removed node from previous xml file: " + nodeName);
            }
            // do the same thing for the current (test) document
        }
    }
    Diff myDiff = new Diff(previous, current);
    return getDetailedDiff(myDiff);
}
This is a fairly radical approach: every matching node is removed, together with its whole subtree, before any comparison is done. (Especially helpful if you want to ignore structural differences between the files you are comparing.)
HTH,
glissi
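A stripped-down version of the removal step, applied to both documents, might look like the sketch below. It uses only the standard DOM API; the class name TagStripper, the parse helper and the sample XML are invented for illustration, and the stripped documents would then be handed to whatever comparison you use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of XMLUnit.
class TagStripper {

    /** Removes every element with one of the given tag names, including its subtree. */
    static void strip(Document doc, String... tagNames) {
        for (String tag : tagNames) {
            if (tag == null || tag.isEmpty()) {
                continue; // skip blank entries, as the original loop does
            }
            NodeList matches = doc.getElementsByTagName(tag);
            // The NodeList is live, so it shrinks as nodes are detached.
            while (matches.getLength() > 0) {
                Node n = matches.item(0);
                n.getParentNode().removeChild(n);
            }
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document previous = parse("<order><id>1</id><stamp>old</stamp></order>");
        Document current = parse("<order><id>1</id><stamp>new</stamp></order>");
        // Apply the same removal to the control and the test document.
        strip(previous, "stamp");
        strip(current, "stamp");
        System.out.println(previous.getElementsByTagName("stamp").getLength()
                + " / " + current.getElementsByTagName("stamp").getLength());
    }
}
```

Putting the loop in one method that takes the Document as a parameter avoids writing it out twice, once for the control and once for the test document.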
Are those sections proper trees in themselves or do you need to compare various fragments of the document (and maybe even ignore parts of the fragments again)?
For proper trees you may be able to create two new Documents and use Document.adoptNode (which moves the node out of its original document; Document.importNode copies it instead) to transfer only the trees you want to compare.
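As a rough sketch of that idea with the standard DOM API only: the code below wraps one subtree in a fresh Document. It uses importNode (a deep copy that leaves the source untouched) in place of adoptNode, which would move the node. The class name SubtreeDocs, the helper names and the sample markup are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of XMLUnit.
class SubtreeDocs {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Builds a new Document whose root element is a deep copy of the given subtree. */
    static Document asDocument(Element subtree) throws Exception {
        Document fresh = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // importNode copies; fresh.adoptNode(subtree) would detach it from its source instead.
        fresh.appendChild(fresh.importNode(subtree, true));
        return fresh;
    }

    public static void main(String[] args) throws Exception {
        Document page = parse("<page><head/><body><p>hi</p></body></page>");
        Element body = (Element) page.getElementsByTagName("body").item(0);
        Document bodyOnly = asDocument(body);
        System.out.println(bodyOnly.getDocumentElement().getNodeName());
    }
}
```

Doing this for both the control and the test side gives two small Documents that can be passed to a document-level comparison.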
For more complex scenarios I'm not convinced that you really gain much from filtering and putting together different documents to run Diff against. It may be simpler to do a couple of XPath comparisons for your docs - it really depends.
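One possible reading of "a couple of XPath comparisons" is to evaluate the same expression against both documents and compare the results. The sketch below does that with the JDK's javax.xml.xpath rather than XMLUnit's own XPath support; the class name XPathCompare and the sample documents are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of XMLUnit.
class XPathCompare {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** True if the expression yields the same string value in both documents. */
    static boolean sameValue(String expression, Document control, Document test)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expected = xpath.evaluate(expression, control);
        String actual = xpath.evaluate(expression, test);
        return expected.equals(actual);
    }

    public static void main(String[] args) throws Exception {
        Document control = parse("<order><id>1</id><stamp>old</stamp></order>");
        Document test = parse("<order><id>1</id><stamp>new</stamp></order>");
        System.out.println("id matches: " + sameValue("/order/id", control, test));
        System.out.println("stamp matches: " + sameValue("/order/stamp", control, test));
    }
}
```

This only compares string values, so it suits a handful of targeted checks better than a full structural comparison.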
You can certainly use DifferenceEngine directly (without using Diff or DetailedDiff); its compare method only requires two Nodes to compare, and they don't have to be Document instances.
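To illustrate the point that a comparison can work on two arbitrary nodes, here is a sketch that does not use DifferenceEngine at all: it relies on the standard DOM method Node.isEqualNode as a plain stand-in. That gives only a yes/no answer with no list of differences, so treat it as a quick check rather than a replacement. The class name NodeCompare and the sample documents are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; a DOM-only stand-in, not XMLUnit's DifferenceEngine.
class NodeCompare {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first element with the given tag name, or null if there is none. */
    static Node first(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0);
    }

    /** Deep structural equality of two nodes; neither has to be a Document. */
    static boolean sameSubtree(Node control, Node test) {
        return control != null && test != null && control.isEqualNode(test);
    }

    public static void main(String[] args) throws Exception {
        Document control = parse("<a><body><p>hi</p></body><t>1</t></a>");
        Document test = parse("<b><body><p>hi</p></body><t>2</t></b>");
        // The surrounding documents differ, but the body subtrees can be checked on their own.
        System.out.println(sameSubtree(first(control, "body"), first(test, "body")));
        System.out.println(sameSubtree(first(control, "t"), first(test, "t")));
    }
}
```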
Anonymous - 2011-07-22
Hi, just wanted to say that I too wanted this functionality.
My use case: a web application returns some HTML, and I want to compare a subtree of that HTML to a control tree.
So I use engine.getMatchingNodes("xpathExpression", htmlDocumentBuilder.parse(source)) to get the subtree I want to compare.
Then I need some way to compare the subtree against my control document. I will try your suggestion about adoptNode, and if that doesn't work, I'll try what you said about DifferenceEngine.
Thanks for a useful tool.