
Comparing certain elements from two XML files

carlos
2008-08-21
2013-03-03
  • carlos

    carlos - 2008-08-21

    I need to compare a section of two XML files by using the standard Diff method. I'm only able to compare the complete documents and not a particular node of each. Any ideas on how to do this? Is it possible to exclude elements of the XML so they don't get compared?

     
    • Joerg Glissmann

      Joerg Glissmann - 2008-09-02

      Hi Carlos,

      I faced a similar problem a while ago, and I solved it this way:

      I put a properties file somewhere with entries like:

      TagsToRemove0=xxx.xxx.xxx
      TagsToRemove1=yyy.yyy.yyy
      ..

      Then I read these values into a List of Strings and convert it to an array:

      // complete sections to remove in both XML documents before any comparison is done
      int h = 0;
      List<String> removeTags = new ArrayList<String>();
      // keys are numbered consecutively: TagsToRemove0, TagsToRemove1, ...
      while (props.containsKey("TagsToRemove" + h)) {
          removeTags.add(props.getProperty("TagsToRemove" + h));
          h++;
      }
      myTagsToRemove = removeTags.toArray(new String[removeTags.size()]);

      Then I take the two XML documents I want to compare (as org.w3c.dom.Document objects), look up the elements whose tag names match the entries in the properties file, and remove them from both Documents. Only then do I run the comparison:

      public DetailedDiff testXMLIdentical(Document previous, Document current) {
          // remove sections of the xml documents according to myTagsToRemove
          for (int i=0; i<myTagsToRemove.length; i++) {
              if ((myTagsToRemove[i] != null) && (! "".equals(myTagsToRemove[i]))) {
                  // for the previous (control) document
                  NodeList prevNodesToRemove = previous.getElementsByTagName(myTagsToRemove[i]);
                  while (prevNodesToRemove.getLength() > 0) {
                      // always remove the first item, as the list will get shorter
                      Node n = prevNodesToRemove.item(0);
                      // getNodeName() is always set; getLocalName() can be null if the parser is not namespace-aware
                      String nodeName = n.getNodeName();
                      n.getParentNode().removeChild(n);
                      log.debug("testXMLIdentical: removed node from previous xml file: " + nodeName);
                  }
                  // do the same thing for the current (test) document
              }
          }

          Diff myDiff = new Diff(previous, current);
          return getDetailedDiff(myDiff);
      }

      This fairly radically removes the matching nodes from both documents before any comparison is done. (It is especially helpful if you want to ignore structural differences between the files you are comparing.)

      HTH,
      glissi
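
The removal step described above can be sketched with the plain JDK DOM API. This is only an illustration under assumptions: the class name SectionStripper, the helpers parse and removeSections, and the sample XML are all hypothetical, and Node.isEqualNode stands in for the XMLUnit Diff that would normally follow.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
final class SectionStripper {

    /** Parses an XML string; checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Removes every element with one of the given tag names; returns the number removed. */
    static int removeSections(Document doc, String... tagNames) {
        int removed = 0;
        for (String tag : tagNames) {
            if (tag == null || tag.isEmpty()) {
                continue; // skip blank entries, as in the original post
            }
            NodeList matches = doc.getElementsByTagName(tag);
            // The NodeList is live, so it shrinks as nodes are removed:
            // always take item(0) until nothing is left.
            while (matches.getLength() > 0) {
                Node n = matches.item(0);
                n.getParentNode().removeChild(n);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document previous = parse("<order><id>1</id><timestamp>10:00</timestamp></order>");
        Document current = parse("<order><id>1</id><timestamp>11:30</timestamp></order>");
        // Apply the same removal to both documents before comparing.
        removeSections(previous, "timestamp");
        removeSections(current, "timestamp");
        // Stand-in for the XMLUnit comparison (an assumption of this sketch).
        System.out.println(previous.getDocumentElement().isEqualNode(current.getDocumentElement()));
    }
}
```

Putting the loop into one method that takes the Document means the control and test documents go through exactly the same code path, so the "do the same thing for the current document" step is just a second call.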

       
    • Stefan Bodewig

      Stefan Bodewig - 2008-10-09

      Are those sections proper trees in themselves or do you need to compare various fragments of the document (and maybe even ignore parts of the fragments again)?

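      For proper trees you may be able to create two new Documents and use Document.adoptNode to move just the trees you want to compare into them (adoptNode detaches the node from its original document; Document.importNode makes a copy instead).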
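
A minimal sketch of that idea, using only the JDK DOM API. The class SubtreeExtractor, its methods, and the sample XML are hypothetical. It uses Document.importNode, which deep-copies the subtree and leaves the source document untouched; Document.adoptNode would move the node out of the source instead. The isEqualNode call is just a stand-in for running Diff on the two new Documents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
final class SubtreeExtractor {

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    /** Parses an XML string; checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Copies the first element with the given tag name into a fresh Document. */
    static Document extract(Document source, String tagName) {
        Node subtree = source.getElementsByTagName(tagName).item(0);
        if (subtree == null) {
            throw new IllegalArgumentException("no <" + tagName + "> element found");
        }
        Document target = newBuilder().newDocument();
        // importNode(..., true) makes a deep copy owned by target.
        target.appendChild(target.importNode(subtree, true));
        return target;
    }

    public static void main(String[] args) {
        Document control = parse("<page><head>v1</head><body><p>hello</p></body></page>");
        Document test = parse("<page><head>v2</head><body><p>hello</p></body></page>");
        Document controlBody = extract(control, "body");
        Document testBody = extract(test, "body");
        // The two extracted Documents could now be handed to Diff; isEqualNode is a stand-in.
        System.out.println(controlBody.getDocumentElement().isEqualNode(testBody.getDocumentElement()));
    }
}
```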

      For more complex scenarios I'm not convinced that you really gain much from filtering and assembling different documents to run Diff against. It may be simpler to do a couple of XPath comparisons on your docs; it really depends.
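
As a rough sketch of what "a couple of XPath comparisons" could look like with the JDK's javax.xml.xpath API: evaluate the same expressions against both documents and report the ones whose string values differ. The class XPathCompare, the method differingPaths, and the sample documents are hypothetical, and comparing string values is a simplification chosen for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
final class XPathCompare {

    /** Parses an XML string; checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns the expressions whose string values differ between control and test. */
    static List<String> differingPaths(Document control, Document test, String... expressions) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> differing = new ArrayList<>();
        for (String expr : expressions) {
            try {
                // evaluate(String, Object) yields the string value of the first match,
                // or an empty string when nothing matches.
                String controlValue = xpath.evaluate(expr, control);
                String testValue = xpath.evaluate(expr, test);
                if (!controlValue.equals(testValue)) {
                    differing.add(expr);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("bad XPath expression: " + expr, e);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        Document control = parse("<order><id>7</id><total>10</total><stamp>a</stamp></order>");
        Document test = parse("<order><id>7</id><total>12</total><stamp>b</stamp></order>");
        // Only the parts of interest are compared; <stamp> is simply never asked about.
        System.out.println(differingPaths(control, test, "/order/id", "/order/total"));
    }
}
```

Here the elements to ignore never have to be removed; they are excluded simply by not listing an expression for them.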

      You can certainly also use DifferenceEngine directly (without using Diff or DetailedDiff); its compare method only requires two Nodes to compare, and they don't have to be Document instances.

       
  • Anonymous

    Anonymous - 2011-07-22

    Hi, just wanted to say that I too wanted this functionality.

    My use case: a web application returns some HTML, and I want to compare a subtree of that HTML to a control tree.

    So I use engine.getMatchingNodes("xpathExpression", htmlDocumentBuilder.parse(source)) to get the subtree I want to compare.

    Then I need some way to compare the subtree against my control document. I will try your suggestion about adoptNode, and if that doesn't work, I'll try what you said about DifferenceEngine.

    Thanks for a useful tool.

     
