XMLUnit 2.6.3 - Java out of memory (GC overhead limit exceeded) for huge XML

2021-10-06
2021-10-10
  • Shubham Arvikar

    Shubham Arvikar - 2021-10-06

    Hello, I am trying to compare two large XML files which contain more than 2.4 million lines (XML file size ~51 MB).

    This is the method I have created; it takes the file paths of the two XML documents. It works fine with small responses, but for huge ones it throws a Java OutOfMemoryError (GC overhead limit exceeded). Is this a limitation of XMLUnit? Is there a solution for this type of issue?

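    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.xml.sax.SAXException;
    import org.xmlunit.builder.DiffBuilder;
    import org.xmlunit.diff.DefaultNodeMatcher;
    import org.xmlunit.diff.Diff;
    import org.xmlunit.diff.Difference;
    import org.xmlunit.diff.ElementSelectors;

    public static Iterable<Difference> compareXmlFiles(String response1FilePath, String response2FilePath) throws IOException, SAXException {
        // Both files are read completely into Strings before XMLUnit parses them.
        String expected = new String(Files.readAllBytes(Paths.get(response1FilePath)));
        String actual = new String(Files.readAllBytes(Paths.get(response2FilePath)));
        Diff diff = DiffBuilder.compare(expected)
            .withTest(actual)
            // Match "message" elements by their "symbol" attribute, "fld" elements by "id", everything else by name.
            .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.conditionalBuilder()
                .whenElementIsNamed("message").thenUse(ElementSelectors.byNameAndAttributes("symbol"))
                .whenElementIsNamed("fld").thenUse(ElementSelectors.byNameAndAttributes("id"))
                .elseUse(ElementSelectors.byName).build()))
            .withAttributeFilter(a -> !("soft_ver").equals(a.getName()))
            .ignoreWhitespace()
            .checkForSimilar()
            .build();
        return diff.getDifferences();
    }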

    Terminal error:

    java.lang.OutOfMemoryError: GC overhead limit exceeded

    at java.util.LinkedList$DescendingIterator.<init>(LinkedList.java:993)
    at java.util.LinkedList$DescendingIterator.<init>(LinkedList.java:992)
    at java.util.LinkedList.descendingIterator(LinkedList.java:986)
    at org.xmlunit.diff.XPathContext.getParentXPath(XPathContext.java:204)
    at org.xmlunit.diff.AbstractDifferenceEngine.getParentXPath(AbstractDifferenceEngine.java:195)
    at org.xmlunit.diff.DOMDifferenceEngine.compareNodes(DOMDifferenceEngine.java:152)
    at org.xmlunit.diff.DOMDifferenceEngine$NormalAttributeComparer$1.apply(DOMDifferenceEngine.java:480)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine$NormalAttributeComparer.apply(DOMDifferenceEngine.java:477)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine.compareElementAttributes(DOMDifferenceEngine.java:429)
    at org.xmlunit.diff.DOMDifferenceEngine.access$200(DOMDifferenceEngine.java:45)
    at org.xmlunit.diff.DOMDifferenceEngine$6.apply(DOMDifferenceEngine.java:379)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine.compareElements(DOMDifferenceEngine.java:376)
    at org.xmlunit.diff.DOMDifferenceEngine.nodeTypeSpecificComparison(DOMDifferenceEngine.java:214)
    at org.xmlunit.diff.DOMDifferenceEngine.access$000(DOMDifferenceEngine.java:45)
    at org.xmlunit.diff.DOMDifferenceEngine$2.apply(DOMDifferenceEngine.java:174)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine.compareNodes(DOMDifferenceEngine.java:171)
    at org.xmlunit.diff.DOMDifferenceEngine$8.apply(DOMDifferenceEngine.java:609)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine.compareNodeLists(DOMDifferenceEngine.java:606)
    at org.xmlunit.diff.DOMDifferenceEngine.access$100(DOMDifferenceEngine.java:45)
    at org.xmlunit.diff.DOMDifferenceEngine$3.apply(DOMDifferenceEngine.java:258)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andIfTrueThen(AbstractDifferenceEngine.java:226)
    at org.xmlunit.diff.DOMDifferenceEngine.compareNodes(DOMDifferenceEngine.java:179)
    at org.xmlunit.diff.DOMDifferenceEngine$8.apply(DOMDifferenceEngine.java:609)
    at org.xmlunit.diff.AbstractDifferenceEngine$ComparisonState.andThen(AbstractDifferenceEngine.java:222)
    at org.xmlunit.diff.DOMDifferenceEngine.compareNodeLists(DOMDifferenceEngine.java:606)
    at org.xmlunit.diff.DOMDifferenceEngine.access$100(DOMDifferenceEngine.java:45)
    
     
  • Stefan Bodewig

    Stefan Bodewig - 2021-10-10

    I'm afraid this is due to the way DOM is implemented (in Java?).

    XMLUnit uses the DOM model when comparing XML files. This means your Java process holds two parsed tree models, one for each complete document, in memory at the same time - and DOM is famous for being memory intensive.

    At least you seem to be able to load both documents, so there probably isn't that much memory missing and you could get by with increasing your JVM's heap memory a bit when starting the process.
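
    Before and after raising the heap limit (for example with the standard `-Xmx` JVM option), it can help to confirm which limit the process actually runs with. The following is a minimal sketch using only `java.lang.Runtime`; the class name `HeapCheck` and its helper methods are made up for illustration.

```java
/** Illustrative helper (hypothetical name): reports the heap limit the JVM is running with. */
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Heap currently occupied by objects (live or not yet collected), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

    Running it as `java -Xmx2g HeapCheck` (the 2 GB value is only an example) should show the raised limit. Calling `usedHeapMiB()` right after both documents have been parsed would give a rough idea of how much headroom the comparison has left.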

    There are alternative APIs for parsing XML but they - at least those provided by the Java class library - really only allow the consumer to move forward through the document. This means you'd have to cache subtrees of the document in one way or another if you want to allow for sibling elements to appear out of order. It never felt worth the effort to implement a non-DOM DifferenceEngine because of this, as you'd either have to limit yourself to "same order" documents or may end up using a similar amount of memory for the caches as would be required when using DOM directly.
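
    To illustrate the "same order" limitation, here is a rough sketch of a forward-only comparison built on StAX (`javax.xml.stream`) from the Java class library. It is not part of XMLUnit and is not how XMLUnit works; the class `SameOrderStreamCompare` and its methods are hypothetical. It assumes both documents list their elements in the same order, ignores whitespace-only text, and stops at the first difference.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative only: forward-only, same-order comparison. Not an XMLUnit API. */
public class SameOrderStreamCompare {

    /** Convenience overload for in-memory XML; wraps parse errors in an unchecked exception. */
    static Optional<String> firstDifference(String control, String test) {
        try {
            return firstDifference(new StringReader(control), new StringReader(test));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Returns a description of the first difference, or empty if both streams match in document order. */
    static Optional<String> firstDifference(Reader control, Reader test) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text and CDATA so both sides report text in comparable chunks.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        XMLStreamReader a = factory.createXMLStreamReader(control);
        XMLStreamReader b = factory.createXMLStreamReader(test);
        try {
            while (true) {
                int kindA = nextSignificant(a);
                int kindB = nextSignificant(b);
                String where = " near control line " + a.getLocation().getLineNumber();
                if (kindA != kindB) {
                    return Optional.of("different node kinds" + where);
                }
                switch (kindA) {
                    case XMLStreamConstants.END_DOCUMENT:
                        return Optional.empty();
                    case XMLStreamConstants.START_ELEMENT:
                        if (!a.getName().equals(b.getName())) {
                            return Optional.of("element names differ" + where);
                        }
                        if (!attributes(a).equals(attributes(b))) {
                            return Optional.of("attributes differ" + where);
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!a.getText().trim().equals(b.getText().trim())) {
                            return Optional.of("text differs" + where);
                        }
                        break;
                    default:
                        // END_ELEMENT: names were already compared at the matching start tags.
                        break;
                }
            }
        } finally {
            a.close();
            b.close();
        }
    }

    /** Skips comments, processing instructions and whitespace-only text. */
    private static int nextSignificant(XMLStreamReader r) throws XMLStreamException {
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT || ev == XMLStreamConstants.END_ELEMENT) {
                return ev;
            }
            boolean text = ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA;
            if (text && !r.isWhiteSpace()) {
                return XMLStreamConstants.CHARACTERS;
            }
        }
        return XMLStreamConstants.END_DOCUMENT;
    }

    /** Collects the current element's attributes so that attribute order does not matter. */
    private static Map<String, String> attributes(XMLStreamReader r) {
        Map<String, String> attrs = new HashMap<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            attrs.put(r.getAttributeName(i).toString(), r.getAttributeValue(i));
        }
        return attrs;
    }
}
```

    A sketch like this only ever holds the current event of each document, so its memory use should not grow with document size. The price is the one described above: `<a><b/><c/></a>` and `<a><c/><b/></a>` are reported as different, whereas a DOM-based comparison with a suitable ElementSelector can match the reordered siblings.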

    I am aware that blaming DOM is easy and do not rule out memory inefficiencies in XMLUnit - after all you have been able to parse both documents. But I strongly believe the amount of memory used by the DifferenceEngine itself is much less than the memory required to keep the DOM trees.

     
