|
From: David R. <da...@ub...> - 2013-04-06 17:18:44
Attachments:
Picture 3.png
|
After running some profiling on my tests I've discovered that XMLUnit is taking the bulk of the time and XPathContext.getXPath is taking 58% of XMLUnit's time. I've attached a VirtualVM Profiler screenshot. XPathContext.getXPath is called twice every time a comparison is done to add the xpath for the control and test value to the created Comparison. However, the xpath is only needed when the comparison fails or if match tracking is being done. Perhaps the Comparison should only be created if the compare fails or maybe the xpath should only be set when compare fails? d |
|
From: Stefan B. <bo...@ap...> - 2013-04-11 15:59:57
|
On 2013-04-06, David Rees wrote: > After running some profiling on my tests I've discovered that XMLUnit > is taking the bulk of the time and XPathContext.getXPath is taking 58% > of XMLUnit's time. That's huge. > XPathContext.getXPath is called twice every time a comparison is done > to add the xpath for the control and test value to the created > Comparison. However, the xpath is only needed when the comparison > fails or if match tracking is being done. Perhaps the Comparison > should only be created if the compare fails or maybe the xpath should > only be set when compare fails? Are we talking 2.0 here? Really, we've got a user :-) In that case this is not quite true. DifferenceEvaluator is called for each and every comparison and the implementation might base its decision on the actual XPath. One option may be to somehow copy XPathContext's state into the Comparison and evaluate it lazily only when the XPath needs to be constructed. This would require creating new objects with a copy of XPathContext's "path" member rather than building up the string. In the end that may cause similar pressure on the garbage collector but should be faster than building up the string (and running replace on it). Stefan |
|
From: David R. <da...@ub...> - 2013-04-11 17:35:58
|
Another approach to consider, in line with your thoughts, would be to teach XPathContext Level's their parents. Then you could just store the Level in the Comparison. Actually, since the Comparison has the nodes themselves can't it just compute the xpath using them (on demand)? Are there cases where this wouldn't work? d On 4/11/13 1:06 PM, David Rees wrote: > I've actually tracked it down to two things in the XPathContext.getXPath. > > The biggest is actually the "replace(SEP + SEP, SEP)". When I remove > that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with > it enough yet to know enough about why that line is there so I can > replace it. > > I also took a first crack at using dynamic programming to store the > xpath expression in the Level objects. So its only computed once for > each level. That took it from 0.06 to 0.04. A patch with that first > crack is at https://sourceforge.net/p/xmlunit/bugs/61/. > > d > > > On 4/11/13 11:59 AM, Stefan Bodewig wrote: >> On 2013-04-06, David Rees wrote: >> >>> After running some profiling on my tests I've discovered that XMLUnit >>> is taking the bulk of the time and XPathContext.getXPath is taking 58% >>> of XMLUnit's time. >> That's huge. >> >>> XPathContext.getXPath is called twice every time a comparison is done >>> to add the xpath for the control and test value to the created >>> Comparison. However, the xpath is only needed when the comparison >>> fails or if match tracking is being done. Perhaps the Comparison >>> should only be created if the compare fails or maybe the xpath should >>> only be set when compare fails? >> Are we talking 2.0 here? Really, we've got a user :-) >> >> In that case this is not quite true. DifferenceEvaluator is called for >> each and every comparison and the implementation might base its decision >> on the actual XPath. >> >> One option may be to somehow copy XPathContext's state into the >> Comparison and evaluate it lazily only when the XPath needs to be >> constructed. This would require creating new objects with a copy of >> XPathContext's "path" member rather than building up the string. In the >> end that may cause similar pressure on the garbage collector but should >> be faster than building up the string (and running replace on it). >> >> Stefan > |
|
From: David R. <da...@ub...> - 2013-04-11 18:09:19
|
I've actually tracked it down to two things in the XPathContext.getXPath. The biggest is actually the "replace(SEP + SEP, SEP)". When I remove that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with it enough yet to know enough about why that line is there so I can replace it. I also took a first crack at using dynamic programming to store the xpath expression in the Level objects. So its only computed once for each level. That took it from 0.06 to 0.04. A patch with that first crack is at https://sourceforge.net/p/xmlunit/bugs/61/. d On 4/11/13 11:59 AM, Stefan Bodewig wrote: > On 2013-04-06, David Rees wrote: > >> After running some profiling on my tests I've discovered that XMLUnit >> is taking the bulk of the time and XPathContext.getXPath is taking 58% >> of XMLUnit's time. > That's huge. > >> XPathContext.getXPath is called twice every time a comparison is done >> to add the xpath for the control and test value to the created >> Comparison. However, the xpath is only needed when the comparison >> fails or if match tracking is being done. Perhaps the Comparison >> should only be created if the compare fails or maybe the xpath should >> only be set when compare fails? > Are we talking 2.0 here? Really, we've got a user :-) > > In that case this is not quite true. DifferenceEvaluator is called for > each and every comparison and the implementation might base its decision > on the actual XPath. > > One option may be to somehow copy XPathContext's state into the > Comparison and evaluate it lazily only when the XPath needs to be > constructed. This would require creating new objects with a copy of > XPathContext's "path" member rather than building up the string. In the > end that may cause similar pressure on the garbage collector but should > be faster than building up the string (and running replace on it). > > Stefan |
|
From: Stefan B. <bo...@ap...> - 2013-04-13 14:15:28
|
On 2013-04-11, David Rees wrote: > On 4/11/13 1:06 PM, David Rees wrote: >> I've actually tracked it down to two things in the XPathContext.getXPath. >> The biggest is actually the "replace(SEP + SEP, SEP)". When I remove >> that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with >> it enough yet to know enough about why that line is there so I can >> replace it. I don't recall under which circumstances it happens but there are certain edge cases. If you remove the line there should be failing unit tests that should tell you why I introduced it. Sorry, but I haven't looked at the actual code for three years or so - and thank you for working on it. >> I also took a first crack at using dynamic programming to store the >> xpath expression in the Level objects. So its only computed once for >> each level. That took it from 0.06 to 0.04. A patch with that first >> crack is at https://sourceforge.net/p/xmlunit/bugs/61/. I'll look into it later this weekend. > Actually, since the Comparison has the nodes themselves can't it just > compute the xpath using them (on demand)? Are there cases where this > wouldn't work? TBH I modeled this after the XMLUnit 1.x difference engine which calculates the XPath while walking the tree as well and never questioned that design. The trickies part would be knowing the sibblings on each level (what to put into the sqare brackets of /foo/bar[...]/baz on the XPath of a baz element) - this might benefit from some caching as you suggest to do for Level instances. Stefan |
|
From: Stefan B. <bo...@ap...> - 2013-04-14 18:34:48
|
On 2013-04-13, Stefan Bodewig wrote: >> On 4/11/13 1:06 PM, David Rees wrote: >>> I also took a first crack at using dynamic programming to store the >>> xpath expression in the Level objects. So its only computed once for >>> each level. That took it from 0.06 to 0.04. A patch with that first >>> crack is at https://sourceforge.net/p/xmlunit/bugs/61/. > I'll look into it later this weekend. I've committed your patch and ported it to C#. Deque#descendingIterator is Java6 while I tried to keep the code base compatible to Java5 so far. That seemed to be a good idea three years ago, not sure whether it really is anymore or whether Java6 as baseline is acceptable. If so we might question .NET 2.0 vs 3.5 as well. Stefan |
|
From: Stefan B. <bo...@ap...> - 2013-04-14 18:56:43
|
On 2013-04-13, Stefan Bodewig wrote: > On 2013-04-11, David Rees wrote: >> On 4/11/13 1:06 PM, David Rees wrote: >>> I've actually tracked it down to two things in the XPathContext.getXPath. >>> The biggest is actually the "replace(SEP + SEP, SEP)". When I remove >>> that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with >>> it enough yet to know enough about why that line is there so I can >>> replace it. > I don't recall under which circumstances it happens but there are > certain edge cases. The document node itself is used like an element named "" so there is an initial Level with XPath "/" and when the document element is added we get two consecutive slashes. I've rewritten the XPath building to account for the special case and avoind the replace call. Actually there is an additional possibility for levels with empty names in the default case of appendChildren - but I don't think that can be triggered at all. Stefan |