Thread: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

XMLUnit provides assertions that help testing code that produces XML.

Brought to you by: bodewig, custommonkey, timbacon

xmlunit-general

[Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: David R. <da...@ub...> - 2013-04-06 17:18:44

Attachments: Picture 3.png

After running some profiling on my tests I've discovered that XMLUnit is 
taking the bulk of the time and XPathContext.getXPath is taking 58% of 
XMLUnit's time. I've attached a VirtualVM Profiler screenshot.

XPathContext.getXPath is called twice every time a comparison is done to 
add the xpath for the control and test value to the created Comparison. 
However, the xpath is only needed when the comparison fails or if match 
tracking is being done. Perhaps the Comparison should only be created if 
the compare fails or maybe the xpath should only be set when compare fails?

d

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: Stefan B. <bo...@ap...> - 2013-04-11 15:59:57

On 2013-04-06, David Rees wrote:

> After running some profiling on my tests I've discovered that XMLUnit
> is taking the bulk of the time and XPathContext.getXPath is taking 58%
> of XMLUnit's time.

That's huge.

> XPathContext.getXPath is called twice every time a comparison is done
> to add the xpath for the control and test value to the created
> Comparison. However, the xpath is only needed when the comparison
> fails or if match tracking is being done. Perhaps the Comparison
> should only be created if the compare fails or maybe the xpath should
> only be set when compare fails?

Are we talking 2.0 here?  Really, we've got a user :-)

In that case this is not quite true.  DifferenceEvaluator is called for
each and every comparison and the implementation might base its decision
on the actual XPath.

One option may be to somehow copy XPathContext's state into the
Comparison and evaluate it lazily only when the XPath needs to be
constructed.  This would require creating new objects with a copy of
XPathContext's "path" member rather than building up the string.  In the
end that may cause similar pressure on the garbage collector but should
be faster than building up the string (and running replace on it).

Stefan

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: David R. <da...@ub...> - 2013-04-11 17:35:58

Another approach to consider, in line with your thoughts, would be to 
teach XPathContext Level's their parents. Then you could just store the 
Level in the Comparison.

Actually, since the Comparison has the nodes themselves can't it just 
compute the xpath using them (on demand)? Are there cases where this 
wouldn't work?

d

On 4/11/13 1:06 PM, David Rees wrote:
> I've actually tracked it down to two things in the XPathContext.getXPath.
>
> The biggest is actually the "replace(SEP + SEP, SEP)". When I remove 
> that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with 
> it enough yet to know enough about why that line is there so I can 
> replace it.
>
> I also took a first crack at using dynamic programming to store the 
> xpath expression in the Level objects. So its only computed once for 
> each level. That took it from 0.06 to 0.04. A patch with that first 
> crack is at https://sourceforge.net/p/xmlunit/bugs/61/.
>
> d
>
>
> On 4/11/13 11:59 AM, Stefan Bodewig wrote:
>> On 2013-04-06, David Rees wrote:
>>
>>> After running some profiling on my tests I've discovered that XMLUnit
>>> is taking the bulk of the time and XPathContext.getXPath is taking 58%
>>> of XMLUnit's time.
>> That's huge.
>>
>>> XPathContext.getXPath is called twice every time a comparison is done
>>> to add the xpath for the control and test value to the created
>>> Comparison. However, the xpath is only needed when the comparison
>>> fails or if match tracking is being done. Perhaps the Comparison
>>> should only be created if the compare fails or maybe the xpath should
>>> only be set when compare fails?
>> Are we talking 2.0 here?  Really, we've got a user :-)
>>
>> In that case this is not quite true.  DifferenceEvaluator is called for
>> each and every comparison and the implementation might base its decision
>> on the actual XPath.
>>
>> One option may be to somehow copy XPathContext's state into the
>> Comparison and evaluate it lazily only when the XPath needs to be
>> constructed.  This would require creating new objects with a copy of
>> XPathContext's "path" member rather than building up the string.  In the
>> end that may cause similar pressure on the garbage collector but should
>> be faster than building up the string (and running replace on it).
>>
>> Stefan
>

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: David R. <da...@ub...> - 2013-04-11 18:09:19

I've actually tracked it down to two things in the XPathContext.getXPath.

The biggest is actually the "replace(SEP + SEP, SEP)". When I remove 
that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with 
it enough yet to know enough about why that line is there so I can 
replace it.

I also took a first crack at using dynamic programming to store the 
xpath expression in the Level objects. So its only computed once for 
each level. That took it from 0.06 to 0.04. A patch with that first 
crack is at https://sourceforge.net/p/xmlunit/bugs/61/.

d


On 4/11/13 11:59 AM, Stefan Bodewig wrote:
> On 2013-04-06, David Rees wrote:
>
>> After running some profiling on my tests I've discovered that XMLUnit
>> is taking the bulk of the time and XPathContext.getXPath is taking 58%
>> of XMLUnit's time.
> That's huge.
>
>> XPathContext.getXPath is called twice every time a comparison is done
>> to add the xpath for the control and test value to the created
>> Comparison. However, the xpath is only needed when the comparison
>> fails or if match tracking is being done. Perhaps the Comparison
>> should only be created if the compare fails or maybe the xpath should
>> only be set when compare fails?
> Are we talking 2.0 here?  Really, we've got a user :-)
>
> In that case this is not quite true.  DifferenceEvaluator is called for
> each and every comparison and the implementation might base its decision
> on the actual XPath.
>
> One option may be to somehow copy XPathContext's state into the
> Comparison and evaluate it lazily only when the XPath needs to be
> constructed.  This would require creating new objects with a copy of
> XPathContext's "path" member rather than building up the string.  In the
> end that may cause similar pressure on the garbage collector but should
> be faster than building up the string (and running replace on it).
>
> Stefan

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: Stefan B. <bo...@ap...> - 2013-04-13 14:15:28

On 2013-04-11, David Rees wrote:

> On 4/11/13 1:06 PM, David Rees wrote:

>> I've actually tracked it down to two things in the XPathContext.getXPath.

>> The biggest is actually the "replace(SEP + SEP, SEP)". When I remove
>> that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with
>> it enough yet to know enough about why that line is there so I can
>> replace it.

I don't recall under which circumstances it happens but there are
certain edge cases.  If you remove the line there should be failing unit
tests that should tell you why I introduced it.  Sorry, but I haven't
looked at the actual code for three years or so - and thank you for
working on it.

>> I also took a first crack at using dynamic programming to store the
>> xpath expression in the Level objects. So its only computed once for
>> each level. That took it from 0.06 to 0.04. A patch with that first
>> crack is at https://sourceforge.net/p/xmlunit/bugs/61/.

I'll look into it later this weekend.

> Actually, since the Comparison has the nodes themselves can't it just
> compute the xpath using them (on demand)? Are there cases where this
> wouldn't work?

TBH I modeled this after the XMLUnit 1.x difference engine which
calculates the XPath while walking the tree as well and never questioned
that design.  The trickies part would be knowing the sibblings on each
level (what to put into the sqare brackets of /foo/bar[...]/baz on the
XPath of a baz element) - this might benefit from some caching as you
suggest to do for Level instances.

Stefan

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: Stefan B. <bo...@ap...> - 2013-04-14 18:34:48

On 2013-04-13, Stefan Bodewig wrote:

>> On 4/11/13 1:06 PM, David Rees wrote:

>>> I also took a first crack at using dynamic programming to store the
>>> xpath expression in the Level objects. So its only computed once for
>>> each level. That took it from 0.06 to 0.04. A patch with that first
>>> crack is at https://sourceforge.net/p/xmlunit/bugs/61/.

> I'll look into it later this weekend.

I've committed your patch and ported it to C#.

Deque#descendingIterator is Java6 while I tried to keep the code base
compatible to Java5 so far.  That seemed to be a good idea three years
ago, not sure whether it really is anymore or whether Java6 as baseline
is acceptable.  If so we might question .NET 2.0 vs 3.5 as well.

Stefan

Re: [Xmlunit-general] XPathContext.getXPath taking bulk of compare time

From: Stefan B. <bo...@ap...> - 2013-04-14 18:56:43

On 2013-04-13, Stefan Bodewig wrote:

> On 2013-04-11, David Rees wrote:

>> On 4/11/13 1:06 PM, David Rees wrote:

>>> I've actually tracked it down to two things in the XPathContext.getXPath.

>>> The biggest is actually the "replace(SEP + SEP, SEP)". When I remove
>>> that my XMLUnit benchmarks go from 0.11 to 0.06. I haven't played with
>>> it enough yet to know enough about why that line is there so I can
>>> replace it.

> I don't recall under which circumstances it happens but there are
> certain edge cases.

The document node itself is used like an element named "" so there is an
initial Level with XPath "/" and when the document element is added we
get two consecutive slashes.  I've rewritten the XPath building to
account for the special case and avoind the replace call.

Actually there is an additional possibility for levels with empty names
in the default case of appendChildren - but I don't think that can be
triggered at all.

Stefan