Michael,
The following query:
for $p in doc("PART.xml")//PART[P_PARTKEY < 20]
for $ps in doc("PARTSUPP.xml")//PARTSUPP
where $ps/PS_AVAILQTY = max(doc("PARTSUPP.xml")//PARTSUPP[PS_PARTKEY < $p/P_PARTKEY]/PS_AVAILQTY)
order by $ps/PS_PARTKEY, $ps/PS_SUPPKEY, $p/P_PARTKEY
return <match>{$p}{$ps}</match>
This query evaluates significantly more slowly in Saxon 8.5 than in 8.4.
Obviously this depends on the actual documents being queried, but for a given set in our environment it takes approximately 600 ms in 8.4 compared to nearly 70 seconds in 8.5.
If it would help, I can provide the PART and PARTSUPP documents.
Thanks,
Marc
Could you send me the source documents please?
I can't see anything in the query execution plans produced by 8.4 and 8.5 that would obviously account for this difference in performance: both appear to be doing a nested-loop join, as one would expect. In fact, the 8.5 plan looks a bit better on the surface: for example, it avoids sorting the argument of the max() function into document order.
The Saxon-SA 8.5 plan uses a hash join, again as one would expect: it will be interesting (at least for me) to see what improvement this achieves.
Clearly there was no intention that in optimizing joins for Saxon-SA, there should be any performance regression in Saxon-B, and the join benchmarks that I ran didn't show any.
Michael Kay
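For readers unfamiliar with the two join strategies mentioned above, here is a minimal Python sketch of the difference, applied to the shape of this query. It is not Saxon code and makes no claim about how either Saxon plan is actually implemented; the sample rows, the function names and the tuple layout are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the PART and PARTSUPP elements.
parts = [{"P_PARTKEY": 2}, {"P_PARTKEY": 3}]
partsupp = [
    {"PS_PARTKEY": 1, "PS_SUPPKEY": 10, "PS_AVAILQTY": 500},
    {"PS_PARTKEY": 2, "PS_SUPPKEY": 11, "PS_AVAILQTY": 700},
    {"PS_PARTKEY": 3, "PS_SUPPKEY": 12, "PS_AVAILQTY": 500},
]


def max_qty_below(part_key):
    """max(PS_AVAILQTY) over rows with PS_PARTKEY < part_key, or None if there are none."""
    qtys = [ps["PS_AVAILQTY"] for ps in partsupp if ps["PS_PARTKEY"] < part_key]
    return max(qtys) if qtys else None


def sort_key(row):
    # Mirrors: order by $ps/PS_PARTKEY, $ps/PS_SUPPKEY, $p/P_PARTKEY
    p_key, ps_partkey, ps_suppkey = row
    return (ps_partkey, ps_suppkey, p_key)


def nested_loop_join():
    """Test the where-condition for every (PART, PARTSUPP) pair."""
    out = []
    for p in parts:
        for ps in partsupp:
            target = max_qty_below(p["P_PARTKEY"])  # re-evaluated for each pair
            if target is not None and ps["PS_AVAILQTY"] == target:
                out.append((p["P_PARTKEY"], ps["PS_PARTKEY"], ps["PS_SUPPKEY"]))
    return sorted(out, key=sort_key)


def hash_join():
    """Index PARTSUPP by PS_AVAILQTY once, then probe the index once per PART."""
    by_qty = defaultdict(list)
    for ps in partsupp:
        by_qty[ps["PS_AVAILQTY"]].append(ps)
    out = []
    for p in parts:
        target = max_qty_below(p["P_PARTKEY"])
        if target is None:
            continue
        for ps in by_qty.get(target, []):
            out.append((p["P_PARTKEY"], ps["PS_PARTKEY"], ps["PS_SUPPKEY"]))
    return sorted(out, key=sort_key)
```

Both functions return the same rows. The nested loop does work proportional to the number of PART rows times the number of PARTSUPP rows, while the hash version builds the index once and then does one lookup per PART row.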
Sure, I'll send you those XML documents.
I'm going to email them to you, as I'm not sure how to attach files to these SourceForge messages.
Regards,
Marc
This turns out to be a rare performance bug that was already present in Saxon 8.4, but wasn't triggered by this query (or any others that I've come across!) because the optimizer produced a slightly different execution plan.
I've logged the bug and provided a replacement for the MemoClosure module at
https://sourceforge.net/tracker/index.php?func=detail&aid=1252878&group_id=29872&atid=397617
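The thread does not describe the bug itself, so the following is general background only. As its name suggests, a memoizing closure wraps a lazily evaluated sequence so that the underlying expression is evaluated at most once, however many times the result is read. Below is a minimal Python sketch of that idea; it is not Saxon's Java implementation, and the class name LazyMemo and its pulls counter are hypothetical.

```python
class LazyMemo:
    """Hypothetical memoizing wrapper around a lazily produced sequence."""

    def __init__(self, make_iter):
        self._source = make_iter()  # underlying iterator, consumed at most once
        self._cache = []            # items already pulled from the source
        self._done = False          # True once the source is exhausted
        self.pulls = 0              # how many items were drawn from the source

    def __iter__(self):
        i = 0
        while True:
            if i < len(self._cache):
                # Already materialised by an earlier (or concurrent) reader.
                yield self._cache[i]
            elif self._done:
                return
            else:
                try:
                    item = next(self._source)
                except StopIteration:
                    self._done = True
                    return
                self.pulls += 1
                self._cache.append(item)
                yield item
            i += 1
```

Reading the wrapper twice, for example with max(memo) followed by list(memo), draws each item from the source only once; the second read is served entirely from the cache.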