From: Bryan T. <br...@sy...> - 2014-07-03 23:06:48
I am not sure that union is inefficient. The basic execution mechanism runs both sides of the union concurrently, and their outputs are merged at the downstream operator. What the query analyzer does not do for union is identify common subexpressions and decide whether to lift them out or push them down into the union. This can cause problems with cardinality. Also, because union is not a subquery, it is always evaluated left to right (but in parallel across the union). There could be cases where bottom-up evaluation would do better. But that is a very general observation.

If you recall, BIND is not eagerly evaluated. Try replacing it with a constant and see if that fixes (or changes) things. This could well be the culprit.

I am surprised to learn that these are turned into the same query. Unless I am mistaken, ASK is basically a slice using limit 1 and offset 0. The filter (not) exists is a subquery using a solution set hash join. These seem like very different queries to me.

Bryan

> On Jul 3, 2014, at 6:37 PM, Jeremy J Carroll <jj...@sy...> wrote:
>
> ASK
> WHERE {
>   { BIND( <https://temp-base-image-2-jjc.syapse.com/bdm/api/kbobject/123> as ?key ) }
>
>   ?record rdf:type/syapse:subClassOf ?key .
>
>   { ?record sys:owner <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
>   UNION
>   { ?record sys:assignedProject / syapse:isPrivate false .
>   }
>   UNION
>   { ?record sys:assignedProject / syapse:member <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> .
>   }
> }
>
> and
>
> SELECT ?key
> WHERE {
>   { BIND( <https://temp-base-image-2-jjc.syapse.com/bdm/api/kbobject/123> as ?key ) }
>
>   FILTER EXISTS {
>     ?record rdf:type/syapse:subClassOf ?key .
>
>     { ?record sys:owner <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
>     UNION
>     { ?record sys:assignedProject / syapse:isPrivate false .
>     }
>     UNION
>     { ?record sys:assignedProject / syapse:member <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> .
>     }
>   }
> }
>
> are basically the same query; in fact the latter appears to be transformed into the former … but
> with my data set the first takes 50ms, the second 3000ms.
>
> Given that what I really want to do is to filter about 50 results, and the latter then takes more like 20 seconds, whereas I suspect the former will take about 3, I am likely to have to do a join in the client code … which seems all wrong!
>
> Any ideas? Are there any workarounds for this other than to make the ASK queries myself rather than the FILTER EXISTS?
>
> (Also I am aware that the UNION is pretty inefficient in bigdata too - I can reformulate my kb to get rid of the UNION but …)
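
For reference, the suggestion in the reply to replace the BIND with a constant could be tried along the following lines. This is only a sketch of one way to read that suggestion, not a confirmed fix: the IRIs are copied from the quoted queries, the `rdf:`, `syapse:` and `sys:` prefixes are assumed to be declared exactly as they were for the original queries (their declarations are not shown in the thread), and keeping a BIND in the second query purely so that `?key` can still be projected is an assumption of this sketch.

```sparql
# Variant 1: the ASK query with the key IRI inlined in place of ?key.
# PREFIX declarations are assumed to match the original queries.
ASK
WHERE {
  ?record rdf:type/syapse:subClassOf <https://temp-base-image-2-jjc.syapse.com/bdm/api/kbobject/123> .

  { ?record sys:owner <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
  UNION
  { ?record sys:assignedProject / syapse:isPrivate false }
  UNION
  { ?record sys:assignedProject / syapse:member <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
}
```

```sparql
# Variant 2: the FILTER EXISTS query with the constant inlined inside the EXISTS,
# so the subquery pattern no longer depends on the BIND.
# The BIND is kept only to project ?key (an assumption of this sketch).
SELECT ?key
WHERE {
  { BIND( <https://temp-base-image-2-jjc.syapse.com/bdm/api/kbobject/123> AS ?key ) }

  FILTER EXISTS {
    ?record rdf:type/syapse:subClassOf <https://temp-base-image-2-jjc.syapse.com/bdm/api/kbobject/123> .

    { ?record sys:owner <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
    UNION
    { ?record sys:assignedProject / syapse:isPrivate false }
    UNION
    { ?record sys:assignedProject / syapse:member <https://temp-base-image-2-jjc.syapse.com/bdm/api/syuser/17> }
  }
}
```

Comparing the timings of these variants against the 50ms and 3000ms reported for the original queries would indicate whether the lazily evaluated BIND is in fact the culprit.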