|
From: Bryan T. <br...@sy...> - 2010-08-27 18:57:51
|
Scale-out introduces other wrinkles, but for this specific query the query volume will be low since the subject and predicate are bound in the first triple pattern, so locality should trump in scale-out for this query as well.
If the query were less selective, for example if the subject in the first triple pattern were unbound, then I think that the outcome would be less clear cut. However, unselective queries like that are better executed using multi-block IO on the second triple pattern with a hash join against the intermediate results from the first triple pattern. So, a different kettle of fish all together.
Would you file an issue for this?
Thanks,
Bryan
> -----Original Message-----
> From: Mike Personick
> Sent: Friday, August 27, 2010 2:53 PM
> To: Bryan Thompson
> Cc: big...@li...
> Subject: RE: chosing the index for testing fully bound access
> paths based on index locality
>
> You would definitely get better locality by using POS instead
> of SPO in that case since the P and O remain constant for
> every asBound version of the (?c, d, e) tail. However, just
> playing devil's advocate, we've had the argument before over
> whether locality or concurrency wins in scale-out. For
> example, recall the discussion we had over forward or reverse
> properties (excerpted below). I argued that using forward
> properties would always win because you'd have better
> locality. You argued that the locality would be offset by
> increased parallelism from using reverse properties. How is
> this argument any different? Is it because we are talking
> about only one tail instead of many tails?
>
>
> ...
>
> The case where you were gathering multiple properties for A
> instead of just one is a little different:
>
> Forward:
>
> select ?a,?b,?c,?d
> where {
> ?a type A .
> ?a AtoB ?b .
> ?a AtoC ?c .
> ?a AtoD ?d .
> }
>
> Reverse:
>
> select ?a,?b,?c,?d
> where {
> ?a type A .
> ?b BtoA ?a .
> ?c CtoA ?a .
> ?d DtoA ?a .
> }
>
> In the case of forward you'd visit the POS index and then all
> the properties for a given ?a binding would be in the same
> place for the remainder of the tails (same SPO shard). For
> reverse you'd visit the POS index and then you'd have a set
> of bound PO tuples that would need to be visited using POS
> shards. Again, unlikely they'd be in the same place, which
> means network movement to complete the join. I assert that
> forward will always beat reverse in this case because you
> have all the data you need for tails 2-4 locally. Bryan
> asserts that reverse can overcome the lack of locality via
> concurrency.
>
> -----Original Message-----
> From: Bryan Thompson
> Sent: Friday, August 27, 2010 12:42 PM
> To: Mike Personick
> Cc: big...@li...
> Subject: chosing the index for testing fully bound access
> paths based on index locality
>
> Mike,
>
> David and I talked through a scenario where we might benefit
> by choosing a non-SPO index when testing a fully bound
> triple. Consider:
>
> query :- (a b ?c) AND (?c, d, e).
>
> Assuming that (a b ?c) is more selective, we would normally
> choose the SPO index for both access paths. However, the POS
> index will have better locality for (?c d e), so perhaps we
> would do better by sending the binding sets to that index?
>
> The guiding priciple would be, "when fully bound, choose the
> index with better locality based on the variable(s) in the
> triple pattern."
>
> If you think this makes sense, let's file and issue for this.
> I would have to review LUBM/BSBM to be certain, but I would
> not be suprised if both of those benchmarks included queries
> which had the same characteristic. As the number of
> variables in the second triple pattern increases, there will
> be less locality in the index so this might work better for
> one unbound triple patterns than for two unbound triple patterns.
>
> Thanks,
> Bryan
>
|