Re: [dotNetRDF-Develop] About PR#36

Rob,

thanks for the answers.

Concerning the extend case, I then definitely believe there is a flaw in
our evaluation logic.
*To me, the join operation should behave as it would in relational logic
meaning comparing NULL with NULL will always return false so no result.*

Here's my demonstration of the case.

First about the join evaluation, based on the recommendation we get :

   1. §18.5 : Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1
   and μ2 are compatible }
   2. §18.3 : Two solution mappings μ1 and μ2 are compatible if, for every
   variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v)
   Here, μ1(v) = μ2(v) means that μ1(v) and μ2(v) are the same RDF term.
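
To make the two definitions concrete, here is a minimal Python sketch of
them. It is only an illustration, not dotNetRDF code: solution mappings are
modelled as plain dicts from variable name to term, RDF terms as plain
strings, and "same RDF term" as string equality. All of these are
simplifying assumptions.

```python
from itertools import product

def compatible(mu1, mu2):
    # §18.3: the mappings must agree on every variable present in both domains.
    # Assumption: terms are plain strings, so == stands in for "same RDF term".
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def merge(mu1, mu2):
    # Union of two compatible mappings.
    return {**mu1, **mu2}

def join(omega1, omega2):
    # §18.5: merge every compatible pair drawn from the two multisets.
    return [merge(m1, m2) for m1, m2 in product(omega1, omega2)
            if compatible(m1, m2)]

# Hypothetical data: two multisets sharing the variable "s".
omega1 = [{"s": "ex:a", "o1": "1"}, {"s": "ex:b", "o1": "2"}]
omega2 = [{"s": "ex:a", "o2": "x"}, {"s": "ex:c", "o2": "y"}]
result = join(omega1, omega2)

# With no shared variable every pair is compatible, giving a cross join.
cross = join([{"a": "1"}, {"a": "2"}], [{"b": "3"}])
```

Only the pair that agrees on "s" survives the first join, while the second
call degenerates to a cross product.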

Inferred from this, the join definition would be equivalent to Join(Ω1, Ω2)
= { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and for each variable v in
intersect(dom(μ1), dom(μ2)), sameTerm(μ1(v), μ2(v)) is true }
which means the join

?s ?p1 ?o1 .
?s ?p2 ?o2

is equivalent to

?s1 ?p1 ?o1 .
?s2 ?p2 ?o2
FILTER (sameTerm(?s1, ?s2))

But we also have :

   1. §17 : Specifically, FILTERs eliminate any solutions that, when
   substituted into the expression, either result in an effective boolean
   value of false or produce an error.
   2. §17.2 : sameTerm will produce a type error if any arguments are unbound
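
A small Python sketch of how these two rules combine. The names
`SparqlTypeError`, `same_term` and `filter_solutions` are hypothetical
helpers for illustration only, and terms are again modelled as plain
strings.

```python
class SparqlTypeError(Exception):
    """Stand-in for a SPARQL expression type error (illustrative only)."""

def same_term(mu, v1, v2):
    # §17.2: an unbound argument produces a type error.
    if v1 not in mu or v2 not in mu:
        raise SparqlTypeError("unbound argument")
    return mu[v1] == mu[v2]

def filter_solutions(omega, expr):
    # §17: drop solutions whose expression is false or raises an error.
    kept = []
    for mu in omega:
        try:
            if expr(mu):
                kept.append(mu)
        except SparqlTypeError:
            pass  # an error eliminates the solution, evaluation continues
    return kept

omega = [
    {"s1": "ex:a", "s2": "ex:a"},  # same term: kept
    {"s1": "ex:a", "s2": "ex:b"},  # different terms: dropped
    {"s1": "ex:a"},                # ?s2 unbound: type error, dropped
]
kept = filter_solutions(omega, lambda mu: same_term(mu, "s1", "s2"))
```

Under these assumptions only the first solution is kept: the one with an
unbound ?s2 is eliminated by the error, not by a false result.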

Then about the extend case, let's say we have this graph pattern:

?s ?p ?o . FILTER(isLiteral(?o))
?s2 ?p2 ?o2 .

The evaluation will return a cross join of both triple pattern multisets
since, according to §18.3, they are compatible: they have no variable in
common.

On the other hand, given the following pattern,

{?s ?p ?o . FILTER(isBlank(?o)) }
BIND (iri(?o) as ?s2) .
?s2 ?p2 ?o2

Under your logic, the join would return the same results, since iri(?o)
produces a type error: ?o is a blank node, which the IRI function does not
accept.

I do not agree with this since :

   1. §10.1 Use of BIND ends the preceding basic graph pattern.
   2. If the evaluation of the expression produces an error, the variable
   remains unbound for that solution but the query evaluation continues.

To me, this means that we now have to perform a join between the
two multisets μ1[?s ?p ?o ?s2] and μ2[?s2 ?p2 ?o2]

So, still according to §18.5 and §18.3, the two multisets are now
incompatible, since they share the ?s2 variable, which cannot be compared
under the sameTerm conditions.

Thus we should get no result back from the query.
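
The two readings under discussion can be put side by side in a short Python
sketch. Everything here is an assumption made for illustration: mappings are
dicts, blank nodes are strings starting with "_:", and `bind_iri`,
`join_by_domain` and `join_requiring_bound` are hypothetical names, not
dotNetRDF APIs. In the first join a variable left unbound by BIND is simply
absent from the mapping and so does not constrain the join; in the second,
each shared variable must be bound on both sides, which is the
relational-style behaviour argued for in this message.

```python
from itertools import product

def bind_iri(mu, source, target):
    # BIND(iri(?source) AS ?target): on error the target stays unbound.
    term = mu.get(source)
    if term is None or term.startswith("_:"):
        return dict(mu)  # iri() failed, so the solution is kept unextended
    return {**mu, target: term}

def join_by_domain(omega1, omega2):
    # Only variables actually present in both mappings are compared.
    return [{**m1, **m2} for m1, m2 in product(omega1, omega2)
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys())]

def join_requiring_bound(omega1, omega2, shared):
    # Every variable in `shared` must be bound on both sides and equal.
    return [{**m1, **m2} for m1, m2 in product(omega1, omega2)
            if all(v in m1 and v in m2 and m1[v] == m2[v] for v in shared)]

# Hypothetical data: ?o is a blank node, so ?s2 is left unbound on the left.
left = [bind_iri({"s": "ex:a", "p": "ex:p", "o": "_:b1"}, "o", "s2")]
right = [{"s2": "ex:x", "p2": "ex:q", "o2": "1"},
         {"s2": "ex:y", "p2": "ex:q", "o2": "2"}]

by_domain = join_by_domain(left, right)
strict = join_requiring_bound(left, right, shared={"s2"})
```

With this data the first join pairs the left solution with every right
solution, while the second returns nothing.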

2015-05-28 12:50 GMT+02:00 Rob Vesse <rv...@do...>:

> Max
>
> Comments inline:
>
> From: Max - Micrologiciel <ma...@mi...>
> Date: Wednesday, 27 May 2015 13:48
> To: Rob Vesse <rv...@do...>
> Subject: About PR#36
>
> Hi Rob,
>
> I have just been reviewing a comment you made in the #36 PR
> <https://bitbucket.org/dotnetrdf/dotnetrdf/pull-request/36/new-spin-library> about
> a change I made at first with the order of join arguments between the
> query's algebra and any possible BindingPattern.
>
> You wrote :
> "Though I think our handling of VALUES may already be broken in some
> cases anyway e.g. interaction with GROUP BY"
>
> Would you have some example that exposes the problem, so I can have a look
> into it ?
>
>
> If memory serves the problem is that we apply VALUES too soon.  It should
> apply after any GROUP BY, HAVING and SELECT expressions but we apply it
> before those.  This is a fairly trivial fix which I simply haven't got
> round to because it is a rare enough case that nobody has ever complained
> that it is broken (NB - It's fixed in the new Medusa engine on the 1.9
> branch)
>
>
>
> On the other hand, I do not agree with you when you say that inverting the
> join parameters would break our compliance with the spec: since the join
> operation is normally commutative (and the recommendation does not
> explicitly specify in which order the sets are to be joined), we should
> be able to join the arguments in either order and get the same results.
>
>
> In principle yes. However, once you start doing indexed joins this does
> have the potential to break things if you aren't careful, though we are
> fairly careful these days so it probably doesn't make a difference nowadays
>
> Moreover, evaluating the bindings first could also lead to better
> performance, since injecting bound variables into the RHS whenever possible
> would lighten the multiset to join with.
>
> There is also an issue I encountered and I'd like to discuss with the
> Extend algebra.
> When used as a left join LHS, it prevents injecting the bound variables
> into the RHS because of how CanFlowResultsToRhs works and because the
> extended variable is always treated as floating.
>
>
> Well anything introduced by Extend always has to be treated as floating
> because the expression could produce an error or an unbound value
>
> There are a couple of cases when the expression is a constant value or a
> copy of a variable (provided we know that variable to be fixed) that we
> could special case but otherwise we can't do anything more.
>
> If you are generating Extends simply to introduce constants generating
> Values instead may be a better approach and will benefit from index joins
> as you note.
>
> Rob
>
>
> Perhaps it would be better to discuss these live, if you're available ?
>
> Cheers,
> Max.