From: Max - M. <ma...@mi...> - 2015-05-28 12:36:33
Sorry, but I forgot the conclusion of the demonstration: it is meant to show that once a variable has been defined by a BIND statement, it should not be considered a floating variable. To my understanding, the only floating variables to be considered should come from VALUES clauses.

Max.

2015-05-28 14:31 GMT+02:00 Max - Micrologiciel <ma...@mi...>:
> Rob,
>
> thanks for the answers.
>
> Concerning the extend case, I then definitely believe there is a flaw in
> our evaluation logic.
> *To me, the join operation should behave as it would in relational logic,
> meaning that comparing NULL with NULL always returns false, so no result.*
>
> Here's my demonstration of the case.
>
> First, about the join evaluation, based on the recommendation we get:
>
> 1. §18.5: Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and
>    μ1 and μ2 are compatible }
> 2. §18.3: Two solution mappings μ1 and μ2 are compatible if, for
>    every variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v).
>    Here, μ1(v) = μ2(v) means that μ1(v) and μ2(v) are the same RDF term.
>
> Inferred from this, the join definition would be equivalent to
> Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and for each
> variable v in dom(μ1) ∩ dom(μ2), sameTerm(μ1(v), μ2(v)) is true },
> which means the join
>
> ?s ?p1 ?o1 .
> ?s ?p2 ?o2
>
> is equivalent to
>
> ?s1 ?p1 ?o1 .
> ?s2 ?p2 ?o2
> FILTER (sameTerm(?s1, ?s2))
>
> But we also have:
>
> 1. §17: "Specifically, FILTERs eliminate any solutions that, when
>    substituted into the expression, either result in an effective boolean
>    value of false or produce an error."
> 2. §17.2: sameTerm will produce a type error if any of its arguments is
>    unbound.
>
> Then, about the extend case, let's say we have this graph pattern:
>
> ?s ?p ?o . FILTER(isLiteral(?o))
> ?s2 ?p2 ?o2 .
>
> The evaluation will return a cross join of both triple pattern multisets
> since, according to §18.3, they are compatible because they have no common
> variable.
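The §18.3 and §18.5 definitions quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not dotNetRDF's implementation: RDF terms are assumed to be plain strings (so string equality stands in for "same RDF term"), a solution mapping is a dict whose keys are exactly dom(μ) (a variable outside the domain is simply not a key), multisets are lists so duplicates survive, and the join is a naive nested loop. The `http://ex/` IRIs are made-up example data.

```python
from typing import Dict, List

# Assumed representation: variable name -> RDF term written as a string.
# A variable outside dom(mu) is simply not a key of the dict.
Mapping = Dict[str, str]


def compatible(mu1: Mapping, mu2: Mapping) -> bool:
    """§18.3: every variable in dom(mu1) ∩ dom(mu2) maps to the same RDF term."""
    shared = mu1.keys() & mu2.keys()
    return all(mu1[v] == mu2[v] for v in shared)


def merge(mu1: Mapping, mu2: Mapping) -> Mapping:
    """Union of two compatible mappings (they agree on every shared variable)."""
    return {**mu1, **mu2}


def join(omega1: List[Mapping], omega2: List[Mapping]) -> List[Mapping]:
    """§18.5: Join(Ω1, Ω2), computed with a naive nested loop over both multisets."""
    return [merge(m1, m2) for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


if __name__ == "__main__":
    # Shape of the first example: ?s ?p ?o (filtered) joined with ?s2 ?p2 ?o2.
    left = [{"s": "<http://ex/a>", "p": "<http://ex/p>", "o": '"x"'}]
    right = [
        {"s2": "<http://ex/b>", "p2": "<http://ex/q>", "o2": "<http://ex/c>"},
        {"s2": "<http://ex/d>", "p2": "<http://ex/q>", "o2": "<http://ex/e>"},
    ]
    # No shared variable, so every pair is compatible: a 1 x 2 cross join.
    print(len(join(left, right)))  # 2

    # With a shared variable ?s, only pairs holding the same term are merged.
    a = [{"s": "<http://ex/a>"}, {"s": "<http://ex/b>"}]
    b = [{"s": "<http://ex/a>", "o": '"y"'}]
    print(join(a, b))  # [{'s': '<http://ex/a>', 'o': '"y"'}]
```

Because `merge` of two compatible mappings is the same dict whichever side comes first, `join(b, a)` yields the same mappings as `join(a, b)` (possibly in a different order), which is the commutativity discussed further down the thread.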
>
> On the other hand, given the following pattern:
>
> {?s ?p ?o . FILTER(isBlank(?o)) }
> BIND (iri(?o) as ?s2) .
> ?s2 ?p2 ?o2
>
> Under your logic, the join would return the same results, since iri(?o)
> will produce a type error: ?o is a blank node, which the IRI function does
> not accept.
>
> I do not agree with this since:
>
> 1. §10.1: Use of BIND ends the preceding basic graph pattern.
> 2. If the evaluation of the expression produces an error, the variable
>    remains unbound for that solution but the query evaluation continues.
>
> Which means to me that we now actually have to perform a join between the
> two multisets μ1[?s ?p ?o ?s2] and μ2[?s2 ?p2 ?o2].
>
> So, still according to §18.5 and §18.3, both multisets are now
> incompatible, since they share the ?s2 variable, which cannot be compared
> under the sameTerm conditions.
>
> Thus we should get no result back from the query.
>
> 2015-05-28 12:50 GMT+02:00 Rob Vesse <rv...@do...>:
>
>> Max
>>
>> Comments inline:
>>
>> From: Max - Micrologiciel <ma...@mi...>
>> Date: Wednesday, 27 May 2015 13:48
>> To: Rob Vesse <rv...@do...>
>> Subject: About PR#36
>>
>> Hi Rob,
>>
>> just been reviewing some comments you made in the #36 PR
>> <https://bitbucket.org/dotnetrdf/dotnetrdf/pull-request/36/new-spin-library>
>> about a change I first made to the order of join arguments between the
>> query's algebra and any possible BindingPattern.
>>
>> You wrote:
>> "Though I think our handling of VALUES may already be broken in some
>> cases anyway e.g. interaction with GROUP BY"
>>
>> Would you have an example that exposes the problem, so I can have a
>> look into it?
>>
>> If memory serves, the problem is that we apply VALUES too soon. It should
>> apply after any GROUP BY, HAVING and SELECT expressions, but we apply it
>> before those.
>> This is a fairly trivial fix which I simply haven't got
>> round to because it is a rare enough case that nobody has ever complained
>> that it is broken (NB - it's fixed in the new Medusa engine on the 1.9
>> branch).
>>
>> On the other hand, I do not agree with you when you say that inverting
>> the join parameters would break our compliance with the spec: since the
>> join operation is normally commutative (and the recommendation does not
>> specify in which order the sets are to be joined), we should be able to
>> join the arguments in either order and get the same results.
>>
>> In principle yes, however once you start doing indexed joins this does
>> have the potential to break things if you aren't careful. We are fairly
>> careful these days, though, so it probably doesn't make a difference
>> nowadays.
>>
>> Moreover, evaluating the bindings first could also lead to better
>> performance, since injecting the bound variables into the RHS whenever
>> possible would lighten the multiset to join with.
>>
>> There is also an issue I encountered with the Extend algebra that I'd like
>> to discuss.
>> When used as a left join LHS, it prevents injecting the bound variables
>> into the RHS due to the CanFlowResultsToRhs workings and how the extended
>> variable is always treated as floating.
>>
>> Well, anything introduced by Extend always has to be treated as floating,
>> because the expression could produce an error or an unbound value.
>>
>> There are a couple of cases, when the expression is a constant value or a
>> copy of a variable (provided we know that variable to be fixed), that we
>> could special-case, but otherwise we can't do anything more.
>>
>> If you are generating Extends simply to introduce constants, generating
>> Values instead may be a better approach and will benefit from index joins,
>> as you note.
>>
>> Rob
>>
>> Perhaps it would be better to discuss these live, if you're available?
>>
>> Cheers,
>> Max.
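Rob's special cases for Extend can be written down as a small classification rule. The sketch below is hypothetical: the expression classes and `extend_var_is_fixed` are illustrative names, not dotNetRDF's API, and the function does not reproduce the actual `CanFlowResultsToRhs` logic. It only encodes the rule stated in the thread: a variable introduced by BIND/Extend is known to be bound in every solution when the expression is a constant, or a copy of a variable already known to be fixed. Any other expression might raise an error (as IRI() does on a blank node) and leave the variable unbound, so it stays floating.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Const:
    """A constant RDF term, e.g. BIND(<http://ex/a> AS ?v)."""
    term: str


@dataclass(frozen=True)
class Var:
    """A plain copy of another variable, e.g. BIND(?o AS ?v)."""
    name: str


@dataclass(frozen=True)
class Call:
    """Any other expression, e.g. BIND(IRI(?o) AS ?v); it may raise an error."""
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Const, Var, Call]


def extend_var_is_fixed(expr: Expr, fixed_vars: FrozenSet[str]) -> bool:
    """Hypothetical helper: True only if BIND(expr AS ?v) binds ?v in every solution.

    Constants always evaluate; a variable copy is safe only when the source
    variable is already known to be fixed. Everything else is conservatively
    treated as floating, because an evaluation error leaves ?v unbound.
    """
    if isinstance(expr, Const):
        return True
    if isinstance(expr, Var):
        return expr.name in fixed_vars
    return False


if __name__ == "__main__":
    fixed = frozenset({"s", "p", "o"})
    print(extend_var_is_fixed(Const("<http://ex/a>"), fixed))      # True
    print(extend_var_is_fixed(Var("o"), fixed))                    # True
    print(extend_var_is_fixed(Var("x"), fixed))                    # False
    print(extend_var_is_fixed(Call("IRI", (Var("o"),)), fixed))    # False
```

Returning False for every `Call` is the conservative choice Rob describes: without further analysis, any function call may error for some input, so only the two special cases are promoted to fixed.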