Re: [dotNetRDF-Develop] About PR#36


Sorry, I left out the conclusion of the demonstration:

This is to show that once a variable is defined by a BIND statement, it should
not be considered a floating variable.

To my understanding, the only floating variables to be considered should
come from VALUES clauses.

Max.

2015-05-28 14:31 GMT+02:00 Max - Micrologiciel <ma...@mi...>:

> Rob,
>
> thanks for the answers.
>
> Concerning the extend case, I then definitely believe there is a flaw in
> our evaluation logic.
> *To me, the join operation should behave as it would in relational logic,
> meaning that comparing NULL with NULL always returns false, so no result.*
>
> Here's my demonstration of the case.
>
> First about the join evaluation, based on the recommendation we get :
>
>    1. §18.5: Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and
>    μ1 and μ2 are compatible }
>    2. §18.3: Two solution mappings μ1 and μ2 are compatible if, for
>    every variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v).
>    Here, μ1(v) = μ2(v) means that μ1(v) and μ2(v) are the same RDF term.
>
> From this it follows that the join definition is equivalent to Join(Ω1, Ω2)
> = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and for each variable v in
> intersect(dom(μ1), dom(μ2)), sameTerm(μ1(v), μ2(v)) is true },
> which means the join
>
> ?s ?p1 ?o1 .
> ?s ?p2 ?o2
>
> is equivalent to
>
> ?s1 ?p1 ?o1 .
> ?s2 ?p2 ?o2
> FILTER (sameTerm(?s1, ?s2))
>
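As a rough illustration of the two definitions quoted above, here is a minimal Python sketch of compatibility and join over solution mappings. It assumes mappings are plain dicts from variable name to term, with terms as strings and `==` standing in for "same RDF term"; the function names and sample data are made up for the example and are not dotNetRDF APIs.

```python
from itertools import product

def compatible(mu1, mu2):
    # §18.3: every variable in both domains must map to the same term.
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())

def join(omega1, omega2):
    # §18.5: merge each compatible pair of mappings.
    return [{**mu1, **mu2}
            for mu1, mu2 in product(omega1, omega2)
            if compatible(mu1, mu2)]

# Hypothetical solutions for "?s ?p1 ?o1" and "?s ?p2 ?o2".
omega1 = [{"s": "ex:a", "o1": "1"}, {"s": "ex:b", "o1": "2"}]
omega2 = [{"s": "ex:a", "o2": "x"}, {"s": "ex:c", "o2": "y"}]

joined = join(omega1, omega2)
```

Only the pair sharing ?s = ex:a survives. When the two sides have no variable in common, `compatible` is vacuously true and the join degenerates to a cross product.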
> But we also have :
>
>    1. §17: Specifically, FILTERs eliminate any solutions that, when
>    substituted into the expression, either result in an effective boolean
>    value of false or produce an error.
>    2. §17.2: sameTerm will produce a type error if any arguments are
>    unbound.
>
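The interaction of these two rules can be sketched as follows. This is an illustrative Python model only: `SparqlTypeError`, `same_term`, and `apply_filter` are hypothetical names, unbound variables are modelled as keys missing from the dict, and string equality stands in for RDF term identity.

```python
class SparqlTypeError(Exception):
    """Stand-in for a SPARQL expression type error."""

def same_term(a, b):
    # §17.2: an unbound argument raises a type error.
    if a is None or b is None:
        raise SparqlTypeError("unbound argument")
    return a == b

def apply_filter(solutions, expr):
    # §17: keep a solution only if expr is true; false or an error drops it.
    kept = []
    for mu in solutions:
        try:
            if expr(mu):
                kept.append(mu)
        except SparqlTypeError:
            pass
    return kept

solutions = [
    {"s1": "ex:a", "s2": "ex:a"},
    {"s1": "ex:a", "s2": "ex:b"},
    {"s1": "ex:a"},  # ?s2 is unbound in this solution
]
kept = apply_filter(
    solutions, lambda mu: same_term(mu.get("s1"), mu.get("s2"))
)
```

In this model the solution with an unbound ?s2 is dropped by the error path, just like the one where the terms differ.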
>
> Then about the extend case, let's say we have this graph pattern:
>
> ?s ?p ?o . FILTER(isLiteral(?o))
> ?s2 ?p2 ?o2 .
>
> The evaluation will return a cross join of both triple pattern multisets
> since, according to §18.3, they are compatible because they have no common
> variable.
>
> On the other hand, given the following pattern,
>
> {?s ?p ?o . FILTER(isBlank(?o)) }
> BIND (iri(?o) as ?s2) .
> ?s2 ?p2 ?o2
>
> Under your logic, the join would return the same results, since iri(?o)
> will produce a type error: ?o is a blank node, which is not accepted by
> the IRI function.
>
> I do not agree with this since :
>
>    1. §10.1 Use of BIND ends the preceding basic graph pattern.
>    2. If the evaluation of the expression produces an error, the variable
>    remains unbound for that solution but the query evaluation continues.
>
> Which means to me that we now in fact have to perform a join between the
> two multisets Ω1[?s ?p ?o ?s2] and Ω2[?s2 ?p2 ?o2].
>
>
> So, still according to §18.5 and §18.3, the two multisets are now
> incompatible, since they share the ?s2 variable, which cannot be compared
> under the sameTerm conditions.
>
> Thus we should get no result back from the query.
>
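To make the disagreement concrete, the sketch below contrasts the two readings on toy data. It is an assumption-laden illustration, not dotNetRDF code: `UNBOUND`, `join`, and the sample terms are hypothetical. With `null_rejecting=False` it follows the literal wording of §18.3, where an unbound variable is not in dom(μ) and so is never compared; with `null_rejecting=True` it models the relational-style behaviour argued for in this message.

```python
from itertools import product

UNBOUND = None  # hypothetical marker: the BIND expression raised an error

def join(omega1, omega2, null_rejecting):
    results = []
    for mu1, mu2 in product(omega1, omega2):
        shared = mu1.keys() & mu2.keys()
        if null_rejecting:
            # Relational-style reading: an unbound shared variable never matches.
            ok = all(mu1[v] is not UNBOUND and mu1[v] == mu2[v] for v in shared)
        else:
            # Literal §18.3 reading: unbound variables are outside dom(mu).
            ok = all(mu1[v] == mu2[v] for v in shared
                     if mu1[v] is not UNBOUND and mu2[v] is not UNBOUND)
        if ok:
            merged = dict(mu1)
            for v, term in mu2.items():
                if merged.get(v, UNBOUND) is UNBOUND:
                    merged[v] = term
            results.append(merged)
    return results

# ?o is a blank node, so iri(?o) fails and ?s2 stays unbound on the left.
left = [{"s": "ex:a", "p": "ex:p", "o": "_:b1", "s2": UNBOUND}]
right = [
    {"s2": "ex:x", "p2": "ex:q", "o2": "1"},
    {"s2": "ex:y", "p2": "ex:q", "o2": "2"},
]

spec_result = join(left, right, null_rejecting=False)
strict_result = join(left, right, null_rejecting=True)
```

Under the first reading every right-hand solution joins with the left-hand one (the cross-join behaviour); under the second the join is empty.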
>
> 2015-05-28 12:50 GMT+02:00 Rob Vesse <rv...@do...>:
>
>> Max
>>
>> Comments inline:
>>
>> From: Max - Micrologiciel <ma...@mi...>
>> Date: Wednesday, 27 May 2015 13:48
>> To: Rob Vesse <rv...@do...>
>> Subject: About PR#36
>>
>> Hi Rob,
>>
>> I have just been reviewing a comment you made on PR #36
>> <https://bitbucket.org/dotnetrdf/dotnetrdf/pull-request/36/new-spin-library> about
>> a change I initially made to the order of the join arguments between the
>> query's algebra and any possible BindingPattern.
>>
>> You wrote :
>> "Though I think our handling of VALUES may already be broken in some
>> cases anyway e.g. interaction with GROUP BY"
>>
>> Would you have some example that exposes the problem, so I can have a
>> look into it ?
>>
>>
>> If memory serves the problem is that we apply VALUES too soon.  It should
>> apply after any GROUP BY, HAVING and SELECT expressions but we apply it
>> before those.  This is a fairly trivial fix which I simply haven't got
>> round to because it is a rare enough case that nobody has ever complained
>> that it is broken (NB - It's fixed in the new Medusa engine on the 1.9
>> branch)
>>
>>
>>
>> On the other hand, I do not agree with you when you say that inverting
>> the join parameters would break our compliance with the spec: since the
>> join operation is normally commutative (and the recommendation does not
>> explicitly specify the order in which the sets are to be joined), we should
>> be able to join the arguments in either order and get the same results.
>>
>>
>> In principle yes. However, once you start doing indexed joins this does
>> have the potential to break things if you aren't careful. We are fairly
>> careful these days, so it probably doesn't make a difference nowadays.
>>
>> Moreover, evaluating the bindings first could also lead to better
>> performance, since injecting bound variables into the RHS whenever possible
>> would lighten the multiset to join with.
>>
>> There is also an issue I encountered with the Extend algebra that I'd
>> like to discuss.
>> When used as a left join LHS, it prevents injecting the bound variables
>> into the RHS, due to the workings of CanFlowResultsToRhs and the fact that
>> the extended variable is always treated as floating.
>>
>>
>> Well anything introduced by Extend always has to be treated as floating
>> because the expression could produce an error or an unbound value
>>
>> There are a couple of cases when the expression is a constant value or a
>> copy of a variable (provided we know that variable to be fixed) that we
>> could special case but otherwise we can't do anything more.
>>
>> If you are generating Extends simply to introduce constants, generating
>> Values instead may be a better approach and will benefit from index joins
>> as you note.
>>
>> Rob
>>
>>
>> Perhaps it would be better to discuss these live, if you're available ?
>>
>> Cheers,
>> Max.
>>