Re: [dotNetRDF-Develop] About PR#36

No, I don't think so.

Consider:

{
   ?s ?p ?o .
   BIND (?o / 0 AS ?example)
}

?o is a fixed variable. However, regardless of whether it is fixed or
floating, the expression it is involved in may error (in this example it
always errors), so we always have to treat ?example as a floating variable.
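
As a rough illustration of why the BIND variable has to be floating, here is
a minimal Python sketch of Extend-style evaluation. It is not dotNetRDF code:
the `extend` helper and the sample solutions are hypothetical, and a Python
exception stands in for a SPARQL expression error.

```python
def extend(solutions, var, expr):
    """Model BIND (expr AS var): an expression error leaves var unbound."""
    out = []
    for mu in solutions:
        new_mu = dict(mu)
        try:
            new_mu[var] = expr(mu)
        except (ArithmeticError, TypeError, KeyError):
            pass  # error: the solution is kept, but var stays unbound
        out.append(new_mu)
    return out


# Hypothetical results of matching ?s ?p ?o (every ?o is bound, so ?o is fixed).
solutions = [
    {"s": "ex:a", "p": "ex:p", "o": 4},
    {"s": "ex:b", "p": "ex:p", "o": 2},
    {"s": "ex:c", "p": "ex:p", "o": "text"},
]

# Analogue of BIND (?o / 0 AS ?example): errors for every solution.
always_errors = extend(solutions, "example", lambda mu: mu["o"] / 0)

# An expression that only errors for some solutions (here, the string value).
sometimes_errors = extend(solutions, "half", lambda mu: mu["o"] / 2)
```

In both runs every input solution survives, but the new variable is missing
from some or all of them, even though the only input variable is fixed.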

That modification will only work for trivial left joins (and sometimes not
even then, if you have FILTERs over the left join). Deep left joins will
almost certainly be broken by that change.
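
To make the deep left join case concrete, here is a small Python sketch. It
is an illustration only, not the Leviathan implementation: all helper names
and the three-triple dataset are made up. It models the pattern
{ ?x a ?n OPTIONAL { ?y b ?m OPTIONAL { ?x c ?k } } }, where ?x is fixed on
the LHS but, on my reading, floating on the RHS because it only appears
inside the nested OPTIONAL. That is the situation the commented-out
lhsFixed.Contains(v) clause would otherwise reject.

```python
def match(pattern, data):
    """Return one solution mapping (dict) per triple matching the pattern."""
    solutions = []
    for triple in data:
        mu = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break  # constant does not match
        else:
            solutions.append(mu)
    return solutions


def compatible(m1, m2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def left_join(left, right):
    """Algebraic LeftJoin without a filter: keep m1 if nothing is compatible."""
    out = []
    for m1 in left:
        merged = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        out.extend(merged or [m1])
    return out


def flowed_left_join(left, eval_rhs):
    """Index-join style: evaluate the RHS once per LHS solution."""
    out = []
    for m1 in left:
        merged = [{**m1, **m2} for m2 in eval_rhs(m1)]
        out.extend(merged or [m1])
    return out


DATA = [("s1", "a", 1), ("s2", "b", 2), ("s3", "c", 3)]
A, B, C = ("?x", "a", "?n"), ("?y", "b", "?m"), ("?x", "c", "?k")


def rhs_with_bindings(m1):
    # Pushing m1 into each RHS triple pattern is modelled as a compatibility filter.
    restrict = lambda sols: [m for m in sols if compatible(m1, m)]
    return left_join(restrict(match(B, DATA)), restrict(match(C, DATA)))


# Bottom-up evaluation, following the algebra.
bottom_up = left_join(match(A, DATA), left_join(match(B, DATA), match(C, DATA)))

# Flowing LHS results into the RHS, as the relaxed check would allow here.
flowed = flowed_left_join(match(A, DATA), rhs_with_bindings)
```

With this data the bottom-up evaluation leaves ?y and ?m unbound, while the
flowed evaluation binds them, so the two strategies disagree as soon as a
variable that is fixed on the LHS is floating on the RHS.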

The logic around whether we flow results is based on the logic used by
Apache Jena ARQ, which is pretty much the reference implementation of SPARQL
since it is maintained by the editor of the SPARQL Query specification.

Rob

From:  Max - Micrologiciel <ma...@mi...>
Reply-To:  dotNetRDF Developer Discussion and Feature Request
<dot...@li...>
Date:  Thursday, 28 May 2015 16:01
To:  dotNetRDF Developer Discussion and Feature Request
<dot...@li...>
Subject:  Re: [dotNetRDF-Develop] About PR#36

> OK, forget this. I just realized I was running the test case against my Sesame
> store instead of the InMemory one (which does handle the join example
> correctly).
> I'll file a bug report with them.
> 
> Sorry for the time wasted and thanks for your patience.
> 
> However, for the BIND on the LHS of a left join, the variables I use in the
> expression are also bound from a triple pattern. Here is an example:
> 
> ex:cmd ex:hasDefaultGraph ?g .
> BIND(IRI(CONCAT("tmp:", STR(?g))) AS ?tmpG)
> OPTIONAL { GRAPH ?tmpG { ...
> 
> Would it be safe to say that the BIND variable is floating if any of the
> expression variables is floating, and otherwise make it fixed?
> And modify CanFlowResultsToRhs (l.312) to:
> 
> if (rhsFloating.Any(v => lhsFloating.Contains(v) /* || lhsFixed.Contains(v) */)) return false;
> 
> This seems to do the trick in my case.
> 
> Max.
> 
> 
> 
> 2015-05-28 15:32 GMT+02:00 Rob Vesse <rv...@do...>:
>> Comments inline
>> 
>> 
>> From:  Max - Micrologiciel <ma...@mi...>
>> Date:  Thursday, 28 May 2015 13:36
>> To:  Rob Vesse <rv...@do...>
>> Cc:  dotNetRDF Developer Discussion and Feature Request
>> <dot...@li...>
>> Subject:  Re: About PR#36
>> 
>>> Sorry, but I left out the conclusion of the demonstration:
>>> 
>>> This is to show that after a variable is defined by a BIND statement it
>>> should not be considered a floating variable.
>> 
>> However, your conclusion about floating variables and BIND is wrong, at
>> least according to how we define and use the concept of a floating variable
>> within the Leviathan engine.
>> 
>> A floating variable is a variable whose value is not guaranteed to be bound,
>> which, as I pointed out, is the exact definition of a BIND variable: the
>> expression could error and so the variable could be unbound.
>> 
>>> 
>>> 
>>> To my understanding, the only floating variables to be considered should
>>> come from VALUES clauses.
>> 
>> Floating variables can come from BIND, OPTIONAL, VALUES, SELECT expressions
>> and aggregates, i.e. anywhere an expression is evaluated or where it is
>> possible to have unbound values.
>> 
>>> 
>>> Max.
>>> 
>>> 2015-05-28 14:31 GMT+02:00 Max - Micrologiciel <ma...@mi...>:
>>>> Rob,
>>>> 
>>>> thanks for the answers.
>>>> 
>>>> Concerning the extend case, I then definitely believe there is a flaw in
>>>> our evaluation logic.
>>>> To me, the join operation should behave as it would in relational logic,
>>>> meaning that comparing NULL with NULL always returns false, so no result.
>> 
>> It already does. You seem to be conflating joins with left joins (OPTIONAL),
>> and the two are not the same.
>> 
>>>> 
>>>> Here's my demonstration of the case.
>>>> 
>>>> First, about the join evaluation: based on the recommendation we get:
>>>> 1. §18.5: Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1
>>>> and μ2 are compatible }
>>>> 2. §18.3: Two solution mappings μ1 and μ2 are compatible if, for every
>>>> variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v)
>>>> 3. Here, μ1(v) = μ2(v) means that μ1(v) and μ2(v) are the same RDF term.
>>>> Inferred from this, the join definition would be equivalent to Join(Ω1, Ω2)
>>>> = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and for each variable v in
>>>> intersect(dom(μ1), dom(μ2)), sameterm(μ1(v), μ2(v)) is true }
>>>> which means the join
>>>> 
>>>> ?s ?p1 ?o1 .
>>>> ?s ?p2 ?o2
>>>> 
>>>> is equivalent to
>>>> 
>>>> ?s1 ?p1 ?o1 .
>>>> ?s2 ?p2 ?o2
>>>> FILTER (sameterm(?s1, ?s2))
>>>> 
>>>> But we also have:
>>>> 1. §17: Specifically, FILTERs eliminate any solutions that, when substituted
>>>> into the expression, either result in an effective boolean value of false
>>>> or produce an error.
>>>> 2. §17.2: sameterm will produce a type error if any arguments are unbound
>>>> 
>>>> Then about the extend case, let's say we have this graph pattern:
>>>> 
>>>> ?s ?p ?o . FILTER(isLiteral(?o))
>>>> ?s2 ?p2 ?o2 .
>>>>  
>>>> The evaluation will return a cross join of both triple pattern multisets
>>>> since, according to §18.3, they are compatible because they have no common
>>>> variable.
>>>> 
>>>> On the other hand, given the following pattern,
>>>> 
>>>> {?s ?p ?o . FILTER(isBlank(?o)) }
>>>> BIND (iri(?o) as ?s2) .
>>>> ?s2 ?p2 ?o2
>>>> 
>>>> Under your logic, the join would return the same results, since iri(?o)
>>>> will produce a type error: ?o is a blank node, which is not accepted by
>>>> the IRI function.
>> 
>> Nowhere did I say this
>> 
>> With regard to index joins, we were talking specifically about the case of a
>> BIND being on the LHS of an OPTIONAL, which is completely different because
>> left joins are not commutative.
>> 
>> Rob
>> 
>>>> 
>>>> 
>>>> I do not agree with this since:
>>>> 1. §10.1: Use of BIND ends the preceding basic graph pattern.
>>>> 2. If the evaluation of the expression produces an error, the variable
>>>> remains unbound for that solution but the query evaluation continues.
>>>> Which means to me that in fact we now have to perform a join between the
>>>> two multisets μ1[?s ?p ?o ?s2] and μ2[?s2 ?p2 ?o2]
>>>> 
>>>> So, still according to §18.5 and §18.3, both multisets are now
>>>> incompatible since they share the ?s2 variable, which cannot be compared
>>>> under the sameterm conditions.
>>>> 
>>>> Thus we should get no result back from the query.
>>>> 
>>>> 2015-05-28 12:50 GMT+02:00 Rob Vesse <rv...@do...>:
>>>>> Max
>>>>> 
>>>>> Comments inline:
>>>>> 
>>>>> From:  Max - Micrologiciel <ma...@mi...>
>>>>> Date:  Wednesday, 27 May 2015 13:48
>>>>> To:  Rob Vesse <rv...@do...>
>>>>> Subject:  About PR#36
>>>>> 
>>>>>> Hi Rob,
>>>>>> 
>>>>>> I've just been reviewing a comment you made in the #36 PR
>>>>>> <https://bitbucket.org/dotnetrdf/dotnetrdf/pull-request/36/new-spin-library>
>>>>>> about a change I made at first to the order of join arguments
>>>>>> between the query's algebra and any possible BindingPattern.
>>>>>> 
>>>>>> You wrote : 
>>>>>> "Though I think our handling of VALUES may already be broken in some
>>>>>> cases anyway e.g. interaction with GROUP BY"
>>>>>> 
>>>>>> Would you have some example that exposes the problem, so I can have a
>>>>>> look into it ?
>>>>> 
>>>>> If memory serves, the problem is that we apply VALUES too soon. It should
>>>>> apply after any GROUP BY, HAVING and SELECT expressions, but we apply it
>>>>> before those. This is a fairly trivial fix which I simply haven't got
>>>>> round to, because it is a rare enough case that nobody has ever complained
>>>>> that it is broken (NB: it's fixed in the new Medusa engine on the 1.9
>>>>> branch).
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On the other hand, I do not agree with you when you say that inverting
>>>>>> the join parameters would break our compliance with the spec: since the
>>>>>> join operation is normally commutative (and the recommendation does not
>>>>>> explicitly specify the order in which the sets are to be joined), we
>>>>>> should be able to join the arguments in either order and get the same
>>>>>> results.
>>>>> 
>>>>> In principle, yes. However, once you start doing indexed joins this does
>>>>> have the potential to break things if you aren't careful. We are fairly
>>>>> careful these days, so it probably doesn't make a difference nowadays.
>>>>> 
>>>>>> Moreover, evaluating the bindings first could also lead to better
>>>>>> performance, since injecting bound variables into the RHS whenever
>>>>>> possible would lighten the multiset to join with.
>>>>>> 
>>>>>> There is also an issue I encountered with the Extend algebra that I'd
>>>>>> like to discuss.
>>>>>> When used as the LHS of a left join, it prevents injecting the bound
>>>>>> variables into the RHS, because of how CanFlowResultsToRhs works and
>>>>>> because the extended variable is always treated as floating.
>>>>> 
>>>>> Well, anything introduced by Extend always has to be treated as floating
>>>>> because the expression could produce an error or an unbound value.
>>>>> 
>>>>> There are a couple of cases, when the expression is a constant value or a
>>>>> copy of a variable (provided we know that variable to be fixed), that we
>>>>> could special-case, but otherwise we can't do anything more.
>>>>> 
>>>>> If you are generating Extends simply to introduce constants, generating
>>>>> Values instead may be a better approach and will benefit from index joins,
>>>>> as you note.
>>>>> 
>>>>> Rob
>>>>> 
>>>>>> 
>>>>>> Perhaps it would be better to discuss these live, if you're available ?
>>>>>> 
>>>>>> Cheers,
>>>>>> Max.
>> 
>> ------------------------------------------------------------------------------
>> 
>> _______________________________________________
>> dotNetRDF-develop mailing list
>> dot...@li...
>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-develop
>> 