From: Rob V. <rv...@do...> - 2015-02-11 12:08:42
See my other email. I would consider only supporting in-memory stores to
start with and worry about the extra complexities of arbitrary storage
later.

There are existing implementations of those interfaces that already proxy
requests to arbitrary storage. This doesn't solve the issue of stopping
people using the remote storage directly.

(Rough sketches of the transactional update capture and the MRSW locking
discussed in the thread below are appended at the end of this mail.)

Rob

From: Max - Micrologiciel <ma...@mi...>
Date: Wednesday, 4 February 2015 17:29
To: Rob Vesse <rv...@do...>
Subject: Re: About the SPIN Processor

> By the way, I forgot to add that (of course) as long as the dotNetRDF SPIN
> library is in use over a store, no direct access to the store should be
> permitted.
> So the library should also help to define a public SPARQL endpoint (with
> full SPARQL Service Description support) if required.
>
> Is it enough to implement the ISparqlQueryProcessor/ISparqlUpdateProcessor
> interfaces and bind them through the SparqlServer configuration, or are
> there other considerations to take into account?
>
> Thanks,
> Max.
>
>
> 2015-02-03 22:38 GMT+01:00 Rob Vesse <rv...@do...>:
>> Max
>>
>> Thanks for the updates, comments are inline:
>>
>> From: Max - Micrologiciel <ma...@mi...>
>> Date: Thursday, 29 January 2015 03:58
>> To: Rob Vesse <rv...@do...>
>> Subject: About the SPIN Processor
>>
>>> Hi Rob,
>>>
>>> First of all, let me wish you a happy and successful year for 2015.
>>
>> Thanks, and the same to you too.
>>
>>> I'm still working on the inclusion of the SPIN layer into dotNetRDF.
>>> Since last year's first draft, much of my work has been more
>>> experimental (so not really committable) than formal, and most often
>>> aimed at checking whether and how the different issues I encountered
>>> could be handled.
>>>
>>> So before going further (I've been delaying this too much already...) I
>>> wanted to get your advice on the issues I encountered.
>>>
>>> Here is a summary of where I stand for now.
>>>
>>> About SPIN, my first conclusions came to this:
>>> * since SPIN user-defined functions and properties rely mainly on
>>> SPARQL, it should be possible to handle those through SPARQL rewriting.
>>
>> Yes, I think that would be a reasonable approach; the current API may
>> make this harder than it needs to be. Hopefully the 1.9 changes will
>> make this much easier in the longer term.
>>
>>> * since SPIN allows data-integrity features
>>> (constructors/rules/constraints...), this requires capturing each
>>> SPARQL Update command to perform the SPIN pipeline afterwards.
>>> * since those data-integrity features may signal violations, the
>>> command's results must be cancelled somehow. This implies that there
>>> must be some transactional support in the processor.
>>
>> Yes, however the SPARQL specs already require that the updates within a
>> request (of which there may be many) are applied atomically, so any
>> SPARQL processor will already need to support transactions in some
>> sense.
>>
>>> Based on the current state of the art, we are faced with the following
>>> issues:
>>> * pipelining the SPIN integrity chain requires handling multiple SPARQL
>>> updates/queries in a single transactional context.
>>>> * HTTP being stateless, there is no way (yet? see
>>>> http://people.apache.org/~sallen/sparql11-transaction/) to span
>>>> transactions over multiple requests
>>
>> Yes, this is an issue; some 3rd party stores define their own protocols
>> for transactions, e.g. Stardog.
>>
>> If you have a 3rd party store that doesn't support any kind of
>> transactions then the solution may be simply to say that we can't
>> support that.
>>>> * subsequently, supporting transactions locally requires handling
>>>> proper isolation between clients as well as possible transaction
>>>> concurrency problems.
>>
>> Yep, right now dotNetRDF's in-memory implementation uses MRSW (Multiple
>> Reader Single Writer) concurrency, so we avoid concurrency issues by
>> only allowing a single write transaction to be in progress and blocking
>> all reads while transactions are in progress.
>>
>>>> * It also requires simulating the transactional environment on the
>>>> underlying server to reduce as far as possible the memory consumed by
>>>> dotNetRDF or the storage server.
>>
>> Yes, ideally the server should manage the transactions, but of course if
>> you are trying to layer SPIN over a server that doesn't support SPIN
>> then some state necessarily has to be maintained by the client.
>>
>> This perhaps begs the question of how general the SPIN implementation
>> should be and whether it should be limited to a subset of suitable
>> stores.
>>
>>> * SPIN to SPARQL rewriting also raises some problems due to:
>>>> * how sub-queries are processed according to the recommendation
>>>> * some difficulties in finding an equivalent evaluation strategy for
>>>> some forms of property paths.
>>
>> Can you elaborate on what you mean by this?
>>
>> Is the sub-query stuff related to the use of SPIN functions and
>> templates, which potentially require substituting some constants into
>> the sub-query prior to execution?
>>
>>> Going a bit further, I tried experimenting with a simple SWP layer on
>>> top of the stack with some success, until I discovered my prototypes
>>> were biased by a Sesame bug in the handling of optional sub-queries.
>>> Anyway, I got directly confronted with how to handle the natively
>>> provided SWP functions, which cannot be converted into SPARQL. The
>>> problem also arises at the basic SPIN level if you consider extensions
>>> like SPINx, so it may be best and simpler to handle the case here?
>>
>> I would start by getting the core working and worry about how to add the
>> extra layers later. Presumably some of the non-SPARQL things could be
>> implemented by using the existing extension function API.
>>
>>> Also, I see the 1.9 rewrite is going well, and since it introduces many
>>> API changes it could also make the implementation easier.
>>
>> Yes, although much slower than I would have liked since I have very
>> little time to work on this these days. The changes are going to be
>> quite invasive, as you've probably noticed, but this is necessary to
>> address a lot of the shortcomings in the current API and to make it
>> easier to improve the query engine going forward.
>>
>> I keep hoping to be able to start putting out some limited alpha
>> releases of the new API at some point this year, but then I said that in
>> 2014 and never got far enough to do that. The new query engine still has
>> some big pieces missing (a query parser and results IO support for a
>> start) before it could be meaningfully used. Maybe it'll be ready later
>> this year if I can find the time to get it into a sufficiently usable
>> state.
>>
>> Rob
>>
>>> Since you have a much more global view of dotNetRDF and of the
>>> RDF/SPARQL ecosystem than me, your advice would be welcome. If you're
>>> available, I'd rather discuss this with you so we can decide how
>>> efforts and contributions may be best directed.
>>>
>>> Please tell me what you think about this.
>>>
>>> Thanks for your consideration,
>>> Max.
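P.S. To make the update-capture idea above concrete, here is a minimal
sketch of the kind of wrapper discussed. SpinUpdateWrapper and
ConstraintsViolated() are hypothetical illustrations rather than existing
dotNetRDF APIs, and the sketch assumes the wrapped processor's commit can
be deferred until Flush() (by default ProcessCommandSet() auto-commits, so
a real implementation would need to control that explicitly):

    using System;
    using VDS.RDF;
    using VDS.RDF.Parsing;
    using VDS.RDF.Update;

    // Hypothetical sketch: wrap the in-memory update processor so that
    // each update request is followed by a SPIN constraint check, and the
    // whole request is discarded if a violation is detected.
    public class SpinUpdateWrapper
    {
        private readonly LeviathanUpdateProcessor _inner;

        public SpinUpdateWrapper(IInMemoryQueryableStore store)
        {
            _inner = new LeviathanUpdateProcessor(store);
        }

        public void ProcessRequest(string sparqlUpdate)
        {
            SparqlUpdateParser parser = new SparqlUpdateParser();
            SparqlUpdateCommandSet commands = parser.ParseFromString(sparqlUpdate);

            _inner.ProcessCommandSet(commands);
            if (ConstraintsViolated())
            {
                _inner.Discard();   // roll the whole request back
                throw new InvalidOperationException("SPIN constraint violation");
            }
            _inner.Flush();         // commit the request
        }

        private bool ConstraintsViolated()
        {
            // Placeholder for the SPIN pipeline: evaluate each
            // spin:constraint against the updated data and report any
            // constraint violations found.
            return false;
        }
    }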
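P.P.S. The MRSW (Multiple Reader Single Writer) behaviour described above
boils down to something like the following. This is just an illustration
using the standard .NET ReaderWriterLockSlim, not the actual dotNetRDF
locking code:

    using System;
    using System.Threading;

    // Illustrative MRSW guard: any number of concurrent readers, but a
    // writer takes an exclusive lock that blocks all reads (and other
    // writes) until it completes.
    public class MrswGuard
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

        public T Read<T>(Func<T> query)
        {
            _lock.EnterReadLock();      // many readers may hold this at once
            try { return query(); }
            finally { _lock.ExitReadLock(); }
        }

        public void Write(Action update)
        {
            _lock.EnterWriteLock();     // exclusive: blocks readers and writers
            try { update(); }
            finally { _lock.ExitWriteLock(); }
        }
    }

The consequence is that a long-running write transaction stalls every
reader, which is acceptable for the in-memory case but another reason to
be cautious about layering this over arbitrary remote storage.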