Re: [dotNetRDF-Develop] Integrating SPIN into dotnetrdf


Sorry I've been out of the discussion, but this week is really busy for me.

Comments inline:

On 3/19/13 10:35 AM, "Tomasz Pluskiewicz" <tom...@gm...>
wrote:

>Comments inline
>
>On Sun, Mar 17, 2013 at 4:56 AM, Kevin <ke...@th...> wrote:
>> Rob, Tom
>>
>>
>>
>> After more reading I believe a dotnetrdf/SPIN/Fluent SPARQL combination
>> would be an excellent solution.  In fact I would think people would convert
>> to C# dotnetrdf because of this compelling story.
>>
>>
>>
>> Strategy Questions:
>>
>>
>>
>> Due to my knowledge, time constraints, and past experience I know that if I
>> attempt SPIN dotnetrdf alone it will drag out and never complete.  Can you
>> let me know if either of the strategies below makes sense?  Both strategies
>> involve splitting the work with the understanding that we are all busy and
>> there is no real obligation.  Also I assume any knowledgeable newcomer is
>> welcome to contribute to the cause.  I believe the effort will obviously
>> benefit the community, but also provide deep insight to the contributing
>> developers.
>>
>>
>>
>> -Convert the Java TopBraid SPIN API (uses the Jena interface) into C#,
>> starting with a conversion tool like Tangible Java->C# (I will purchase).
>> The tool will help with most of the syntax, but there will be a lot of
>> manual work.  The conversion will obviously entail replacing the Jena
>> interface with dotnetrdf.  The benefit of this strategy is that we harness
>> the completeness, robustness, and future updates (re-port changes) of the
>> TopBraid SPIN API.  The conversion should provide insights into the
>> implementation, enabling custom tweaks.  It should also be possible to
>> split the converted Java files into 3 groups if you guys are up for the
>> task.  I prefer this solution primarily because we are essentially
>> creating a SPIN inference engine based upon code from the people who
>> invented SPIN.
>>
>>
>
>Has TopBraid opened all of their SPIN API? Last time I looked only
>parts were open source. I'm surprised it got no attention at
>semanticweb.com. Or did it?
>
>Regarding the first strategy I understand it is very appealing but I'm
>afraid converting API calls from Jena to dotNetRDF could be more work
>than you expect. I don't know Jena very well but I expect that a lot
>of the functionality isn't just a simple 1:1 relation. In this case we
>are talking about a number of areas:
>
>- Graph traversal for reading SPIN queries in RDF form
>- SPARQL query and its programmatic counterpart
>(VDS.RDF.Query.SparqlQuery and others)
>- Query execution
>- Reasoner

I would echo what Tom said: write the code from scratch, don't try to port
the existing Java.  As someone who also works on Jena I can confirm what Tom
said: the APIs around queries are different enough that you would
spend far more time extricating Jena from the converted code than if you
just wrote the relevant code yourself.

Also, converting code that was open sourced under a different license than
dotNetRDF uses creates a grey area about whether we can re-license the
code, since we would not be the original copyright holders.  We would
likely shoot ourselves in the foot wrt making our SPIN implementation
usable in a corporate setting (where they are going to care about legal
stuff like this), if at all.

>
>However what we should definitely try to preserve are any unit tests
>and possibly some kind of test harness. I hope the guys at TopBraid
>have SPIN thoroughly tested and we should reuse those tests on our
>implementation.

This is a good point: if they have open sourced the tests, these would be
useful references to write our own tests from.

>
>>
>> -Follow your initial start at SPIN dotnetrdf and maybe use the TopBraid
>> SPIN API as a reference (difficult because approaches will diverge).  Rob,
>> are you able to take any small essential blocks and coordinate the overall
>> effort?  Tom, are you able to help out, especially where Fluent SPARQL is
>> utilized?  Again I think the alternate strategy above has a lot of merit,
>> but it must align with Rob's vision.  No matter the strategy please look
>> at the implementation questions below to help clarify the high/low level
>> picture.
>>
>
>Now, if we had not only the Java API reference but also a complete
>automated test suite from the start I would opt for the second
>approach.
>
>If not available or somehow unusable for a .NET project I think we
>could resort to http://spinservices.org/spinrdfconverter.html for
>validating our efforts.

I think the early tests I wrote used this too; I likely need to
drag those over from the old branch.

>
>Whatever the case I think that starting from scratch rather than
>porting Java is likely a better approach in the long run.

As I said above +1

>
>>
>>
>> Implementation Questions:
>>
>>
>>
>> -Can you provide a high level overview of a SPIN dotnetrdf implementation
>> so I can ensure the final solution is acceptable?  Let's say you have the
>> SPIN rule for "adult rdfs:subclassof person" at the top level (i.e.
>> owl:Thing) and the user queries whether Elvis is an Adult.  Would
>> dotnetrdf SPIN spawn numerous intermediate SPARQL queries to gather the
>> important SPIN rules that must be executed?  If the RDF database is remote
>> (e.g. dbpedia) could this cumulative delay become unacceptable (over 3
>> seconds in my application)?  Alternatively, is everything required by the
>> user query obtained in 1 or 2 complex intermediate queries which dotnetrdf
>> SPIN creates and digests?  Basically if you could paint the high level
>> overview with an example I could then move to the details.
>>
>>
>
>I think that a complete dotNetRDF implementation of SPIN would require
>at least three components.
>
>1. A converter from SPIN/RDF to in-memory queries (not necessarily
>VDS.RDF.Query.SparqlQuery - see below)
>2. A SPIN runner, which will take a SPIN query and execute it in an
>instance's context
>3. A reasoner, which uses the above converter and executes the SPIN
>rules and constraints against a graph or triple store

Yes that is pretty much what I was going to suggest
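
To make the reasoner (the third item in Tom's list) a bit more concrete,
here is a minimal in-memory sketch in Python rather than C#, applied to
Kevin's Elvis/Adult example.  Everything in it is an assumption made for
illustration: triples are plain tuples, the rule is a Python function
standing in for a SPARQL CONSTRUCT produced by the converter, and none of
the names are dotNetRDF APIs.

```python
# Hypothetical sketch: plain tuples stand in for triples, and a Python
# function stands in for a SPIN rule that has been converted to SPARQL.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subclass_rule(graph):
    """Stand-in for a rule such as:
    CONSTRUCT { ?x a ?super } WHERE { ?x a ?sub . ?sub rdfs:subClassOf ?super }
    """
    inferred = set()
    for (x, p, sub) in graph:
        if p != RDF_TYPE:
            continue
        for (s, q, sup) in graph:
            if q == SUBCLASS_OF and s == sub:
                inferred.add((x, RDF_TYPE, sup))
    return inferred


def run_rules(graph, rules):
    """Apply every rule repeatedly until no rule adds a new triple."""
    closure = set(graph)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(closure) - closure
            if new:
                closure |= new
                changed = True
    return closure


data = {
    ("ex:Adult", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "owl:Thing"),
    ("ex:Elvis", RDF_TYPE, "ex:Adult"),
}
closure = run_rules(data, [subclass_rule])
```

Running the rules to a fixed point like this only makes sense for in-memory
data; against a remote store each pass would be another round trip, which is
the delay Kevin is worried about.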

>
>>
>>
>> -Can you provide a low-level overview of your SPIN dotnetrdf
>> implementation strategy?  I only partially understood your explanation
>> below, which I know will be a little clearer after looking at the code.
>> What is spin-sparql-syntax.ttl and how does it fit into the puzzle?  What
>> does it mean to convert a query into its SPIN RDF representation?  My naive
>> picture is that a user query is converted into internal queries to obtain
>> SPIN RDF rules, which are then executed by the new SPIN dotnetrdf engine.
>> Lastly, please explain a little more your view of turning a SPIN query
>> into a query.
>>
>>
>
>Basically I think that a complete solution should work like this:
>1. Prepare a metadata (ontology) graph with your SPIN rules/constraints
>2. SPIN rules are converted to SPARQL queries
>3. Queries are executed against graph/store to produce constraint
>violations and inferences/assertions

I agree. For point 3 we could even go so far as to have a SPIN constraint
validator as a decorator over a dataset which would fire off the
constraints in the Flush() method (i.e. the transaction commit) and fail
the transaction if constraints have been violated.  The same goes for
running inferences.
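
A rough sketch of that decorator idea, in Python rather than C# and against
a toy dataset class, might look like the following.  All of the class and
method names here are hypothetical (this is not the dotNetRDF dataset API),
and each constraint is a Python function standing in for a SPIN ASK query
that returns true when the constraint is violated.

```python
class ConstraintViolation(Exception):
    """Raised when a flush is rejected because a constraint was violated."""


class InMemoryDataset:
    """Toy dataset: added triples stay pending until flush() commits them."""

    def __init__(self):
        self.triples = set()
        self._pending = set()

    def add(self, triple):
        self._pending.add(triple)

    def preview(self):
        # The state the dataset would have if the pending changes were committed.
        return self.triples | self._pending

    def flush(self):
        self.triples |= self._pending
        self._pending.clear()

    def discard(self):
        self._pending.clear()


class ConstraintValidatingDataset:
    """Decorator over a dataset that checks constraints before committing."""

    def __init__(self, inner, constraints):
        self._inner = inner
        self._constraints = constraints  # list of (name, check) pairs

    @property
    def triples(self):
        return self._inner.triples

    def add(self, triple):
        self._inner.add(triple)

    def flush(self):
        candidate = self._inner.preview()
        violations = [name for name, check in self._constraints if check(candidate)]
        if violations:
            # Fail the transaction: drop the pending changes and report why.
            self._inner.discard()
            raise ConstraintViolation(", ".join(violations))
        self._inner.flush()


def negative_age(triples):
    """Stand-in for an ASK-style constraint: True means it is violated."""
    return any(p == "ex:age" and o < 0 for (_, p, o) in triples)


ds = ConstraintValidatingDataset(InMemoryDataset(), [("negative age", negative_age)])
ds.add(("ex:Elvis", "ex:age", 42))
ds.flush()                      # passes the constraint, so it is committed
ds.add(("ex:Bob", "ex:age", -1))
try:
    ds.flush()                  # violates the constraint, so it is rolled back
    rolled_back = False
except ConstraintViolation:
    rolled_back = True
```

Running inferences could hang off the same hook: compute the inferred
triples from the candidate state and add them before committing.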

>
>If you look at the TopBraid videos and SPIN Modelling vocabulary you
>will notice that there is much more to SPIN than just converting
>queries from RDF representation to SPARQL:
>- rule modularization templates
>- constructors
>- extending SPARQL Engine with functions and magic properties
>
>I agree with Rob that first we would need a converter from SPIN RDF to
>SPARQL but its design should be influenced by how we will use it, and
>we already know very well what that would be. Unfortunately a lot
>seems unclear to me when I read the SPIN Modelling documentation. For
>one thing there can be many rules and, as Kevin wrote, executing many
>such rules against a store may not be a good idea.

I think the immediate goal would be to make SPIN runnable against
in-memory data; a longer term goal can be to make this run against
arbitrary SPARQL endpoints.

Rob

>
>Kevin, what do you mean by your last question? You have RDF triples,
>which represent a query and you traverse the graph to create a query
>(string). The IGraph and Fluent SPARQL APIs combined should make this
>work easy.
>
>Hope this helps, for now ;)
>Tom
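
To illustrate the traversal Tom describes, here is a minimal sketch in
Python that walks a tiny SPIN RDF encoding and emits a query string.  It is
only a sketch under stated assumptions: the graph is a plain list of
tuples, blank nodes are strings such as "_:q", only sp:Select with simple
triple patterns is handled, and the helper names are made up rather than
taken from the IGraph or Fluent SPARQL APIs.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def one(graph, subject, predicate):
    """The single object for subject/predicate, or an error if not exactly one."""
    found = objects(graph, subject, predicate)
    if len(found) != 1:
        raise ValueError(f"expected exactly one {predicate} on {subject}")
    return found[0]


def read_list(graph, head):
    """Walk an rdf:first/rdf:rest collection into a Python list."""
    items = []
    while head != RDF_NIL:
        items.append(one(graph, head, RDF_FIRST))
        head = one(graph, head, RDF_REST)
    return items


def term(graph, node):
    """A node carrying sp:varName is a variable; anything else is written as-is."""
    names = objects(graph, node, "sp:varName")
    return "?" + names[0] if names else node


def to_sparql(graph, query):
    if "sp:Select" not in objects(graph, query, "rdf:type"):
        raise ValueError("only sp:Select is handled in this sketch")
    variables = read_list(graph, one(graph, query, "sp:resultVariables"))
    patterns = read_list(graph, one(graph, query, "sp:where"))
    projection = " ".join(term(graph, v) for v in variables)
    body = " . ".join(
        " ".join(term(graph, one(graph, pattern, part))
                 for part in ("sp:subject", "sp:predicate", "sp:object"))
        for pattern in patterns
    )
    return f"SELECT {projection} WHERE {{ {body} }}"


# SPIN RDF encoding of: SELECT ?person WHERE { ?person rdf:type ex:Adult }
graph = [
    ("_:q", "rdf:type", "sp:Select"),
    ("_:q", "sp:resultVariables", "_:v1"),
    ("_:v1", "rdf:first", "_:x"),
    ("_:v1", "rdf:rest", "rdf:nil"),
    ("_:x", "sp:varName", "person"),
    ("_:q", "sp:where", "_:w1"),
    ("_:w1", "rdf:first", "_:tp"),
    ("_:w1", "rdf:rest", "rdf:nil"),
    ("_:tp", "sp:subject", "_:x"),
    ("_:tp", "sp:predicate", "rdf:type"),
    ("_:tp", "sp:object", "ex:Adult"),
]
sparql = to_sparql(graph, "_:q")
```

A real converter would probably build a query object rather than a string,
and would have to cover the rest of the SPIN syntax (OPTIONAL, FILTER,
UNION, the other query forms and so on).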
>
>>
>>
>> Thanks,
>>
>> Kevin
>>
>>
>>
>> From: Rob Vesse [mailto:rv...@do...]
>> Sent: Friday, March 15, 2013 2:57 PM
>> To: Kevin
>> Cc: dotNetRDF Developer Discussion and Feature Request
>> Subject: Re: Integrating SPIN into dotnetrdf
>>
>>
>>
>> Hey Kevin
>>
>>
>>
>> Discussion inline:
>>
>>
>>
>> From: Kevin <ke...@th...>
>> Date: Wednesday, March 13, 2013 7:40 PM
>> To: Rob Vesse <rv...@do...>
>> Subject: Integrating SPIN into dotnetrdf
>>
>>
>>
>> Rob,
>>
>>
>>
>> First, thank you for the quality work you have done with the dotnetrdf
>> project.  I have seen a few different posts about your initiative to
>> integrate SPIN into dotnetrdf (i.e. SPIN Post).  After much reading on the
>> subject it really seems that SPIN would propel/complement dotnetrdf.  I
>> believe SPIN not only makes up for the missing OWL inference (via the SPIN
>> OWL-RL implementation), it can also expand to suit the modeler's
>> imagination.  The fact that the rules are in SPARQL makes for an
>> unbeatable solution.  Should it matter, my current effort involves
>> querying a Virtuoso database (some OWL support) with dotnetrdf.  I would
>> really appreciate you taking a look at the questions below:
>>
>>
>>
>> -Have you made any further progress on integrating SPIN into dotnetrdf?
>> Would you allow me to have the source code in its current state?  Could I
>> possibly be a contributor to this cause, as I am not really equipped for
>> the full task?  In any case I would appreciate any source code which I
>> could use as a learning tool.
>>
>>
>>
>> No I haven't had time to do anything on SPIN for a long time now.  I've
>> been primarily concentrating on getting core features stabilized, such as
>> the SPARQL engine, which are obviously fairly key to building stuff like
>> SPIN on top.
>>
>>
>>
>> However I still don't have time to work on SPIN directly so if you want
>> to work on this please feel free; you can find the code in the Mercurial
>> repository at
>> https://bitbucket.org/dotnetrdf/dotnetrdf
>>
>>
>>
>> The previous and very minimal SPIN stub is under Libraries\Query\Spin,
>> create your own fork and then you can send pull requests as and when you
>> have something to
>>
>>
>>
>> The key things that need to be done to get the core of SPIN implemented
>> are as follows:
>>
>> 1. Update the current spin-sparql-syntax.ttl to a current version; it
>>    likely doesn't represent the current version of the spec (this is
>>    primarily a convenience reference for developers)
>> 2. Finish the existing stubs for converting queries into their SPIN RDF
>>    representation (see SpinSyntax.cs)
>> 3. Write code to turn an RDF encoding of a SPIN query into a query
>>
>> The middle one would be the easiest to start with since there are already
>> some partial stubs to get you started.
>>
>>
>>
>>
>>
>> -From the available TopQuadrant documentation I have tried to deduce how
>> dotnetrdf might implement SPIN.  According to the SPIN tutorial, TopBraid
>> finds all SPIN inferencer rules and runs them when you hit play.  Would
>> the dotnetrdf SPIN inferencer only run the rules that are associated with
>> the class structure being queried?  Basically I am confused about how
>> dotnetrdf decides when/how/which SPIN rules to run for a given query.
>>
>>
>>
>> That's an implementation detail, we would control how and when rules get
>> run.  We need to get the basic implementation of SPIN done first before
>> this aspect of things gets implemented anyway.
>>
>>
>>
>>
>>
>> -How much of SPIN could dotnetrdf possibly support?  It appears SPIN
>> contains Inference Rules, Constraint Checking, and the ability to isolate
>> rules for certain conditions.  Also the TopBraid tool seems to have
>> "User-Defined SPARQL functions" and "SPIN Query Templates".  I imagine
>> dotnetrdf would have to keep up with any SPIN improvements.
>>
>>
>>
>> All of those are supportable in some shape or form, but until we have the
>> core of SPIN up and running we can't really implement them.  Most of those
>> features run on top of the SPIN core and so will ultimately just be
>> implementation details once we have a core to build upon.  User defined
>> SPARQL functions are basically just SPARQL queries that return a single
>> value, and query templates are just parameterized queries, both of which
>> the existing SPARQL engine is capable of supporting in one way or another.
>> So it is just a case of exposing that functionality in the SPIN style.
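
As a small illustration of "query templates are just parameterized
queries", here is a sketch in Python of a template whose declared arguments
are substituted into a query body.  The QueryTemplate class and its bind()
method are hypothetical names for illustration only, not an existing API,
and the argument values are assumed to be already serialized SPARQL terms.

```python
import re


class QueryTemplate:
    """Hypothetical stand-in for a template: a query body plus declared arguments."""

    def __init__(self, body, arguments):
        self.body = body
        self.arguments = set(arguments)

    def bind(self, **values):
        """Return the query text with each declared argument variable replaced."""
        missing = self.arguments - set(values)
        unknown = set(values) - self.arguments
        if missing or unknown:
            raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")

        def substitute(match):
            name = match.group(1)
            # Only argument variables are replaced; other variables stay as they are.
            return values[name] if name in values else match.group(0)

        return re.sub(r"\?(\w+)", substitute, self.body)


adults_older_than = QueryTemplate(
    "SELECT ?person WHERE { ?person a ex:Adult ; ex:age ?age . FILTER(?age > ?minAge) }",
    arguments=["minAge"],
)
query = adults_older_than.bind(minAge="30")
```

Plain text substitution is used only to keep the sketch short; it does no
escaping, so a real implementation would more likely bind the arguments as
pre-bound variables in the query engine instead.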
>>
>>
>>
>> Hope this is enough to get you started, if not please let us know,
>>
>>
>>
>> Rob
>>
>>
>>
>>
>>
>> Regards,
>>
>> Kevin
>>
>>
>> 
>>-------------------------------------------------------------------------
>>-----
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> dotNetRDF-develop mailing list
>> dot...@li...
>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-develop
>>
>