Re: [Sparql4j-devel] SPARQL4J : a few first thoughts
From: Alberto R. <al...@as...> - 2005-01-15 16:38:38
Hello,
On Jan 15, 2005, at 10:18 AM, Janne Saarela wrote:
> I very much agree with your goals for this project.
Me too - let's start simple with SELECT over HTTP GET - but the design
should also accommodate future extensions eventually - Andy's plug-in
idea is a good starting hook for other things.
> The non-RDF applications will find it easy to access SPARQL-enabled
> repositories. The easiness comes via using the familiar programming
> concepts relating to accessing relational databases using JDBC. In
> addition, the easiness comes via the use of one single JDBC driver
> instead of having to download a separate one for each repository.
I agree - we should start by providing an SQL-like tabular interface to
RDF result sets, rather than graphs.
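To make the goal concrete, here is a minimal sketch of what client code
could look like - note the driver class name and the jdbc:sparql: URL
scheme are placeholders for discussion, not agreed names:

import java.sql.*;

public class SelectExample {
    public static void main(String[] args) throws Exception {
        // Driver class and URL scheme are assumptions for illustration only.
        Class.forName("org.sparql4j.Driver");
        Connection con = DriverManager.getConnection(
            "jdbc:sparql:http://example.org/sparql");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery(
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
            "SELECT ?name ?mbox WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox }");
        while (rs.next()) {
            // One row per query solution; columns named after the SELECT variables.
            System.out.println(rs.getString("name") + " " + rs.getString("mbox"));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}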
>
>> This, together with the scoping of JDBC
>>
>> """ javadoc java.sql (1.5.0)
>
> I would be in favor of targeting the 1.4.2 JDK to start with. This is
> due to our product, which ships with 1.4.2 support as we speak, with
> 1.5 support coming in the future.
>
> I should check what changes there are in java.sql from 1.4.2 to 1.5.
> Would you remember by heart?
I cannot help much here - I guess Andy knows more about these
issues - or we can check the Sun specs. The latest spec is JDBC 3.0.
>
>> means I see SPARQL4J as a JDBC driver that is mainly about issuing
>> SPARQL SELECT queries, rather than CONSTRUCT or DESCRIBE. A release
>> with just SPARQL SELECT, using the plain XML result format, would be
>> very useful to application writers - one, conventional, interface to
>> RDF published data. Toolkit independent.
>
> SELECT we start with - let's see how CONSTRUCT and DESCRIBE can be
> tweaked in the long run.
Exactly - the design should accommodate extensions for more
graph-like queries.
Speaking of the Perl DBI world, we have added some extra methods in
addition to the traditional SQL/relational operations (I would call it
RDBC, for RDF DataBase Connectivity):
fetchrow_XML()
-> fetch the next XML chunk using the DAWG XML result format
fetchall_XML()
-> fetch the whole XML result set in one go using the DAWG XML format
fetchsubgraph_serialize()
-> return a serialization (RDF/XML, N-Triples or other) of the next
subgraph resulting from the query (either SELECT, DESCRIBE or CONSTRUCT)
fetchallgraph_serialize()
-> return a serialization (RDF/XML, N-Triples or other) of the whole
subgraph resulting from the query (the merge of all subgraphs of the
previous method)
fetchsubgraph()
-> return the next subgraph resulting from the query (e.g. as a GraphModel)
fetchallgraph()
-> return the whole subgraph resulting from the query
The Perl API also has an explicit method called func() to call an
extension function/method - I guess we will be able to do something
similar in JDBC, perhaps by extending/sub-classing.
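In JDBC terms a sub-interface could carry these extras - a rough sketch
only, with hypothetical names mirroring the Perl methods above:

import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical extension interface; none of these names are agreed yet.
public interface SparqlResultSet extends ResultSet {
    String fetchrowXML() throws SQLException;   // next solution as a DAWG XML chunk
    String fetchallXML() throws SQLException;   // whole result set as DAWG XML
    // syntax is e.g. "RDF/XML" or "N-Triples"
    String fetchSubgraphSerialize(String syntax) throws SQLException;
    String fetchAllGraphSerialize(String syntax) throws SQLException;
}

Client code would then down-cast the plain java.sql.ResultSet when it
wants the RDF-specific behaviour, which keeps the "pure" JDBC surface
untouched.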
>
>> We could be merely inspired by JDBC but actually produce a new
>> interface that is more SPARQL-suitable. I'd like to avoid this for
>> now and try to implement "pure" JDBC.
>
> My goal is the very same - let's not get into extensions right away.
Incremental - let's expose canonical (old) RDQL (SELECT-only)
functionality through JDBC first - then in the meantime start on the
rest of the features.
>
>> I'm assuming that the connection to the database (the RDF store, the
>> knowledge base) is HTTP. Now I would like to be able to take the
>> SPARQL4J codebase and plug in an adapter, instead of HTTP, to get a
>> JDBC driver for ARQ [0] directly for local use. We need a plug-in
>> layer for that, which is a connection layer with SPARQL-centric
>> mechanisms, and then have common code for the presentation of results
>> as JDBC methods.
>
> Ok, I see. Internally we can create a factory that provides the
> protocol part implementation for the other parts of the driver. From
> the user perspective this protocol should perhaps be visible in the
> datasource string? What do you think? Let's start a separate thread on
> the datasource.
It might be that the JDBC database metadata (catalog) methods will help
here to bootstrap and negotiate the protocol part - or we carefully
define an initial list of possible protocols in addition to HTTP - for
sure a local/in-process one will be needed.
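For example - a sketch only, every name here is an assumption - the
factory could key off the datasource string, with HTTP GET as the first
protocol implementation:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;

// Plug-in layer: the protocol part behind the common JDBC presentation code.
interface ProtocolConnection {
    InputStream execSelect(String sparql) throws IOException; // raw XML results
}

// The HTTP GET starting point discussed above.
class HttpProtocolConnection implements ProtocolConnection {
    private final String endpoint;
    HttpProtocolConnection(String endpoint) { this.endpoint = endpoint; }
    public InputStream execSelect(String sparql) throws IOException {
        URL url = new URL(endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8"));
        return url.openStream();
    }
}

// Chooses an implementation from a URL like "jdbc:sparql:http://host/sparql".
class ProtocolFactory {
    static ProtocolConnection open(String jdbcUrl) {
        String target = jdbcUrl.substring("jdbc:sparql:".length());
        if (target.startsWith("http:")) {
            return new HttpProtocolConnection(target);
        }
        throw new IllegalArgumentException("unknown protocol part: " + target);
    }
}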
>
>> Update - out of scope: Until there is a standard (de facto or de
>> jure) language, servers won't implement a common way to do RDF update
>> - there are several major decisions, like handling bNodes, to be
>> settled first.
>
> Agreed - out of scope for now.
+1
>
>> A quick review of the JDBC interfaces shows a few tricky parts:
>>
>> 1/ NULLs. SQL has NULLs; SPARQL has unbound variables. That in itself
>> is not important - NULL would just be a way of saying "not bound". But
>> the getXXX methods must return a value of the given type, and getInt
>> returns 0 on NULL, which isn't a very distinguishing value. But in RDF
>> there isn't always a value in the result set for a given row - NULLs
>> become more common and the "return zero" solution is a bit weak.
>>
>> Solutions?: Default values that are more unusual (e.g. MIN_VALUE+2),
>> or ones that can be set by the app (requires an extension to JDBC).
>
> I would be happy with a default value. Let's see what the eventual
> other users think.
Again - we might need to take a better look at the JDBC
metadata/catalog layer to see what it offers for negotiating this, if
possible.
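Worth noting, though, that plain JDBC already has ResultSet.wasNull()
for exactly this, so a driver that maps "unbound" to SQL NULL lets
clients distinguish a genuine 0 - a small sketch (the unbound-to-NULL
mapping is our choice, not something the spec dictates):

import java.sql.ResultSet;
import java.sql.SQLException;

class UnboundCheck {
    // Returns the bound value, or null when the variable was unbound.
    static Integer getBoundInt(ResultSet rs, String var) throws SQLException {
        int v = rs.getInt(var);            // getInt() returns 0 on NULL/unbound
        return rs.wasNull() ? null : new Integer(v);
    }
}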
>
>> 2/ Metadata. Each solution to a SPARQL query can have different types
>> of values for the same property. Returning anything meaningful as
>> "metadata" will need design work. Solution?: For now, return very
>> little and see if applications use the metadata information much.
>
> Let's make, if not a dummy, then a very basic implementation of the
> result set metadata and listen for use cases that would require a more
> elaborate implementation.
Agreed - let's try to get basic conjunctive queries to work (even
without OPTIONAL, if that's too hard) - then move on to the next step.
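A sketch of how little the first cut could expose - a plain class
showing the idea, not a full java.sql.ResultSetMetaData implementation;
since any RDF term can appear in any column, everything is reported as
VARCHAR for now:

import java.sql.Types;

class BasicSparqlMetaData {
    private final String[] variables;  // the SELECT variables, e.g. {"name", "mbox"}

    BasicSparqlMetaData(String[] variables) {
        this.variables = variables;
    }

    int getColumnCount() {
        return variables.length;
    }

    String getColumnName(int column) {
        return variables[column - 1];  // JDBC columns are 1-based
    }

    int getColumnType(int column) {
        return Types.VARCHAR;          // every value presented as a string for now
    }
}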
>
>> 3/ Error conditions: The JDBC interface assumes a conventional
>> connection to a local database. SPARQL is a web protocol and many
>> error conditions matter - the difference between soft errors like
>> "can't contact" and hard errors like "invalid query" makes more of a
>> difference to a web app.
>
> I think the different errors could be modelled as a hierarchy of
> Exceptions subclassed from SQLException. This would enable client code
> to determine the different flavours of errors without having to do
> string parsing on a SQLException to see what happened.
I agree here too - but if we have the HTTP protocol and XML results, we
might also have status/errors encoded in the data - or do we want to
avoid that for JDBC?
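Along the lines Janne suggests - a minimal sketch with hypothetical
class names:

import java.sql.SQLException;

// Root of the driver's exception hierarchy.
class SparqlException extends SQLException {
    SparqlException(String msg) { super(msg); }
}

// Soft error: the endpoint could not be contacted; a retry may succeed.
class SparqlTransportException extends SparqlException {
    SparqlTransportException(String msg) { super(msg); }
}

// Hard error: the server rejected the query itself (e.g. a parse error).
class SparqlQueryException extends SparqlException {
    SparqlQueryException(String msg) { super(msg); }
}

Client code can then catch SparqlTransportException to retry and
SparqlQueryException to give up, without any string parsing.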
It would be interesting to look at how, for example, the XQuery or
other XML-DB people have done similar things over JDBC.
Alberto