Re: [Sparql4j-devel] RE: XML parsing
From: Samppa S. <sam...@pr...> - 2006-01-09 09:57:04
>>*) triple-per-row:
>>
>>// JDBC-specific access exposed via ResultSetMetaData
>>String subject = rs.getString("subject");
>>int subjectType = rs.getInt("subject$type"); // URI | BLANK_NODE
>>String predicate = rs.getString("predicate");
>>String object = rs.getString("object");
>>int objectType = rs.getInt("object$type"); // URI | BLANK_NODE |
>>PLAIN_LITERAL | TYPED_LITERAL
>>String lang = rs.getString("object$lang");
>>String datatype = rs.getString("object$datatype");
>
>That looks just like
>
>SELECT ?subject ?predicate ?object { ... pattern ... }
>
>+ the accessors for type and lang

That's the point: that it *looks and behaves like* all other SPARQL
queries :-) However, it's strictly not the same (not the same columns),
and with a pattern containing more than one triple matcher the result is
not even close. The type, lang and datatype accessors are also needed in
SELECT queries.

>No, not at all. Entailment is defined on models.
>
>Open world says there may be other statements out there in other models.
>
>An application displaying the results of a query wants to know when it
>has seen all the results for its query on its chosen data source, e.g.
>listing people by name makes an ordering assumption. When you see the
>"K"s, there are no more "J"s in this result.

You couldn't use CONSTRUCT for that anyway.

>>So what's the point of sparql4j then?
>
>SELECT queries for applications that wish to access RDF information.
>For example, an RDF repository as the core of a 3-tier web site. No need
>for the business logic to have an RDF toolkit if it just wants to get
>people's names.
>
>If an application is going to do RDF processing on graphs, it would not
>want to use JDBC's row-by-row view. Why not just get the constructed
>graph in whatever way its toolkit wants? Because there is a paradigm
>shift, the value of JDBC for moving graphs around seems limited to me.
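For what it's worth, the type/lang/datatype information those accessors
would expose is already carried by the SPARQL Query Results XML Format,
so a driver can recover it with a plain XML parse. A minimal sketch -
class name and the inline sample document are mine, not anything from
sparql4j:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SparqlXmlResults {
    // A minimal SPARQL Query Results XML document (inline sample data).
    static final String XML =
        "<sparql xmlns='http://www.w3.org/2005/sparql-results#'>"
      + "<head><variable name='s'/><variable name='o'/></head>"
      + "<results><result>"
      + "<binding name='s'><uri>http://example.org/a</uri></binding>"
      + "<binding name='o'><literal xml:lang='en'>hello</literal></binding>"
      + "</result></results>"
      + "</sparql>";

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes("UTF-8")));
        NodeList bindings = doc.getElementsByTagNameNS(
                "http://www.w3.org/2005/sparql-results#", "binding");
        for (int i = 0; i < bindings.getLength(); i++) {
            Element b = (Element) bindings.item(i);
            // The child element name (uri | bnode | literal) is the node
            // type; lang and datatype ride along as attributes of <literal>.
            Element value = (Element) b.getFirstChild();
            String lang = value.getAttribute("xml:lang");
            System.out.println(b.getAttribute("name") + " "
                    + value.getLocalName() + " " + value.getTextContent()
                    + (lang.isEmpty() ? "" : " @" + lang));
        }
    }
}
```

A driver could surface exactly this per-binding node type through the
`$type`/`$lang`/`$datatype` pseudo-columns quoted above.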
>But SELECT queries, to get information out of RDF graphs, are the most
>important kind of query. Feed this into JDBC environments in the JDBC
>paradigm and we may even be able to reuse JDBC client tools.

So in essence sparql4j should support only SELECT (and ASK?) queries?

>>Firstly, there's a difference between building an application-specific
>>(domain) model from a stream (of triples) and building first a generic
>>RDF model and only then building the actual target model.
>>
>>Secondly, even though in the general case the order of triples isn't
>>guaranteed, it's quite common to group the statements by subject.
>
>I disagree - you're relying on the server having processed the entire
>model to make the statements come out nicely. Jena uses hashes all over
>the place - things do not come out in any sort of order, nor is it
>consistent.

I know how Jena works, and I make no assumptions about the nature of a
given toolkit or SPARQL engine. Triples may come out in random order,
and this is something the user issuing CONSTRUCT queries has to take
into account. The fact that triples don't come out in any particular
order is built into RDF and has to be taken into account in practically
every RDF-based program. I don't see this as a problem; it's just
something to be aware of.

>>There also isn't a way of requiring that a certain type of resource
>>should be a URI...
>>
>>I used to build (when using Jena) RDQL queries programmatically using
>>the query API directly, since that way one can (or at least could) also
>>use the bnodes in queries, thus avoiding failing queries in case bnodes
>>were used. Of course I was told that I shouldn't do this and should use
>>the model/resource accessors instead... however, when using an RDB
>>model with fast path, I was able to achieve magnitudes better
>>performance this way. If I recall right, I even used (at least at some
>>point) ResultBinding#getTriples() to process the results.
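The ordering point can be made concrete: a CONSTRUCT consumer has to be
prepared to group statements by subject itself, whatever order the
engine emits them in. A small self-contained sketch - triples modelled
as plain String arrays, no particular toolkit assumed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupBySubject {
    // A triple as String[3] {subject, predicate, object} (illustrative).
    public static void main(String[] args) {
        // Statements arrive in no particular order, as RDF permits.
        List<String[]> triples = Arrays.asList(
            new String[]{"ex:b", "rdfs:label", "\"B\""},
            new String[]{"ex:a", "rdf:type", "ex:Person"},
            new String[]{"ex:b", "rdf:type", "ex:Person"},
            new String[]{"ex:a", "rdfs:label", "\"A\""});

        // The consumer buffers and groups by subject itself rather than
        // relying on the server emitting subject-grouped output.
        Map<String, List<String[]>> bySubject = new TreeMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
        }
        for (Map.Entry<String, List<String[]>> e : bySubject.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + e.getValue().size() + " statements");
        }
    }
}
```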
>But it would not have worked with Joseki :-) bNodes not preserved across
>the network.

So, I couldn't have used Joseki then ;-) bNodes are used a lot in e.g.
OWL ontologies, and an ontology browser addressing {this rdfs:subClassOf
?superclass} that runs into a bNode (e.g. a Restriction) needs to be
able to get more information about it.

>You might like to see the ARQ configuration options I just put in so
>bnode ids are passed transparently across the network. It's not the
>default mode (makes the XML results look ugly).

RDF/XML itself looks ugly and is not meant for human consumption anyway
;-)

>>>That is XSLT on the XML results?
>>>
>>Yes.
>
>So it's a SELECT query.

No. One may use XSLT also with RDF/XML. It might not be very convenient,
but it is hardly impossible and, most importantly, a fully valid use
case. I actually would have preferred an RDF/XML-based format for SELECT
queries too - preferably a canonical RDF/XML format - to be able to
process the results reliably with an RDF toolkit. Having a canonical
RDF/XML format for CONSTRUCT queries would certainly help the XSLT use
cases as well...

>>All the more reason why we should provide RDF parsing as well :-)
>>Succeeding to do this in a toolkit-independent way might also be
>>usable to anyone building toolkits...
>
>Then the objective of the project is now providing another API to RDF
>(wrapping all toolkits is just like a new toolkit that uses others as
>its implementation). I have a toolkit, and a SPARQL-protocol interface
>that (I think) is easier to work with than warping to the JDBC paradigm
>and warping back again - it's a cognitive burden on the app writer.
>I wanted an approach of doing something clearcut, minimal and distinct
>with a clear value. But this now seems to be growing into a general
>purpose RDF framework.

No. Just a meaningful JDBC binding for all types of valid SPARQL
queries. Coding reusable components is good practice as such.
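Applying XSLT to RDF/XML needs nothing beyond a stock XSLT processor,
e.g. the JDK's javax.xml.transform. A sketch with an inline stylesheet
and an inline RDF/XML document - both illustrative only:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltOverRdfXml {
    // A tiny RDF/XML document (illustrative sample data).
    static final String RDF =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
      + " xmlns:foaf='http://xmlns.com/foaf/0.1/'>"
      + "<rdf:Description rdf:about='http://example.org/a'>"
      + "<foaf:name>Alice</foaf:name>"
      + "</rdf:Description>"
      + "</rdf:RDF>";

    // A stylesheet that extracts every foaf:name as plain text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:foaf='http://xmlns.com/foaf/0.1/'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//foaf:name'>"
      + "<xsl:value-of select='.'/>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(RDF)),
                new StreamResult(out));
        System.out.println(out.toString().trim());
    }
}
```

The same machinery works identically on a canonical CONSTRUCT
serialization and on the SELECT results XML format.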
>Can we find some limits please?

Should triple-per-row be the desired approach, it can be scheduled for
later and throw a "not implemented yet, use Statement.executeSparql()
instead" exception for now.

We both seem to have pretty firm opinions about how to handle CONSTRUCT
queries. Since these two approaches are complementary, I'd suggest
voting on it. Use cases might also help in making the decision, but I
guess we'd just end up arguing about their relevance ;-)

Br Samppa

--
Samppa Saarela <samppa.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000
Fax. +358 (0)9 855 98 002
Mob. +358 (0)41 515 1412
Internet: http://www.profium.com