sparql4j-devel Mailing List for sparql4j
Status: Pre-Alpha
Brought to you by: jsaarela
From: Seaborne, A. <and...@hp...> - 2006-01-16 11:53:25
|
-------- Original Message --------
> From: Samppa Saarela <mailto:sam...@pr...>
> Date: 13 January 2006 15:18
>
> > > So you suggest allowing $'s also as parameter slots? I guess/hope
> > > that adding support for this is trivial...
> >
> > The reverse is what I had in mind. Writing the query with $subject.
>
> It might cause confusion about what is a parameter slot and what is a
> variable - even I'm not quite sure whether you mean a parameter slot or a
> variable by that $subject?
>
>   select * where {$subject rdfs:label ?;
>                   rdfs:comment $comment .
>                   filter ($comment < ?) .}
>
> The format for parameter slots should be distinct from variables,
> otherwise e.g. variables used in a select clause might get confused with
> parameter slots and become replaced...
>
> > Named parameters are much better than purely positional ones!
>
> Absolutely! ...but there's no support for named parameters in JDBC, which
> is our primary objective. If named parameters are used, the format used
> for them *should* be distinct from variables (got better ideas than the
> ?{paramName} syntax?) because JDBC offers primarily (i.e. only)
> positional parameters. However, we have been discussing (with Timo) a
> JDBC extension allowing one to bind parameter slots to variables, e.g.
>
>   select * where {? foo:predicate ?object}
>
>   stmt.setVariable(1, "subject");
>
>   --> select * where {?subject foo:predicate ?object}
>
> Have a nice weekend!
>
> Br,
> Samppa |
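A minimal sketch, assuming the positional-slot behaviour and the setVariable extension discussed above; the JDBC URL scheme, the SparqlPreparedStatement type and setVariable are proposals from this thread (not settled API), and the endpoint and query are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ParameterSlotExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint; the JDBC URL scheme is not settled in this thread.
            Connection c = DriverManager.getConnection("jdbc:sparql:http://example.org/sparql");

            // A lone '?' is a positional parameter slot, exactly as in plain JDBC.
            PreparedStatement ps = c.prepareStatement(
                    "select * where {?doc rdfs:label ? ; rdfs:comment ?comment}");
            ps.setString(1, "SPARQL for Java");   // bind slot 1 to a literal value
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("comment"));
            }

            // The proposed (hypothetical) extension would instead bind a slot to a
            // *variable name*, rewriting "?" to "?subject" before execution:
            //   ((SparqlPreparedStatement) ps).setVariable(1, "subject");
        }
    }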
From: Samppa S. <sam...@pr...> - 2006-01-12 15:13:33
|
Hi all,

I just committed the SparqlPreparedStatement implementation. It uses lone question marks as parameter placeholders, e.g.

  select * where {?subject rdfs:label ?; rdfs:comment ?comment . filter ?comment < ? .}

has 2 parameter slots. It uses quite naive parsing that differs from the spec at least in IRI handling, by disallowing IRIs starting with a digit, ? or " to avoid mixing operators with IRIs. I don't see this as a grave limitation. It implements most of the PreparedStatement.set*(int, *) methods.

Date/time handling is one thing to be aware of: for simplicity I process/normalize all time-related values using the UTC time zone. It would probably be better to use the provided Calendar or the JVM default to process the time zones, but this is a bit more complex to implement since the XSD time (zone) format is not directly supported by SimpleDateFormat. Any opinions on this: should setDate/Time/Timestamp retain the given or implicit time zone?

What more is needed for M1? System/integration tests? A ResultSet handler for RDF/XML?

The tests include one system/integration test that uses the server URL defined in test/conf/connection.properties (note: test/conf needs to be on the classpath). It just issues a (?s ?p ?o) query, checks the headers and scrolls through the results. We should discuss how to actually implement the system/integration tests. How do we ensure that the server contains some particular data set to query over and validate the results?

Br Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
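A rough sketch of the UTC normalization described above - formatting a java.sql.Timestamp as an xsd:dateTime literal in the UTC zone. The class and method names are illustrative, not the committed implementation.

    import java.sql.Timestamp;
    import java.text.SimpleDateFormat;
    import java.util.TimeZone;

    final class XsdDateTimeFormatter {
        // SimpleDateFormat has no pattern for the xsd:dateTime "Z"/"+hh:mm" zone
        // suffix, so normalize to UTC and append "Z" by hand.
        static String toXsdDateTime(Timestamp ts) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            return fmt.format(ts) + "Z";
        }
    }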
From: Samppa S. <sam...@pr...> - 2006-01-10 13:01:10
|
Hi all,

Timo and I did some major refactoring of the project for clearer CVS usage and naming: generated files like logs and reports now go into test_build instead of the same directory as the source files (i.e. test), AND the JDBC-specific implementation classes are now prefixed with Sparql (e.g. SparqlStatement).

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Samppa S. <sam...@pr...> - 2006-01-10 07:53:44
|
Hi

As Andy pointed out, using a logging facade is a better alternative than log4j (or any other concrete implementation), since that way the logging system adapts to the client's environment. Due to known problems with commons-logging, I'd like to try SLF4J instead. See

http://www.qos.ch/logging/classloader.jsp
http://www.slf4j.org/

Opinions?

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
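For reference, a minimal sketch of what SLF4J usage inside the driver could look like; the class and method shown are illustrative, not committed code.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SparqlStatement /* implements java.sql.Statement */ {
        // The facade binds to whatever logging backend the client application ships.
        private static final Logger log = LoggerFactory.getLogger(SparqlStatement.class);

        public void executeQueryLogged(String query) {
            log.debug("Executing SPARQL query: {}", query);
            // ... send the query over the SPARQL protocol ...
        }
    }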
From: Samppa S. <sam...@pr...> - 2006-01-09 09:57:04
|
>>*) triple-per-row: >> >>// JDBC-specific access exposed via ResultSetMetaData >>String subject = rs.getString("subject"); >>int subjectType = rs.getInt("subject$type"); // URI | BLANK_NODE >>String predicate = ns.getString("predicate"); >>String object = rs.getString("object"); >>int objectType = rs.getInt("object$type"); // URI | BLANK_NODE | >>PLAIN_LITERAL | TYPED_LITERAL >>String lang = rs.getString("object$lang"); >>String datatype = rs.getString("object$datatype"); >> >> > >That looks just like like > >SELECT ?subject ?predicate ?object { ... pattern ... } > > >+ the accessors for type and lang > > That's the point: that it *looks and behaves like* all other Sparql queries :-) However, it's strictly not the same (not same columns) and with a pattern containing more than one triple matcher, the result is not even close. The type, lang and datatype accessors are needed also in SELECT queries. >No, not at all. Entailment is defined on models. > >Open world says there may be other statements out there in other models. > >An application displaying the results of a query wants to know when it >has seen all the results for its query on its choosen dada source. e.g. >listing people by name makes an ordering assumption. When you see the >"K"'s, there are no more "J"'s in this result. > > You couldn't use CONSTRUCT for that anyway. >>So what's the point of sparql4j then? >> >> > >SELECT queries for application that wish to access RDF information. >For example, a RDF repository as the core of a 3-tier web site. No need >for the business logic to have an RDF toolkit if it just wants to get >people's name. > >If an application is going to do RDF processing on graphs, it would not >want to use JDBC's view of row-by-row. Why not just get the constructed >graph in whatever way its toolkit wants? Because there is a paradigm >shift, the value of JDBC to moving graphs around seems limited to me. > >But SELECT queries, to get information out of RDF graphs, are the most >important kind of query. Feed this into JDBC environments in the JDBC >pardigm and we may even be able to reuse JDBC client tools. > > So in essence sparql4j should support only SELECT (and ASK?) queries? >>Firstly there's a difference in building application specific (domain) >>model from a stream (of triples) or building first a generic RDF model >>and only then building the actual target model. >> >>Secondly, even though in general case the order of triples isn't >>guaranteed, it's quite common to group the statements by the subject. >> >> > >I disagree - you're relying on the server having processed the entire >model to make the statements come out nicely. Jena uses hashes all over >the place - things do not come out in any sort of order, nor is it >consistent. > > I know how Jena works and I make no assumptions about the nature of given toolkit or sparql engine. Triples may come out in random order and this is something the user addressing CONSTRUCT queries has to take into account. The aspect (that triples don't come out in any particular order) is build into RDF and has to be taken into account in practically every RDF-based program. I dont see this as a problem. It's just something to be aware of. >>There also isn't a way of requiring that a certain type of resources >>should be URI's... 
>> >>I used to build (when using Jena) RDQL queries prorammatically using >> >> >the > > >>query API directly since that way one can (or at least could) also use >>the bnodes in queries, thus avoiding failing queries in case bnodes >> >> >were > > >>used. Of course I was told that I shouldn't do this and to use >>models/resources accessors instead... however when using RDB model >> >> >with > > >>fast path, I was able to achieve magnitudes of better performance this >>way. If I recall right I even used (at least at some point) the >>ResultBinding#getTriples() to process the results. >> >> > >But it would not have worked with Joseki :-) bNodes not preserved across >the network. > > So, I coudn't have used Joseki then ;-) bNodes are used a lot in e.g. OWL ontologies and for an ontology browser addressing {this rdfs:subClassOf ?superclass} running into a bNode (e.g. Restriction) needs to be able to get more information about it. >You might like to see the ARQ configuration options I just put in so >bnode ids are passed transparently across the network. It's not the >default mode (makes the XML results look ugly). > > RDF/XML itself looks ugly and is not ment for human consumption anyway ;-) >>>That is XSLT on the XML results? >>> >>> >>> >>> >>Yes. >> >> > >So it's a SELECT query. > > No. One may use XSLT also with RDF/XML. It might not be very convenient, but hardly impossible and most importantly a fully valid use case. I actually would have preferred RDF/XML -based format for SELECT queries too - preferably a canonical RDF/XML format - to be able to process the results reliably with an RDF toolkit. Having canonical RDF/XM format for CONSTRUCT queries would certainly help XSLT use cases also... >>All the more reason why we should provide also RDF-parsing :-) >>Succeeding to do this in a toolkit independent way might also be >> >> >usable > > >>to anyone building toolkits... >> >> > >Then the objective of the project is now providing another API to RDF >(wrapping all toolkits is just like a new toolkit that uses others as >its implementation. I have a toolkit, and a SPARQL-protocol interface >that (I think) is easiler to work with than warping to the JDBC paradign >and and waroping back again - it's cognitive bruden on the app writer. >I wanted an approach of doing something clearcut, minimal and distinct >with a clear value. But this now seems to be growing into a general >purpose RDF framework. > No. Just a meaningfull JDBC binding for all types of valid Sparql queries. Coding reusable components is a good practice as such. >Can we find some limits please? > > Should this (triple-per-row) be desired approach, it can be scheduled for later and throw "not implemented yet, use Statement.executeSparql() instead" -exception for now. We both seem to have pretty firm opininon about how to handle CONSTRUCT queries. Since these to approaches are complementary I'd suggest voting on it. Use cases might also help in making the decision, but I guess we'd just end up arguing about the relevance of them ;-) Br Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
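A sketch of how the triple-per-row handling of CONSTRUCT results debated in this thread might look from the application side. The column names and the $type/$lang/$datatype pseudo-columns follow the proposal above; the type constants and the overall API are still under discussion, so this is illustrative only.

    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TriplePerRowExample {
        // Hypothetical node-type constants from the proposal in this thread.
        static final int URI = 1, BLANK_NODE = 2, PLAIN_LITERAL = 3, TYPED_LITERAL = 4;

        static void dump(Statement stmt) throws Exception {
            ResultSet rs = stmt.executeQuery("CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}");
            while (rs.next()) {
                String subject   = rs.getString("subject");
                String predicate = rs.getString("predicate");
                String object    = rs.getString("object");
                int objectType   = rs.getInt("object$type");        // URI | BLANK_NODE | ...
                String lang      = rs.getString("object$lang");     // only for plain literals
                String datatype  = rs.getString("object$datatype"); // only for typed literals
                System.out.println(subject + " " + predicate + " " + object
                        + " (type " + objectType + ", lang " + lang + ", datatype " + datatype + ")");
            }
        }
    }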
From: Seaborne, A. <and...@hp...> - 2006-01-06 16:34:05
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 5 January 2006 09:50 >=20 . . . . > > An important value of SPARQl4j (for me) is that an application, which > > is=20 > > not very RDF aware, to be able to get information out of remote RDF > > repository by SPARQL query. > >=20 > >=20 > However, relying on external RDF API (or InputStream) to handle certain > kind of queries makes it highly RDF dependent and thus requires > ultimatly that the user is not only aware of RDF but also aware of the > (configuration dependent) RDF toolkit. >=20 > The design I'm proposing aimes >=20 > 1) at providing as-natural-as-it-gets jdbc (row/column -based) behaviour > in all cases >=20 > 2) not blocking more RDF/Sparql aware use cases / applications and >=20 > 3) making the decision between the desired approach explicit (i.e. an > application that is aware of the result forms may access the results > directly by being aware of the Sparql-specific jdbc api extensions), >=20 > 4) providing the user all the necessary information and control (e.g. > ability to define accept preferences when executing a query and access > to the actual content type of the result) needed to process the results > directly, >=20 > 5) providing factory-based / RDF toolkit dependent getObject() accessors > (only) as convenience accessors for RDF aware application >=20 > > JDBC is not a good match to getting RDF graphs (as we are finding!) > > and=20 > > choosing one processing model (streams of something) over another > > makes=20 > > assumptions about the nature and approach of the toolkit. > >=20 > >=20 > That's why I'd try to avoid any direct/required dependency to the > factory, providing it only (and possibly optionally) for convenience to > more RDF aware users. In my opinion the triple-per-row* is > as-good-as-it-gets alternative for row based (i.e. jdbc style) handling > of RDF. It actually resembles one of RDF's serialization forms, namely > NTRIPLES. Also the W3C's RDF Validator provides tabular form of the > parsed graph, which I have found quite usefull. >=20 > *) triple-per-row: >=20 > // JDBC-specific access exposed via ResultSetMetaData > String subject =3D rs.getString("subject"); > int subjectType =3D rs.getInt("subject$type"); // URI | BLANK_NODE > String predicate =3D ns.getString("predicate"); > String object =3D rs.getString("object"); > int objectType =3D rs.getInt("object$type"); // URI | BLANK_NODE | > PLAIN_LITERAL | TYPED_LITERAL > String lang =3D rs.getString("object$lang"); > String datatype =3D rs.getString("object$datatype"); That looks just like like SELECT ?subject ?predicate ?object { ... pattern ... } + the accessors for type and lang >=20 > // RDF toolkit specific convenience/hidden access > Resource subject =3D (Resource)rs.getObject("subject"); > Predicate predicate =3D (Predicate)rs.getObject("predicate"); > RDFNode object =3D (RDFNode)rs.getObject("object"); >=20 > Note that the JDBC-specific access can easily be used to provide a > configuration (i.e. factory) independent access to any RDF toolkit > specific objects. It's also by far more robust way of accessing the > toolkit specific resources than the factory approach which would end up > in ClassCastExceptions if configuration changed. Actually the factory > approach could aesily be replaced with simple (and robust) ResultSet > wrappers / handlers. 
>=20 > > If the application wants to do listStatements AND wants triples in the > > local toolkit format, then the value of a JDBC interface (which is > > row-oriented) is pretty slim. So I do not see a high value for > > SPARQl4j=20 > > as a general connection over the SPARQL protocol in the initial > > releases. > >=20 > >=20 > Triple-per-row on the other hand offers even something usefull for a not > so RDF/Sparql/toolkit aware application. >=20 > The value of providing only sparql result form parsing is also pretty > slim - I have actually already implemented it (just not yet committed it > into CVS). As parsing RDF/XML (and/or N3) is much more difficult, > providing just that in a toolkit independent way would actually provide > extra value. >=20 > BTW Isn't it a bit contrary to the open-world view of Semantic Web, when > you argued in your previous email that a model isn't usable unless all > of it's statements are known? No, not at all. Entailment is defined on models. Open world says there may be other statements out there in other models. An application displaying the results of a query wants to know when it has seen all the results for its query on its choosen dada source. e.g. listing people by name makes an ordering assumption. When you see the "K"'s, there are no more "J"'s in this result. >=20 > > It would seem likely that every RDF toolkit will have a built-in > > SPARQL=20 > > client so if the application is doing RDF processing, it is much > > better=20 > > to use that that trying to fit around the JDBC row-oritneted paradigm. > > It's pretty easy to write (that part ARQ is quite small - some rather > > tedious HTTP connection bashing). > >=20 > >=20 > So what's the point of sparql4j then? SELECT queries for application that wish to access RDF information. For example, a RDF repository as the core of a 3-tier web site. No need for the business logic to have an RDF toolkit if it just wants to get people's name. If an application is going to do RDF processing on graphs, it would not want to use JDBC's view of row-by-row. Why not just get the constructed graph in whatever way its toolkit wants? Because there is a paradigm shift, the value of JDBC to moving graphs around seems limited to me. But SELECT queries, to get information out of RDF graphs, are the most important kind of query. Feed this into JDBC environments in the JDBC pardigm and we may even be able to reuse JDBC client tools. >=20 > > Also, given triples doesn't come back in any particular order from > > CONSTRUCT then I find it hard to see many processing models that can > > achieve streamability. Maybe you could sketch your use case in a > > little more detail? It's that bit I'm puzzled by.=20 > >=20 > >=20 > Firstly there's a difference in building application specific (domain) > model from a stream (of triples) or building first a generic RDF model > and only then building the actual target model. >=20 > Secondly, even though in general case the order of triples isn't > guaranteed, it's quite common to group the statements by the subject. I disagree - you're relying on the server having processed the entire model to make the statements come out nicely. Jena uses hashes all over the place - things do not come out in any sort of order, nor is it consistent. > In > case the contstruct matches are streamed directly, one could assume that > triples of a single template match would be some how grouped. Hardly in > any case the order of returned triples is fully random. 
>=20 > The simplest and most obvious use case is to visualize the triples > returned in a tabular form directly. Many GUI/WUI table widgets provide > sorting of the rows by columns. >=20 > > [[And there aren't any told bnodes in general (but ARQ you can get > > them=20 > > by setting the right config options :-) Not sure who will support > > told=20 > > bNodes. 3Store maybe.]] > >=20 > >=20 > There also isn't a way of requiring that a certain type of resources > should be URI's... >=20 > I used to build (when using Jena) RDQL queries prorammatically using the > query API directly since that way one can (or at least could) also use > the bnodes in queries, thus avoiding failing queries in case bnodes were > used. Of course I was told that I shouldn't do this and to use > models/resources accessors instead... however when using RDB model with > fast path, I was able to achieve magnitudes of better performance this > way. If I recall right I even used (at least at some point) the > ResultBinding#getTriples() to process the results. But it would not have worked with Joseki :-) bNodes not preserved across the network. You might like to see the ARQ configuration options I just put in so bnode ids are passed transparently across the network. It's not the default mode (makes the XML results look ugly). >=20 > > The key is that it minimises the requirements on the client. If we > > assume there is a complete RDF system in the client, why force > > ourselves=20 > > through the JDBC paradigm when we could just as easily have a > > SPARQL-protocol specific API? The value of SPARQL4j to me is to > > connect=20 > > to applications that don't want a full RDF processing system but do > > want=20 > > to get some information out of an RDF repository. > >=20 > >=20 > Exactly(!) and in my opinion this should apply also to the > CONSTRUCT/DESCRIBE queries. Such an application could do hardly anything > with a byte stream of RDF/XML not to mention N3. >=20 > > > A graph may not be, but triples are also usable as such. > > >=20 > > > Also I find the stream based access to the results quite usable > > > regardles of the result form - at least if it's XML and not N3 > > > (e.g. XSLT).=20 > > >=20 > > >=20 > >=20 > > That is XSLT on the XML results? > >=20 > >=20 > Yes. So it's a SELECT query. >=20 > > If you mean RDF/XML, the instability of the encoding is why DAWG had > > to=20 > > do a fixed schema XML results format. > >=20 > >=20 > All the more reason why we should provide also RDF-parsing :-) > Succeeding to do this in a toolkit independent way might also be usable > to anyone building toolkits... Then the objective of the project is now providing another API to RDF (wrapping all toolkits is just like a new toolkit that uses others as its implementation. I have a toolkit, and a SPARQL-protocol interface that (I think) is easiler to work with than warping to the JDBC paradign and and waroping back again - it's cognitive bruden on the app writer. I wanted an approach of doing something clearcut, minimal and distinct with a clear value. But this now seems to be growing into a general purpose RDF framework. Can we find some limits please? Andy >=20 > > > Perhaps we should discuss and document what kind of use cases we wan > > >=20 > > >=20 > > to > >=20 > >=20 > > > support with sparql4j? > > >=20 > > >=20 > >=20 > > Cool - good idea. 
> >=20 > >=20 > Let's start a separate thread for this and copy-paste results into the > document :-) >=20 > -Samppa >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Samppa S. <sam...@pr...> - 2006-01-05 09:52:03
|
>ResourceFactory (it creates them relative to a hidden model but all >operations, if passed a resource from one model convert to a resource in >the model where the operation is to be performed automatically. >Resources need to know where they come from so > resource.getProperty() >works. > >Otherwsie, an application can quite easily work at the Jena Graph level. >It is stable and more pure (less convenience) and triples are, well, >triples (3-tuples). > >Which per-model caches are you referring to? (and all Jena's caches are >just that - caches - things work if they get bypassed). > > EnhGraph.enhNodes... Things work, but using cache produces less gargage and creating resources directly to the target model is better than having multiple caches (one internal to the factory and one to the target model). Anyway these are very subtle differences and the important thing is that things work. >Caching partial query results is v hard unless you can deduce one query >is a sub-query os another. > > True - and that's not what I ment. What I was really questioning that is it really enough to have the factory defined at driver -level, or do we need the ability to set/override the factory at statement level AND that what's the best alternative for this depends on the toolkit used. >>But there's no standard way in jdbc for user to access this >> >> >information. > > >>If the user is provided with an access to InputStream of the result, >> >> >he > > >>needs to get access to the content type also. >> >> > >The driver would access the information and so know how to parse the >incoming RDF graph. In fact, it needs a factory interface > >interface GraphFactory >{ > Object parse(InputStream, String httpContentTypeAndCharset) ; >} > >then a CONSTRUCT returns a 1-row,1-col result set : getObject() is the >graph. getCharacterStream() or getBinaryStream() would give a more >direct access if needed but (see below) I don't see these are common. > > I see. getCharacterStream and getBinaryStream are definitely better alternatives than clob/blob. This approach may have the drawback that the sequence at which column-accessor-methods are called makes the result set behave differently, e.g. if getBinaryStream(1) is called first, getObject(1) is no longer available and vice versa (i.e. unless the binary stream is cached). Also this kind of dual type -column cannot be defined via ResultSetMetaData, can it? If this same approach would be applied select/ask results (i.e. accessing getBinaryStream(1) of the first row would return whole result as a stream, instead of rows) the result would be even more confusing. >>>Could you give the use case you have in mind here? (why is it more >>>convenient to have a stream of triples?) >>> >>> >>> >>> >>I use frequently Model.listStatements variants - and have used in >> >> >every > > >>RDF based applications I've ever made using Jena or SIR ;-) I wouldn't >>like the performance penalty nor increased memory requirements of >>having to read the results first into a model just for iterating over >>them. One could also argue that every (reading) RDF operation involves >>ultimately a stream/iteration of triples. Sure there's convenience >>accesses filtering objects of the statements or select-type query >>returning bindings, but these operations in turn rely on statement >>iterations. [When building a generic program that doesn't have full >>control of all input, the select-query- access is strictly speaking >> >> >not > > >>usable if "told bnodes" are not supported.] 
>> >> > >We need to go back to use cases and the role of SPARQL4j. > > > Yes :-) >An important value of SPARQl4j (for me) is that an application, which is >not very RDF aware, to be able to get information out of remote RDF >repository by SPARQL query. > > However, relying on external RDF API (or InputStream) to handle certain kind of queries makes it highly RDF dependent and thus requires ultimatly that the user is not only aware of RDF but also aware of the (configuration dependent) RDF toolkit. The design I'm proposing aimes 1) at providing as-natural-as-it-gets jdbc (row/column -based) behaviour in all cases 2) not blocking more RDF/Sparql aware use cases / applications and 3) making the decision between the desired approach explicit (i.e. an application that is aware of the result forms may access the results directly by being aware of the Sparql-specific jdbc api extensions), 4) providing the user all the necessary information and control (e.g. ability to define accept preferences when executing a query and access to the actual content type of the result) needed to process the results directly, 5) providing factory-based / RDF toolkit dependent getObject() accessors (only) as convenience accessors for RDF aware application >JDBC is not a good match to getting RDF graphs (as we are finding!) and >choosing one processing model (streams of something) over another makes >assumptions about the nature and approach of the toolkit. > > That's why I'd try to avoid any direct/required dependency to the factory, providing it only (and possibly optionally) for convenience to more RDF aware users. In my opinion the triple-per-row* is as-good-as-it-gets alternative for row based (i.e. jdbc style) handling of RDF. It actually resembles one of RDF's serialization forms, namely NTRIPLES. Also the W3C's RDF Validator provides tabular form of the parsed graph, which I have found quite usefull. *) triple-per-row: // JDBC-specific access exposed via ResultSetMetaData String subject = rs.getString("subject"); int subjectType = rs.getInt("subject$type"); // URI | BLANK_NODE String predicate = ns.getString("predicate"); String object = rs.getString("object"); int objectType = rs.getInt("object$type"); // URI | BLANK_NODE | PLAIN_LITERAL | TYPED_LITERAL String lang = rs.getString("object$lang"); String datatype = rs.getString("object$datatype"); // RDF toolkit specific convenience/hidden access Resource subject = (Resource)rs.getObject("subject"); Predicate predicate = (Predicate)rs.getObject("predicate"); RDFNode object = (RDFNode)rs.getObject("object"); Note that the JDBC-specific access can easily be used to provide a configuration (i.e. factory) independent access to any RDF toolkit specific objects. It's also by far more robust way of accessing the toolkit specific resources than the factory approach which would end up in ClassCastExceptions if configuration changed. Actually the factory approach could aesily be replaced with simple (and robust) ResultSet wrappers / handlers. >If the application wants to do listStatements AND wants triples in the >local toolkit format, then the value of a JDBC interface (which is >row-oriented) is pretty slim. So I do not see a high value for SPARQl4j >as a general connection over the SPARQL protocol in the initial >releases. > > Triple-per-row on the other hand offers even something usefull for a not so RDF/Sparql/toolkit aware application. 
The value of providing only sparql result form parsing is also pretty slim - I have actually already implemented it (just not yet committed it into CVS). As parsing RDF/XML (and/or N3) is much more difficult, providing just that in a toolkit independent way would actually provide extra value. BTW Isn't it a bit contrary to the open-world view of Semantic Web, when you argued in your previous email that a model isn't usable unless all of it's statements are known? >It would seem likely that every RDF toolkit will have a built-in SPARQL >client so if the application is doing RDF processing, it is much better >to use that that trying to fit around the JDBC row-oritneted paradigm. >It's pretty easy to write (that part ARQ is quite small - some rather >tedious HTTP connection bashing). > > So what's the point of sparql4j then? >Also, given triples doesn't come back in any particular order from >CONSTRUCT then I find it hard to see many processing models that can >achieve streamability. Maybe you could sketch your use case in a little more detail? It's that bit I'm puzzled by. > > Firstly there's a difference in building application specific (domain) model from a stream (of triples) or building first a generic RDF model and only then building the actual target model. Secondly, even though in general case the order of triples isn't guaranteed, it's quite common to group the statements by the subject. In case the contstruct matches are streamed directly, one could assume that triples of a single template match would be some how grouped. Hardly in any case the order of returned triples is fully random. The simplest and most obvious use case is to visualize the triples returned in a tabular form directly. Many GUI/WUI table widgets provide sorting of the rows by columns. >[[And there aren't any told bnodes in general (but ARQ you can get them >by setting the right config options :-) Not sure who will support told >bNodes. 3Store maybe.]] > > There also isn't a way of requiring that a certain type of resources should be URI's... I used to build (when using Jena) RDQL queries prorammatically using the query API directly since that way one can (or at least could) also use the bnodes in queries, thus avoiding failing queries in case bnodes were used. Of course I was told that I shouldn't do this and to use models/resources accessors instead... however when using RDB model with fast path, I was able to achieve magnitudes of better performance this way. If I recall right I even used (at least at some point) the ResultBinding#getTriples() to process the results. >The key is that it minimises the requirements on the client. If we >assume there is a complete RDF system in the client, why force ourselves >through the JDBC paradigm when we could just as easily have a >SPARQL-protocol specific API? The value of SPARQL4j to me is to connect >to applications that don't want a full RDF processing system but do want >to get some information out of an RDF repository. > > Exactly(!) and in my opinion this should apply also to the CONSTRUCT/DESCRIBE queries. Such an application could do hardly anything with a byte stream of RDF/XML not to mention N3. >>A graph may not be, but triples are also usable as such. >> >>Also I find the stream based access to the results quite usable >>regardles of the result form - at least if it's XML and not N3 (e.g. >>XSLT). >> >> > >That is XSLT on the XML results? > > Yes. >If you mean RDF/XML, the instability of the encoding is why DAWG had to >do a fixed schema XML results format. 
> > All the more reason why we should provide also RDF-parsing :-) Succeeding to do this in a toolkit independent way might also be usable to anyone building toolkits... >>Perhaps we should discuss and document what kind of use cases we wan >> >> >to > > >>support with sparql4j? >> >> > >Cool - good idea. > > Let's start a separate thread for this and copy-paste results into the document :-) -Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Seaborne, A. <and...@hp...> - 2006-01-04 16:39:31
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 4 January 2006 08:26 >=20 > > > If we have a factory that can handle RDF terms, adding support for > > > triples is trivial.=20 > > >=20 > > >=20 > >=20 > > In the sense that creating a triple is just a 3-slot object, yes. But > > the factory idea means that objects specific to the local RDF tookit > > are retruned and it will have it's own idea of a triple (e..g. in > > Jena, the application API object "Statement" is not a plain 3 slot > > triple - it knows which model it comes from). > >=20 > >=20 >=20 > There's no public API for constructing RDFNodes directly either in > Jena, so that too might be a problem too. Wouldn't it at least bypass > all (per-model) node caches? =20 ResourceFactory (it creates them relative to a hidden model but all operations, if passed a resource from one model convert to a resource in the model where the operation is to be performed automatically. Resources need to know where they come from so resource.getProperty() works. Otherwsie, an application can quite easily work at the Jena Graph level. It is stable and more pure (less convenience) and triples are, well, triples (3-tuples). Which per-model caches are you referring to? (and all Jena's caches are just that - caches - things work if they get bypassed). Caching partial query results is v hard unless you can deduce one query is a sub-query os another. >=20 > > The XML Results format does not return triples - it's only CONSTRUCT > > and DESCRIBE.=20 > >=20 > >=20 > Good point. >=20 > > > I would find it more convenient to get the triples of the graph > > > returned as triples (i.e. triple per row) using the factory along > > >=20 > > >=20 > > with > >=20 > >=20 > > > speudo column accessors. This way we would (first of all) avoid > > >=20 > > >=20 > > special > >=20 > >=20 > > > content-type handling. > > >=20 > > >=20 > >=20 > > I don't follow this - the HTTP reply header has to be correctly > > parsed. Such content handling is easy. > >=20 > >=20 > But there's no standard way in jdbc for user to access this information. > If the user is provided with an access to InputStream of the result, he > needs to get access to the content type also.=20 The driver would access the information and so know how to parse the incoming RDF graph. In fact, it needs a factory interface interface GraphFactory { Object parse(InputStream, String httpContentTypeAndCharset) ; } then a CONSTRUCT returns a 1-row,1-col result set : getObject() is the graph. getCharacterStream() or getBinaryStream() would give a more direct access if needed but (see below) I don't see these are common. >=20 > > Could you give the use case you have in mind here? (why is it more > > convenient to have a stream of triples?) > >=20 > >=20 > I use frequently Model.listStatements variants - and have used in every > RDF based applications I've ever made using Jena or SIR ;-) I wouldn't > like the performance penalty nor increased memory requirements of > having to read the results first into a model just for iterating over > them. One could also argue that every (reading) RDF operation involves > ultimately a stream/iteration of triples. Sure there's convenience > accesses filtering objects of the statements or select-type query > returning bindings, but these operations in turn rely on statement > iterations. [When building a generic program that doesn't have full > control of all input, the select-query- access is strictly speaking not > usable if "told bnodes" are not supported.] 
We need to go back to use cases and the role of SPARQL4j. An important value of SPARQl4j (for me) is that an application, which is not very RDF aware, to be able to get information out of remote RDF repository by SPARQL query. JDBC is not a good match to getting RDF graphs (as we are finding!) and choosing one processing model (streams of something) over another makes assumptions about the nature and approach of the toolkit. If the application wants to do listStatements AND wants triples in the local toolkit format, then the value of a JDBC interface (which is row-oriented) is pretty slim. So I do not see a high value for SPARQl4j as a general connection over the SPARQL protocol in the initial releases. It would seem likely that every RDF toolkit will have a built-in SPARQL client so if the application is doing RDF processing, it is much better to use that that trying to fit around the JDBC row-oritneted paradigm. It's pretty easy to write (that part ARQ is quite small - some rather tedious HTTP connection bashing). Also, given triples doesn't come back in any particular order from CONSTRUCT then I find it hard to see many processing models that can achieve streamability. Maybe you could sketch your use case in a little more detail? It's that bit I'm puzzled by. [[And there aren't any told bnodes in general (but ARQ you can get them by setting the right config options :-) Not sure who will support told bNodes. 3Store maybe.]] >=20 > One (internal to Jena/SIR) use case for this could be a (read-only) > graph wrapping some sparql-enabled repository, addressing queries when > necessary and possibly caching the results. =20 >=20 > One (possibly quite far fetched) use case is using this driver in a > generic SQL browser allowing user to address queries and showing > results in a tabular form. Having tabular form for RDF triples (exposed > vie =20 > ResultSetMetaData) instead of clob/blob might be more convenient. >=20 > > Toolkits have their own APIs to their parsers to generate triples - I > > guess most have a stream interface (ours do) but it would be more > > normal to parse directly into the graph and return a graph (yes - > > streaming is not possible).=20 > >=20 > >=20 > > To get a stream of triples, do the query: > >=20 > > SELECT * { ?s ?p ?o } > >=20 > > maybe with an ORDER BY > >=20 > >=20 > That's certainly one alternative, but gets pretty difficult when using > just a bit more complex templates. The key is that it minimises the requirements on the client. If we assume there is a complete RDF system in the client, why force ourselves through the JDBC paradigm when we could just as easily have a SPARQL-protocol specific API? The value of SPARQL4j to me is to connect to applications that don't want a full RDF processing system but do want to get some information out of an RDF repository. =20 >=20 > > As a graph isn't usable until all the triples are known (a triple can > > turn up at any point in the stream), an application would need to do > > the SELECT query to process results before the last is seen. > >=20 > >=20 > A graph may not be, but triples are also usable as such. >=20 > Also I find the stream based access to the results quite usable > regardles of the result form - at least if it's XML and not N3 (e.g. > XSLT). =20 That is XSLT on the XML results? If you mean RDF/XML, the instability of the encoding is why DAWG had to do a fixed schema XML results format. 
>=20 > Perhaps we should discuss and document what kind of use cases we wan to > support with sparql4j?=20 Cool - good idea. >=20 > Br, > Samppa Andy >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com >=20 >=20 >=20 >=20 > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log > files=20 > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! > http://ads.osdn.com/?ad_id=3D7637&alloc_id=3D16865&op=3Dclick > _______________________________________________ > Sparql4j-devel mailing list > Spa...@li... > https://lists.sourceforge.net/lists/listinfo/sparql4j-devel |
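As a concrete illustration of the GraphFactory idea Andy sketches in the message above, a Jena-backed implementation might look roughly like this; the interface itself is only a proposal in this thread, and the content-type mapping here is an assumption.

    import java.io.InputStream;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Proposed driver-side interface (from the message above).
    interface GraphFactory {
        Object parse(InputStream in, String httpContentTypeAndCharset);
    }

    // Hypothetical Jena 2.x implementation.
    class JenaGraphFactory implements GraphFactory {
        public Object parse(InputStream in, String httpContentTypeAndCharset) {
            Model model = ModelFactory.createDefaultModel();
            // Crude content negotiation: N3 if the server said so, RDF/XML otherwise.
            String lang = (httpContentTypeAndCharset != null
                    && httpContentTypeAndCharset.startsWith("text/rdf+n3")) ? "N3" : "RDF/XML";
            model.read(in, null, lang);  // base URI null; fine when all URIs are absolute
            return model;
        }
    }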
From: Samppa S. <sam...@pr...> - 2006-01-04 08:28:08
|
>>If we have a factory that can handle RDF terms, adding support for >>triples is trivial. >> >> > >In the sense that creating a triple is just a 3-slot object, yes. But >the factory idea means that objects specific to the local RDF tookit are >retruned and it will have it's own idea of a triple (e..g. in Jena, the >application API object "Statement" is not a plain 3 slot triple - it >knows which model it comes from). > > There's no public API for constructing RDFNodes directly either in Jena, so that too might be a problem too. Wouldn't it at least bypass all (per-model) node caches? >The XML Results format does not return triples - it's only CONSTRUCT and >DESCRIBE. > > Good point. >>I would find it more convenient to get the triples of the graph >>returned as triples (i.e. triple per row) using the factory along >> >> >with > > >>speudo column accessors. This way we would (first of all) avoid >> >> >special > > >>content-type handling. >> >> > >I don't follow this - the HTTP reply header has to be correctly parsed. >Such content handling is easy. > > But there's no standard way in jdbc for user to access this information. If the user is provided with an access to InputStream of the result, he needs to get access to the content type also. >Could you give the use case you have in mind here? (why is it more >convenient to have a stream of triples?) > > I use frequently Model.listStatements variants - and have used in every RDF based applications I've ever made using Jena or SIR ;-) I wouldn't like the performance penalty nor increased memory requirements of having to read the results first into a model just for iterating over them. One could also argue that every (reading) RDF operation involves ultimately a stream/iteration of triples. Sure there's convenience accesses filtering objects of the statements or select-type query returning bindings, but these operations in turn rely on statement iterations. [When building a generic program that doesn't have full control of all input, the select-query- access is strictly speaking not usable if "told bnodes" are not supported.] One (internal to Jena/SIR) use case for this could be a (read-only) graph wrapping some sparql-enabled repository, addressing queries when necessary and possibly caching the results. One (possibly quite far fetched) use case is using this driver in a generic SQL browser allowing user to address queries and showing results in a tabular form. Having tabular form for RDF triples (exposed vie ResultSetMetaData) instead of clob/blob might be more convenient. >Toolkits have their own APIs to their parsers to generate triples - I >guess most have a stream interface (ours do) but it would be more normal >to parse directly into the graph >and return a graph (yes - streaming is not possible). > > >To get a stream of triples, do the query: > >SELECT * { ?s ?p ?o } > >maybe with an ORDER BY > > That's certainly one alternative, but gets pretty difficult when using just a bit more complex templates. >As a graph isn't usable until all the triples are known (a triple can >turn up at any point in the stream), an application would need to do the >SELECT query to process results before the last is seen. > > A graph may not be, but triples are also usable as such. Also I find the stream based access to the results quite usable regardles of the result form - at least if it's XML and not N3 (e.g. XSLT). Perhaps we should discuss and document what kind of use cases we wan to support with sparql4j? 
Br, Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Seaborne, A. <and...@hp...> - 2006-01-03 14:05:15
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 29 December 2005 08:40 >=20 > > There is a mismatch between the JDBC paradigm and the SAX paradigm. > > JDBC is purely driven by the client application and there are no > > mandator call backs. SAX is driven by the rate of arrival as given > > by the parser so it migh have to accumulate results until the client > > is ready.=20 > >=20 > > There could be a bounded pipe between application and SAX code but I > > found it simpler to use a StAX parser (Woodstox) because the whole > > results-consuming process is then determined by the application. It > > is as easy to write StAX code as to write SAX > code. >=20 > StAX seems like a good choice. >=20 > > For handling SELECT queries, we don't need a full API. We need to be > > > able to handle RDF terms, but not triples. >=20 > If we have a factory that can handle RDF terms, adding support for > triples is trivial.=20 In the sense that creating a triple is just a 3-slot object, yes. But the factory idea means that objects specific to the local RDF tookit are retruned and it will have it's own idea of a triple (e..g. in Jena, the application API object "Statement" is not a plain 3 slot triple - it knows which model it comes from). The XML Results format does not return triples - it's only CONSTRUCT and DESCRIBE. >=20 > > For CONSTRUCT, etc, it might be better to properly link the result to > > > the local RDf toolkit of choice (e.g. via an InputStream). c.f. > > > SQL Blobs/Clobs.=20 >=20 >=20 > I would find it more convenient to get the triples of the graph > returned as triples (i.e. triple per row) using the factory along with > speudo column accessors. This way we would (first of all) avoid special > content-type handling. I don't follow this - the HTTP reply header has to be correctly parsed. Such content handling is easy. Could you give the use case you have in mind here? (why is it more convenient to have a stream of triples?) > When using the InputStream-approach the user > should be in control of the requested content-type. However, since > InputStreams are more convenient in some situations (e.g. when using > XSLT to process the results) maybe the best alternative would be to > provice both... and since at least some of the use cases for > InputStream access to the results are the same regardless of the type > of the query, I see no reason to limit the usage of InputStream results > only to construct and describe. To support these use cases we could > overload Statement's execute with one that returns an InputStream. E.g. >=20 > Connection c =3D datasource.getConnection(); >=20 > Statement select =3D c.createStatement(); > InputStream results =3D select.executeRaw("<any sparql query>"); >=20 >=20 > The preferred RDF serialization could be provided to the statement via > select.setRdfLang("N3") similarily to other hints (e.g. fetch size, > escape processing...).=20 >=20 >=20 > Br, > Samppa Toolkits have their own APIs to their parsers to generate triples - I guess most have a stream interface (ours do) but it would be more normal to parse directly into the graph=20 and return a graph (yes - streaming is not possible). =20 To get a stream of triples, do the query: SELECT * { ?s ?p ?o } maybe with an ORDER BY=20 As a graph isn't usable until all the triples are known (a triple can turn up at any point in the stream), an application would need to do the SELECT query to process results before the last is seen. 
Andy >=20 >=20 >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com=20 >=20 |
From: Seaborne, A. <and...@hp...> - 2006-01-03 13:49:47
|
Hi Timo,

-------- Original Message --------
> From: Timo Westkamper <>
> Date: 22 December 2005 15:00
>
> Hello.
>
> Concerning SELECT results access it would be possible to plug RDF APIs
> into the driver via a factory interface.
>
> We could provide a neutral SPARQL4J-dependent minimal RDF API and in
> addition to this the possibility to plug external APIs into the driver.

Sounds like a good idea. Overall the interface looks OK - a few minor comments.

> RDF nodes (or any other objects returned by the factory) could be
> accessed via the untyped accessors of the JDBC result set.
>
> e.g.
>
> interface NodeFactory{
>
>   public Object createLiteral(String lex, String lang, String datatype);

Minor: as lang and datatype can't both be set,

  createTypedLiteral(String lex, String datatype);
  createPlainLiteral(String lex, String lang);

is possible. As an internal factory API, it isn't significant - your one-call version is fine.

>   public Object createResource(String uri);
>
>   public Object createResource(String ns, String local);

Not sure about this one. Namespaces are purely syntactic in RDF and the XML results format does not provide them. Did you have a specific usage in mind? Wouldn't only RDF/XML-format results need this, and then it is a matter of a full RDF parser anyway?

>   public Object createBlankNode(String internalID);
>
> }

One observation - RDF resources cover all graph labels (URIs, blank nodes and literals). There isn't an official term for a URI-labelled node or arc, so your naming is probably as good as it gets.

Andy

> The name of the implementation class could be given to the Driver via
> the property map.
>
> Br,
> Timo. |
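A sketch of the split-method variant Andy suggests above (typed vs. plain literals), shown alongside the rest of the proposed factory; all of it is still a proposal at this stage.

    // Proposed pluggable factory, with the single createLiteral call split into
    // the two mutually exclusive cases (a literal carries either a language tag
    // or a datatype, never both).
    interface NodeFactory {
        Object createTypedLiteral(String lex, String datatype);
        Object createPlainLiteral(String lex, String lang);
        Object createResource(String uri);
        Object createBlankNode(String internalId);
    }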
From: Samppa S. <sam...@pr...> - 2006-01-03 10:50:19
|
Problem: How to provide both row-based and InputStream-based access to all types of query results. Row-based access fits the JDBC semantics better and should thus be emphasized. However, InputStream-based access is useful, for example, for XSLT post-processing and for using an external RDF parser to process CONSTRUCT and DESCRIBE results.

Constraints:

1) Branching should be clear and consistent regardless of the type of the query, i.e. not dependent on execution order and independent of the query type (for ad hoc query processing).

2) JDBC semantics should not be rewritten. Future support for PreparedStatements, updates and stored procedures should not be blocked.

3) The user should be in control of the accept preferences, i.e. whether N3 or RDF/XML is preferred for RDF-based results.

4) The user should be able to access the content type of the results (when using InputStream access), as it is ultimately the server which decides the type of the result (accepts are only preferences and may include both N3 and RDF/XML).

We suggest that triple-per-row should be used as the basic approach for DESCRIBE and CONSTRUCT (i.e. RDF form) results, with subject, predicate and object as the basic columns (exposed via ResultSetMetaData, in addition to pseudo-columns for the subject and object types as well as a literal object's additional properties).

Here are a couple of alternatives for InputStream-based access to results.

  /* Target method for input stream processing */
  public void processStream(InputStream stream, String contentType) {...}

1) SparqlStatement

  public interface SparqlStatement extends Statement {
    ...
    public ResultStream executeSparql(String query, String[] accept);
    ...
  }

  public class ResultStream extends InputStream {
    public String getContentType() {...}
    public String getEncoding() {...}
  }

Usage:

  SparqlStatement stmt = (SparqlStatement) connection.createStatement();
  ResultStream stream = stmt.executeSparql("CONSTRUCT ...", new String[]{"text/rdf+n3"});
  String contentType = stream.getContentType();
  processStream(stream, contentType);

Pros/cons:
+ Provides a SPARQL-specific extension to JDBC while preserving JDBC semantics
+ Clear way of providing accept preferences per query
+ Direct InputStream access independent of ResultSet and clob/blob access
- Needs explicit casting
- Is dependent on the sparql4j API (for the SparqlStatement and ResultStream interfaces)

2) Specifying the result set type when creating the statement

  Statement stmt = connection.createStatement(SparqlResultSet.TYPE_STREAM, ResultSet.CONCUR_READ_ONLY);
  ResultSet rs = stmt.executeQuery("CONSTRUCT ...");
  if (rs.next()) {
    Blob blob = rs.getBlob(1);
    String contentType = rs.getString(2);
    processStream(blob.getBinaryStream(), contentType);
  }

Pros/cons:
+ Doesn't need explicit casting
+ No dependency on the Sparql4J API except for the special result set type constant
- Requires buffering of the whole result (the length of the result might not be available and there is no limit on how many times getBinaryStream can be called)
- Clob/blob is under-specified (e.g. its relation to transactions) and implementations/usage are vendor-specific

Br, Samppa & Timo
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
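For context, the XSLT post-processing use case mentioned above is straightforward once an InputStream is available; a sketch using the standard JAXP transformer API, with placeholder file names.

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class XsltPostProcessor {
        // Applies a stylesheet to the raw result stream; file names are placeholders.
        public static void render(InputStream sparqlResults) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource("results-to-html.xsl"));
            t.transform(new StreamSource(sparqlResults),
                        new StreamResult(new FileOutputStream("results.html")));
        }
    }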
From: Samppa S. <sam...@pr...> - 2006-01-02 13:24:07
|
Hi all,

What is the target Java version of sparql4j? And what about source and .class file compatibility? My Eclipse is set to use 1.4 with the default compliance settings (source=1.3 and .class files=1.2). Any reason why we should use 1.5? I don't see any. What about 1.4 and asserts?

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
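For reference, assert statements need source level 1.4 at compile time and -ea at run time, so the default source=1.3 setting mentioned above would reject them; a trivial illustration (class name and query are placeholders).

    // Compile with: javac -source 1.4 AssertDemo.java
    // Run with:     java -ea AssertDemo
    public class AssertDemo {
        public static void main(String[] args) {
            String query = "select * where {?s ?p ?o}";
            assert query.trim().length() > 0 : "query must not be empty";
            System.out.println("query ok");
        }
    }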
From: Samppa S. <sam...@pr...> - 2005-12-29 08:42:26
|
> There is a mismatch between the JDBC paradigm and the SAX paradigm.
> JDBC is purely driven by the client application and there are no
> mandatory callbacks. SAX is driven by the rate of arrival as given by
> the parser, so it might have to accumulate results until the client is
> ready.
>
> There could be a bounded pipe between application and SAX code, but I
> found it simpler to use a StAX parser (Woodstox) because the whole
> results-consuming process is then determined by the application. It is
> as easy to write StAX code as to write SAX code.

StAX seems like a good choice.

> For handling SELECT queries, we don't need a full API. We need to be
> able to handle RDF terms, but not triples.

If we have a factory that can handle RDF terms, adding support for triples is trivial.

> For CONSTRUCT, etc., it might be better to properly link the result to
> the local RDF toolkit of choice (e.g. via an InputStream). Cf. SQL
> Blobs/Clobs.

I would find it more convenient to get the triples of the graph returned as triples (i.e. triple per row) using the factory along with pseudo-column accessors. This way we would (first of all) avoid special content-type handling. When using the InputStream approach the user should be in control of the requested content type. However, since InputStreams are more convenient in some situations (e.g. when using XSLT to process the results), maybe the best alternative would be to provide both... and since at least some of the use cases for InputStream access to the results are the same regardless of the type of the query, I see no reason to limit InputStream results only to CONSTRUCT and DESCRIBE. To support these use cases we could overload Statement's execute with one that returns an InputStream. E.g.

  Connection c = datasource.getConnection();
  Statement select = c.createStatement();
  InputStream results = select.executeRaw("<any sparql query>");

The preferred RDF serialization could be provided to the statement via select.setRdfLang("N3"), similarly to other hints (e.g. fetch size, escape processing...).

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
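A minimal sketch of pull-parsing the SPARQL XML results format with StAX, as Andy describes above; the javax.xml.stream API shown here would be supplied by Woodstox on the JVMs of that era, and the element handling is deliberately simplified.

    import java.io.InputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class BindingPrinter {
        // Prints each variable binding of a SPARQL XML results document as "name = value".
        public static void printBindings(InputStream in) throws Exception {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String currentVar = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "binding".equals(r.getLocalName())) {
                    currentVar = r.getAttributeValue(null, "name");
                } else if (event == XMLStreamConstants.CHARACTERS && currentVar != null) {
                    String text = r.getText().trim();
                    if (text.length() > 0) {
                        System.out.println(currentVar + " = " + text);
                        currentVar = null;
                    }
                }
            }
            r.close();
        }
    }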
From: Samppa S. <sam...@pr...> - 2005-12-29 07:47:20
|
In addition to getObject returning RDF nodes via a configurable factory, we could provide pseudo-columns to access properties of nodes (type, datatype, lang), e.g.

ResultSet rs = executeQuery("select * where {?s ?p ?o}");
while (rs.next()) {
    String label = rs.getString("o");
    int type = rs.getInt("o$type");
    switch (type) {
        case RESOURCE: ...
        case BLANK_NODE: ...
        case PLAIN_LITERAL:
            String lang = rs.getString("o$lang");
            ...
        case TYPED_LITERAL:
            String datatype = rs.getString("o$datatype");
            ...
    }
}

Should XMLLiteral be handled as a separate type or as a typed literal with datatype rdf:XMLLiteral?

Br,
Samppa

> Hello.
>
> Concerning SELECT results access it would be possible to plug RDF APIs
> into the driver via a factory interface.
>
> We could provide a neutral SPARQL4J dependent minimal RDF API and in
> addition to this the possibility to plug external APIs into the driver.
>
> RDF nodes (or any other objects returned by the factory) could be
> accessed via the untyped accessors of the JDBC result set.
>
> e.g.
>
> interface NodeFactory{
>
> public Object createLiteral(String lex, String lang, String datatype);
>
> public Object createResource(String uri);
>
> public Object createResource(String ns, String local);
>
> public Object createBlankNode(String internalID);
>
> }
>
> The name of the implementation class could be given to the Driver via
> the property map.
>
> Br,
> Timo.
>
--
Samppa Saarela <samppa.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)41 515 1412
Internet: http://www.profium.com |
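For completeness, the node-type constants used in the switch above could be collected somewhere like the sketch below. The constant names come from the example; the numeric values are placeholders only.

public interface NodeType {
    int RESOURCE      = 1;
    int BLANK_NODE    = 2;
    int PLAIN_LITERAL = 3;
    int TYPED_LITERAL = 4;
    // rdf:XMLLiteral could either get its own constant or be reported as
    // TYPED_LITERAL with datatype rdf:XMLLiteral (the open question above).
}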
From: Timo W. <tim...@pr...> - 2005-12-22 15:00:29
|
Hello.

Concerning SELECT results access, it would be possible to plug RDF APIs into the driver via a factory interface.

We could provide a neutral, SPARQL4J-dependent minimal RDF API and, in addition to this, the possibility to plug external APIs into the driver.

RDF nodes (or any other objects returned by the factory) could be accessed via the untyped accessors of the JDBC result set, e.g.

interface NodeFactory {

    public Object createLiteral(String lex, String lang, String datatype);

    public Object createResource(String uri);

    public Object createResource(String ns, String local);

    public Object createBlankNode(String internalID);
}

The name of the implementation class could be given to the Driver via the property map.

Br,
Timo.
--
Timo Westkämper <timo.westkamper at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)40 591 2172
Internet: http://www.profium.com |
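To make the factory idea concrete, a trivial implementation and its registration might look like the sketch below. The "nodeFactory" property key and the jdbc:sparql URL form are assumptions for illustration; a real factory would create Jena or Sesame nodes rather than plain Strings.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class StringNodeFactory implements NodeFactory {
    public Object createLiteral(String lex, String lang, String datatype) { return lex; }
    public Object createResource(String uri) { return uri; }
    public Object createResource(String ns, String local) { return ns + local; }
    public Object createBlankNode(String internalID) { return "_:" + internalID; }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // property key and URL are illustrative, not fixed driver API
        props.setProperty("nodeFactory", StringNodeFactory.class.getName());
        Connection con = DriverManager.getConnection("jdbc:sparql://example.org/sparql", props);
        con.close();
    }
}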
From: Seaborne, A. <and...@hp...> - 2005-12-21 16:34:12
|
-------- Original Message --------
> From: Janne Saarela <>
> Date: 20 December 2005 15:38
>
> Just to let you know: I've committed some skeleton files to cvs. I've
> mostly developed Driver.java and DriverTest.java which include some
> property management functionality and pass simple unit tests.
>

Aside: Just checked these out - it would be better to use Commons Logging rather than hardwire log4j, because sparql4j is a library that has to co-exist with other subsystems. Using org.apache.commons.logging is just as easy. It uses Log4J if it can find an implementation on the classpath, else uses Java logging, else uses a simple built-in logger.

> The Connection.java class needs XML parsing to be implemented with
> ResultSet.java implementation to hold and serve query results -
> volunteers?
>
> How do you feel like testing the driver against SPARQL implementations?
> Perhaps the server URI should be made configurable via some
> test.properties file which is picked up by build.xml? Volunteers?

Will do. What data and implementations are we going to test against? It would be useful for us all to be able to recreate the same tests.

I have a small server at http://sparql.org/sparql which is backed by Joseki and ARQ. See http://sparql.org/query.html
Just don't overload it by loading large graphs (it should give an error if the graph is over 10K triples).

What's the state of Profium's engine?

I'd like to try my client library in parallel to compare SPARQL4j with it. It's in the ARQ codebase - and the command line arq.sparql will access remote services with the --service argument (over HTTP).

	Andy

>
> I plan to throw exceptions either by messages 'not supported' and 'TODO:
> not implemented' in all of the methods required by java.sql interfaces.
> They will eventually disappear as our work progresses.

Good way forward.

>
> Regards,
> Janne
> --
> Janne Saarela <janne.saarela at profium.com> Profium, Lars Sonckin
> kaari 12, 02600 Espoo, Finland
> Internet: http://www.profium.com |
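A minimal sketch of the Commons Logging usage Andy suggests (the class name is illustrative): the library obtains a Log through LogFactory and the actual backend is resolved at runtime from the classpath.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ConnectionLogging {
    // Commons Logging delegates to Log4J, java.util.logging or its own
    // SimpleLog depending on what is found on the classpath.
    private static final Log log = LogFactory.getLog(ConnectionLogging.class);

    public void connect(String url) {
        log.debug("Opening connection to " + url);
    }
}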
From: Seaborne, A. <and...@hp...> - 2005-12-21 16:23:33
|
-------- Original Message --------
> From: Timo Westkamper <>
> Date: 21 December 2005 10:13
>
> Hello.
>
> I have added basic SAX based result parsing (SELECT results) into CVS.

There is a mismatch between the JDBC paradigm and the SAX paradigm. JDBC is purely driven by the client application and there are no mandatory callbacks. SAX is driven by the rate of arrival as given by the parser, so it might have to accumulate results until the client is ready.

There could be a bounded pipe between the application and the SAX code, but I found it simpler to use a StAX parser (Woodstox) because the whole results-consuming process is then determined by the application. It is as easy to write StAX code as to write SAX code.

ARQ does this - the code is in ARQ CVS:
:pserver:ano...@cv...:/cvsroot/jena/
Module ARQ
Package com.hp.hpl.jena.query.resultset

> What is your opinion on including a lightweight RDF API for passing RDF
> nodes around?

For handling SELECT queries, we don't need a full API. We need to be able to handle RDF terms, but not triples.

For CONSTRUCT, etc., it might be better to properly link the result to the local RDF toolkit of choice (e.g. via an InputStream), c.f. SQL Blobs/Clobs.

	Andy

>
> Br,
> Timo Westkämper. |
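For orientation, a minimal StAX sketch of pulling SELECT bindings off the wire might look as follows, assuming the standard javax.xml.stream API (which Woodstox implements); the element names follow the SPARQL XML results format, and error handling is omitted.

import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SelectResultsReader {
    // The application pulls events at its own pace; nothing is buffered
    // beyond what the parser needs for the current event.
    public static void read(InputStream in) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "binding".equals(r.getLocalName())) {
                String var = r.getAttributeValue(null, "name");
                // advance to the term element (uri, bnode or literal)
                while (r.next() != XMLStreamConstants.START_ELEMENT) { /* skip whitespace */ }
                System.out.println(var + " = " + r.getLocalName() + " " + r.getElementText());
            }
        }
        r.close();
    }
}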
From: Timo W. <tim...@pr...> - 2005-12-21 10:13:21
|
Hello.

I have added basic SAX based result parsing (SELECT results) into CVS.

What is your opinion on including a lightweight RDF API for passing RDF nodes around?

Br,
Timo Westkämper.
--
Timo Westkämper <timo.westkamper at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)40 591 2172
Internet: http://www.profium.com |
From: Janne S. <jan...@pr...> - 2005-12-20 15:38:28
|
Just to let you know: I've committed some skeleton files to cvs. I've mostly developed Driver.java and DriverTest.java, which include some property management functionality and pass simple unit tests.

The Connection.java class needs XML parsing to be implemented, with a ResultSet.java implementation to hold and serve query results - volunteers?

How do you feel about testing the driver against SPARQL implementations? Perhaps the server URI should be made configurable via some test.properties file which is picked up by build.xml? Volunteers?

I plan to throw exceptions with the messages 'not supported' and 'TODO: not implemented' in all of the methods required by the java.sql interfaces. They will eventually disappear as our work progresses.

Regards,
Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |
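One way the test.properties idea could be wired up from the Java side is sketched below; the file name, the "sparql.server.url" key and the default value (Andy's endpoint, mentioned later in the thread) are assumptions, not anything fixed in the project yet.

import java.io.InputStream;
import java.util.Properties;

public class TestConfig {
    // Reads the SPARQL endpoint URL for integration tests from a
    // test.properties resource on the classpath.
    public static String serverUrl() throws Exception {
        Properties p = new Properties();
        InputStream in = TestConfig.class.getResourceAsStream("/test.properties");
        p.load(in);
        in.close();
        return p.getProperty("sparql.server.url", "http://sparql.org/sparql");
    }
}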
From: Seaborne, A. <and...@hp...> - 2005-11-16 18:05:58
|
-------- Original Message --------
> From: Janne Saarela <>
> Date: 15 November 2005 12:28
>
> Hi all
>
> Can I have some of your cpu time for thinking how we should manage
> connection objects in the sparql4j project?
>
> As sparql currently does not support state management (cursors or
> anything of the sort), there is no need to manage state in the
> Connection object internally.
>
> How the connections are passed between Driver, Connection and Statement
> (or PreparedStatement) classes is something we should design right so
> that eventual state management would be easier to add in the future.
>
> I am thinking that the Driver.getConnection() method could return
> nothing but a Connection object which internally manages HTTP
> connections to SPARQL servers.
>
> Once a (Prepared)Statement.execute() method (or variants of it) is
> executed, the actual protocol call is created within the Connection
> object. This would centralize HTTP code management to the Connection
> class leaving room for internal connection pool management if HTTP
> connections were e.g. persisted and left open for future calls.

Good idea - we *might* be able to get this for free by having swappable HTTP clients. Maybe HttpClient does the necessary work for us (although this should not be a system prerequisite).

	Andy

>
> I don't see a reason to pool Connection objects in the Driver level at
> this time.
>
> Janne
> --
> Janne Saarela <janne.saarela at profium.com> Profium, Lars Sonckin
> kaari 12, 02600 Espoo, Finland
> Internet: http://www.profium.com |
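A rough sketch of what centralizing HTTP access in the Connection could look like with Jakarta Commons HttpClient (assuming the 3.x API; as Andy notes, the library should not be a hard prerequisite). The class and the "query" parameter follow the SPARQL protocol-over-GET idea; everything else is illustrative.

import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.GetMethod;

public class HttpQueryExecutor {
    // One pooled client per JDBC Connection: HTTP connections are kept
    // alive and reused across Statement executions.
    private final HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());

    public InputStream execute(String serviceUrl, String query) throws Exception {
        GetMethod get = new GetMethod(serviceUrl);
        get.setQueryString(new NameValuePair[] { new NameValuePair("query", query) });
        int status = client.executeMethod(get);
        if (status != 200) {
            get.releaseConnection();
            throw new Exception("SPARQL protocol request failed: HTTP " + status);
        }
        // caller is responsible for consuming the stream and releasing the connection
        return get.getResponseBodyAsStream();
    }
}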
From: Seaborne, A. <and...@hp...> - 2005-11-15 14:27:02
|
-------- Original Message --------
> From: Janne Saarela <mailto:jan...@pr...>
> Date: 11 November 2005 11:19
>
> > Your remark on the timezone was a good one - what I don't know (yet)
> > is whether the javax.xml.datatypes could be returned from
> > getTimestamp() methods (i.e. do these types inherit any of the java
> > basic data types) or if we need a method signature?
>
> The last question should have included the word 'new', i.e. 'do we need a new
> method signature?'
>
> Janne

A new method signature would mean that the app needs to use a new interface to get the functionality. Doable, but maybe not what we want.

How about returning java.sql.Timestamp (which extends java.util.Date, not Calendar) as usual, with a mapping of the XSD datatype to the string form that DateFormat accepts. This is OK for query - but for update, things might not round-trip in lexical form (the time is right but the way it is written may change).

	Andy |
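One possible shape of that mapping is sketched below, assuming pre-1.5 Java where SimpleDateFormat does not understand the xsd:dateTime "+02:00" / "Z" zone syntax, so the offset is normalized first. It assumes an explicit time zone is present and ignores fractional seconds; it is only an illustration, not driver code.

import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class XsdDateTime {
    // Converts an xsd:dateTime lexical form into a java.sql.Timestamp.
    public static Timestamp parse(String lexical) throws Exception {
        String s = lexical;
        if (s.endsWith("Z")) {
            s = s.substring(0, s.length() - 1) + "+0000";
        } else {
            int zone = Math.max(s.lastIndexOf('+'), s.lastIndexOf('-'));
            if (zone > 10 && s.charAt(zone + 3) == ':') {
                // strip the colon in "+hh:mm" so the RFC 822 'Z' pattern can parse it
                s = s.substring(0, zone + 3) + s.substring(zone + 4);
            }
        }
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
        return new Timestamp(f.parse(s).getTime());
    }
}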
From: Seaborne, A. <and...@hp...> - 2005-11-15 14:16:36
|
-------- Original Message --------
> From: Janne Saarela <mailto:jan...@pr...>
> Date: 11 November 2005 11:13
>
> > You're right - email would be best for discussion of the document.
>
> Let me switch to email from the word doc for a while.
>
> 1. I would now agree that let's not override the existing semantics of
> the execute() method, i.e. let's manage boolean return values via a
> ResultSet with 1 single row.

Aside: that seems to be the approach of MS SQLServer for XML results - a one-row, one-column result set with the XML in it.

> 2. In order to make work proceed step-by-step, should we define the
> development via milestones where M1 would include boolean results as
> well as result sets but not graphs.

Fine - a quite minimal M0 to get the whole project process sorted out may be a good idea.

> M2, once planned later, would then
> include graphs, too. We have some use cases for their eventual inclusion
> but in order to validate sparql protocol practical aspects we could
> target M1 first.
>
> Your remark on the timezone was a good one - what I don't know (yet) is
> whether the javax.xml.datatypes could be returned from getTimestamp()
> methods (i.e. do these types inherit any of the java basic data types)
> or if we need a method signature?

Next message ...

>
> Janne

	Andy |
From: Janne S. <jan...@pr...> - 2005-11-15 12:28:12
|
Hi all

Can I have some of your cpu time for thinking about how we should manage connection objects in the sparql4j project?

As sparql currently does not support state management (cursors or anything of the sort), there is no need to manage state in the Connection object internally.

How the connections are passed between the Driver, Connection and Statement (or PreparedStatement) classes is something we should design right so that eventual state management would be easier to add in the future.

I am thinking that the Driver.getConnection() method could return nothing but a Connection object which internally manages HTTP connections to SPARQL servers.

Once a (Prepared)Statement.execute() method (or variants of it) is executed, the actual protocol call is created within the Connection object. This would centralize HTTP code management in the Connection class, leaving room for internal connection pool management if HTTP connections were e.g. persisted and left open for future calls.

I don't see a reason to pool Connection objects at the Driver level at this time.

Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |
From: Janne S. <jan...@pr...> - 2005-11-11 11:19:45
|
> Your remark on the timezone was a good one - what I don't know (yet) is
> whether the javax.xml.datatypes could be returned from getTimestamp()
> methods (i.e. do these types inherit any of the java basic data types)
> or if we need a method signature?

The last question should have included the word 'new', i.e. 'do we need a new method signature?'

Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |