RE: [Sparql4j-devel] RE: XML parsing
From: Seaborne, A. <and...@hp...> - 2006-01-04 16:39:31
-------- Original Message --------
> From: Samppa Saarela <>
> Date: 4 January 2006 08:26
>
> > > If we have a factory that can handle RDF terms, adding support for
> > > triples is trivial.
> >
> > In the sense that creating a triple is just a 3-slot object, yes. But
> > the factory idea means that objects specific to the local RDF toolkit
> > are returned, and it will have its own idea of a triple (e.g. in
> > Jena, the application API object "Statement" is not a plain 3-slot
> > triple - it knows which model it comes from).
>
> There's no public API for constructing RDFNodes directly in Jena
> either, so that might be a problem too. Wouldn't it at least bypass
> all (per-model) node caches?

There is ResourceFactory: it creates them relative to a hidden model,
but all operations, if passed a resource from one model, automatically
convert it to a resource in the model where the operation is to be
performed. Resources need to know where they come from so that
resource.getProperty() works.

Otherwise, an application can quite easily work at the Jena Graph
level. It is stable and more pure (less convenience), and triples are,
well, triples (3-tuples).

Which per-model caches are you referring to? (All Jena's caches are
just that - caches - things still work if they get bypassed.) Caching
partial query results is very hard unless you can deduce that one query
is a sub-query of another.

> > The XML Results format does not return triples - only CONSTRUCT and
> > DESCRIBE return triples.
>
> Good point.

> > > I would find it more convenient to get the triples of the graph
> > > returned as triples (i.e. one triple per row) using the factory
> > > along with pseudo column accessors. This way we would (first of
> > > all) avoid special content-type handling.
> >
> > I don't follow this - the HTTP reply header has to be correctly
> > parsed. Such content handling is easy.
>
> But there's no standard way in JDBC for the user to access this
> information. If the user is given access to an InputStream of the
> result, he needs access to the content type as well.

The driver would access that information and so know how to parse the
incoming RDF graph. In fact, it needs a factory interface:

    interface GraphFactory
    {
        Object parse(InputStream in, String httpContentTypeAndCharset) ;
    }

A CONSTRUCT then returns a 1-row, 1-column result set: getObject() is
the graph. getCharacterStream() or getBinaryStream() would give more
direct access if needed, but (see below) I don't see these being
common.

> > Could you give the use case you have in mind here? (Why is it more
> > convenient to have a stream of triples?)
>
> I frequently use the Model.listStatements variants - and have used
> them in every RDF-based application I've ever made using Jena or SIR
> ;-) I wouldn't like the performance penalty, nor the increased memory
> requirements, of having to read the results into a model first just to
> iterate over them. One could also argue that every (reading) RDF
> operation ultimately involves a stream/iteration of triples. Sure,
> there are convenience accessors filtering the objects of statements,
> and select-type queries returning bindings, but these operations in
> turn rely on statement iteration. [When building a generic program
> that doesn't have full control of all input, the select-query access
> is, strictly speaking, not usable if "told bnodes" are not supported.]
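Before going further - to make the CONSTRUCT-as-result-set idea above
concrete, client code might look something like this. This is only a
sketch: the jdbc:sparql: URL scheme is a made-up name, not settled API,
and the class of the returned graph object depends on whatever
GraphFactory the driver is configured with.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConstructClient
    {
        public static void main(String[] args) throws Exception
        {
            // Made-up URL scheme; the real one is still to be decided.
            Connection conn = DriverManager.getConnection(
                "jdbc:sparql:http://example.org/sparql");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }");
            if (rs.next())
            {
                // One row, one column: the whole graph, as built by
                // the configured GraphFactory.
                Object graph = rs.getObject(1);
                // The application casts to its toolkit's graph type.
                System.out.println(graph);
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }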
We need to go back to use cases and the role of SPARQL4j. An important
value of SPARQL4j (for me) is that an application which is not very RDF
aware is able to get information out of a remote RDF repository by
SPARQL query. JDBC is not a good match for getting RDF graphs (as we
are finding!), and choosing one processing model (streams of something)
over another makes assumptions about the nature and approach of the
toolkit. If the application wants to do listStatements AND wants
triples in the local toolkit's format, then the value of a JDBC
interface (which is row-oriented) is pretty slim.

So I do not see high value in SPARQL4j as a general connection over the
SPARQL protocol in the initial releases. It seems likely that every RDF
toolkit will have a built-in SPARQL client, so if the application is
doing RDF processing, it is much better to use that than to try to fit
around the JDBC row-oriented paradigm. It's pretty easy to write (that
part of ARQ is quite small - some rather tedious HTTP connection
bashing).

Also, given that triples don't come back from a CONSTRUCT in any
particular order, I find it hard to see many processing models that can
achieve streamability. Maybe you could sketch your use case in a little
more detail? It's that bit I'm puzzled by.

[[And there aren't any told bnodes in general (though in ARQ you can
get them by setting the right config options :-) Not sure who will
support told bnodes. 3store maybe.]]

> One (internal to Jena/SIR) use case for this could be a (read-only)
> graph wrapping some SPARQL-enabled repository, addressing queries when
> necessary and possibly caching the results.
>
> One (possibly quite far-fetched) use case is using this driver in a
> generic SQL browser, allowing the user to address queries and showing
> the results in tabular form. Having a tabular form for RDF triples
> (exposed via ResultSetMetaData), instead of a clob/blob, might be more
> convenient.

> > Toolkits have their own APIs to their parsers for generating triples
> > - I guess most have a stream interface (ours do), but it would be
> > more normal to parse directly into the graph and return a graph (yes
> > - streaming is not possible).
> >
> > To get a stream of triples, do the query:
> >
> >     SELECT * { ?s ?p ?o }
> >
> > maybe with an ORDER BY.
>
> That's certainly one alternative, but it gets pretty difficult with
> just slightly more complex templates.

The key is that it minimises the requirements on the client. If we
assume there is a complete RDF system in the client, why force
ourselves through the JDBC paradigm when we could just as easily have a
SPARQL-protocol-specific API? The value of SPARQL4j to me is connecting
applications that don't want a full RDF processing system, but do want
to get some information out of an RDF repository. (There's a code
sketch of this SELECT-based access below.)

> > As a graph isn't usable until all the triples are known (a triple
> > can turn up at any point in the stream), an application would need
> > to do the SELECT query to process results before the last is seen.
>
> A graph may not be, but triples are usable as such.
>
> Also, I find stream-based access to the results quite usable
> regardless of the result form - at least if it's XML and not N3 (e.g.
> XSLT).

That is XSLT on the XML results? If you mean RDF/XML, the instability
of the encoding is why the DAWG had to define a fixed-schema XML
results format.
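Here is the sketch I mentioned of the SELECT-based access, as I'd
expect a non-RDF-aware client to use it: one triple per row, read off
the cursor. Again, the jdbc:sparql: URL scheme is made up, and I'm
assuming column labels come from the query's variable names, as they do
in the DAWG XML results format.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SelectTriples
    {
        public static void main(String[] args) throws Exception
        {
            // Made-up URL scheme, as before.
            Connection conn = DriverManager.getConnection(
                "jdbc:sparql:http://example.org/sparql");
            Statement stmt = conn.createStatement();
            // One row per matching triple; the driver can stream rows
            // out of the XML results as they arrive.
            ResultSet rs = stmt.executeQuery(
                "SELECT * WHERE { ?s ?p ?o } ORDER BY ?s");
            while (rs.next())
            {
                // Columns named after the query variables.
                System.out.println(rs.getString("s") + " "
                    + rs.getString("p") + " "
                    + rs.getString("o"));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }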
> Perhaps we should discuss and document what kind of use cases we want
> to support with sparql4j?

Cool - good idea.

	Andy

> Br,
> Samppa
>
> --
> Samppa Saarela <samppa.saarela at profium.com>
> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
> Tel. +358 (0)9 855 98 000  Fax. +358 (0)9 855 98 002
> Mob. +358 (0)41 515 1412  Internet: http://www.profium.com