RE: [Sparql4j-devel] RE: XML parsing
From: Seaborne, A. <and...@hp...> - 2006-01-04 16:39:31
-------- Original Message --------
> From: Samppa Saarela <>
> Date: 4 January 2006 08:26
>
> > > If we have a factory that can handle RDF terms, adding support for
> > > triples is trivial.
> >
> > In the sense that creating a triple is just a 3-slot object, yes. But
> > the factory idea means that objects specific to the local RDF toolkit
> > are returned, and it will have its own idea of a triple (e.g. in
> > Jena, the application API object "Statement" is not a plain 3-slot
> > triple - it knows which model it comes from).
>
> There's no public API for constructing RDFNodes directly in Jena
> either, so that might be a problem too. Wouldn't it at least bypass
> all (per-model) node caches?

There is ResourceFactory: it creates them relative to a hidden model,
but all operations, if passed a resource from one model, automatically
convert it to a resource in the model where the operation is to be
performed. Resources need to know where they come from so that
resource.getProperty() works.

Otherwise, an application can quite easily work at the Jena Graph
level. It is stable and more pure (less convenience), and triples are,
well, triples (3-tuples).

Which per-model caches are you referring to? (All Jena's caches are
just that - caches - things still work if they get bypassed.) Caching
partial query results is very hard unless you can deduce that one query
is a sub-query of another.

> > The XML Results format does not return triples - only CONSTRUCT and
> > DESCRIBE return triples.
>
> Good point.

> > > I would find it more convenient to get the triples of the graph
> > > returned as triples (i.e. one triple per row) using the factory
> > > along with pseudo column accessors. This way we would (first of
> > > all) avoid special content-type handling.
> >
> > I don't follow this - the HTTP reply header has to be correctly
> > parsed. Such content handling is easy.
>
> But there's no standard way in JDBC for the user to access this
> information. If the user is given access to an InputStream of the
> result, he needs access to the content type as well.

The driver would access that information and so know how to parse the
incoming RDF graph. In fact, it needs a factory interface:

    interface GraphFactory
    {
        Object parse(InputStream in, String httpContentTypeAndCharset) ;
    }

A CONSTRUCT then returns a 1-row, 1-column result set: getObject() is
the graph. getCharacterStream() or getBinaryStream() would give more
direct access if needed, but (see below) I don't see these being
common.

> > Could you give the use case you have in mind here? (Why is it more
> > convenient to have a stream of triples?)
>
> I frequently use the Model.listStatements variants - and have used
> them in every RDF-based application I've ever made using Jena or SIR
> ;-) I wouldn't like the performance penalty, nor the increased memory
> requirements, of having to read the results into a model first just to
> iterate over them. One could also argue that every (reading) RDF
> operation ultimately involves a stream/iteration of triples. Sure,
> there are convenience accessors filtering the objects of statements,
> and select-type queries returning bindings, but these operations in
> turn rely on statement iteration. [When building a generic program
> that doesn't have full control of all input, the select-query access
> is, strictly speaking, not usable if "told bnodes" are not supported.]
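Before going further - to make the CONSTRUCT-as-result-set idea above
concrete, client code might look something like this. This is only a
sketch: the jdbc:sparql: URL scheme is a made-up name, not settled API,
and the class of the returned graph object depends on whatever
GraphFactory the driver is configured with.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConstructClient
    {
        public static void main(String[] args) throws Exception
        {
            // Made-up URL scheme; the real one is still to be decided.
            Connection conn = DriverManager.getConnection(
                "jdbc:sparql:http://example.org/sparql");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }");
            if (rs.next())
            {
                // One row, one column: the whole graph, as built by
                // the configured GraphFactory.
                Object graph = rs.getObject(1);
                // The application casts to its toolkit's graph type.
                System.out.println(graph);
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }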
We need to go back to use cases and the role of SPARQL4j. An important
value of SPARQL4j (for me) is that an application which is not very RDF
aware is able to get information out of a remote RDF repository by
SPARQL query. JDBC is not a good match for getting RDF graphs (as we
are finding!), and choosing one processing model (streams of something)
over another makes assumptions about the nature and approach of the
toolkit. If the application wants to do listStatements AND wants
triples in the local toolkit's format, then the value of a JDBC
interface (which is row-oriented) is pretty slim.

So I do not see high value in SPARQL4j as a general connection over the
SPARQL protocol in the initial releases. It seems likely that every RDF
toolkit will have a built-in SPARQL client, so if the application is
doing RDF processing, it is much better to use that than to try to fit
around the JDBC row-oriented paradigm. It's pretty easy to write (that
part of ARQ is quite small - some rather tedious HTTP connection
bashing).

Also, given that triples don't come back from a CONSTRUCT in any
particular order, I find it hard to see many processing models that can
achieve streamability. Maybe you could sketch your use case in a little
more detail? It's that bit I'm puzzled by.

[[And there aren't any told bnodes in general (though in ARQ you can
get them by setting the right config options :-) Not sure who will
support told bnodes. 3store maybe.]]

> One (internal to Jena/SIR) use case for this could be a (read-only)
> graph wrapping some SPARQL-enabled repository, addressing queries when
> necessary and possibly caching the results.
>
> One (possibly quite far-fetched) use case is using this driver in a
> generic SQL browser, allowing the user to address queries and showing
> the results in tabular form. Having a tabular form for RDF triples
> (exposed via ResultSetMetaData), instead of a clob/blob, might be more
> convenient.

> > Toolkits have their own APIs to their parsers for generating triples
> > - I guess most have a stream interface (ours do), but it would be
> > more normal to parse directly into the graph and return a graph (yes
> > - streaming is not possible).
> >
> > To get a stream of triples, do the query:
> >
> >     SELECT * { ?s ?p ?o }
> >
> > maybe with an ORDER BY.
>
> That's certainly one alternative, but it gets pretty difficult with
> just slightly more complex templates.

The key is that it minimises the requirements on the client. If we
assume there is a complete RDF system in the client, why force
ourselves through the JDBC paradigm when we could just as easily have a
SPARQL-protocol-specific API? The value of SPARQL4j to me is connecting
applications that don't want a full RDF processing system, but do want
to get some information out of an RDF repository. (There's a code
sketch of this SELECT-based access below.)

> > As a graph isn't usable until all the triples are known (a triple
> > can turn up at any point in the stream), an application would need
> > to do the SELECT query to process results before the last is seen.
>
> A graph may not be, but triples are usable as such.
>
> Also, I find stream-based access to the results quite usable
> regardless of the result form - at least if it's XML and not N3 (e.g.
> XSLT).

That is XSLT on the XML results? If you mean RDF/XML, the instability
of the encoding is why the DAWG had to define a fixed-schema XML
results format.
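Here is the sketch I mentioned of the SELECT-based access, as I'd
expect a non-RDF-aware client to use it: one triple per row, read off
the cursor. Again, the jdbc:sparql: URL scheme is made up, and I'm
assuming column labels come from the query's variable names, as they do
in the DAWG XML results format.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SelectTriples
    {
        public static void main(String[] args) throws Exception
        {
            // Made-up URL scheme, as before.
            Connection conn = DriverManager.getConnection(
                "jdbc:sparql:http://example.org/sparql");
            Statement stmt = conn.createStatement();
            // One row per matching triple; the driver can stream rows
            // out of the XML results as they arrive.
            ResultSet rs = stmt.executeQuery(
                "SELECT * WHERE { ?s ?p ?o } ORDER BY ?s");
            while (rs.next())
            {
                // Columns named after the query variables.
                System.out.println(rs.getString("s") + " "
                    + rs.getString("p") + " "
                    + rs.getString("o"));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }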
> Perhaps we should discuss and document what kind of use cases we want
> to support with sparql4j?

Cool - good idea.

	Andy

> Br,
> Samppa
>
> --
> Samppa Saarela <samppa.saarela at profium.com>
> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
> Tel. +358 (0)9 855 98 000  Fax. +358 (0)9 855 98 002
> Mob. +358 (0)41 515 1412  Internet: http://www.profium.com