sparql4j-devel Mailing List for sparql4j
Status: Pre-Alpha
Brought to you by: jsaarela
From: Seaborne, A. <and...@hp...> - 2006-01-16 11:53:25
|
-------- Original Message --------
> From: Samppa Saarela <mailto:sam...@pr...>
> Date: 13 January 2006 15:18
>
> > > So you suggest allowing $'s also as parameter slots? I guess/hope
> > > that adding support for this is trivial...
> >
> > The reverse is what I had in mind. Writing the query with $subject.
>
> It might cause confusion about what is a parameter slot and what is a
> variable - even I'm not quite sure whether you mean a parameter slot or a
> variable by that $subject?
>
>   select * where {$subject rdfs:label ?;
>                   rdfs:comment $comment .
>                   filter ($comment < ?) .}
>
> The format for parameter slots should be distinct from variables,
> otherwise e.g. variables used in a select clause might get confused with
> parameter slots and become replaced...
>
> > Named parameters are much better than purely positional ones!
>
> Absolutely! ...but there's no support for named parameters in JDBC, which
> is our primary objective. If named parameters are used, the format used
> for them *should* be distinct from variables (got better ideas than the
> ?{paramName} syntax?) because JDBC offers primarily (i.e. only)
> positional parameters. However, we have been discussing (with Timo) a
> JDBC extension allowing one to bind parameter slots to variables, e.g.
>
>   select * where {? foo:predicate ?object}
>
>   stmt.setVariable(1, "subject");
>
>   --> select * where {?subject foo:predicate ?object}
>
> Have a nice weekend!
>
> Br,
> Samppa |
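A minimal sketch, assuming the positional-slot behaviour and the setVariable extension discussed above; the JDBC URL scheme, the SparqlPreparedStatement type and setVariable are proposals from this thread (not settled API), and the endpoint and query are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ParameterSlotExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint; the JDBC URL scheme is not settled in this thread.
            Connection c = DriverManager.getConnection("jdbc:sparql:http://example.org/sparql");

            // A lone '?' is a positional parameter slot, exactly as in plain JDBC.
            PreparedStatement ps = c.prepareStatement(
                    "select * where {?doc rdfs:label ? ; rdfs:comment ?comment}");
            ps.setString(1, "SPARQL for Java");   // bind slot 1 to a literal value
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("comment"));
            }

            // The proposed (hypothetical) extension would instead bind a slot to a
            // *variable name*, rewriting "?" to "?subject" before execution:
            //   ((SparqlPreparedStatement) ps).setVariable(1, "subject");
        }
    }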
From: Samppa S. <sam...@pr...> - 2006-01-12 15:13:33
|
Hi all,

I just committed the SparqlPreparedStatement implementation. It uses lone question marks as parameter placeholders, e.g.

  select * where {?subject rdfs:label ?; rdfs:comment ?comment . filter ?comment < ? .}

has 2 parameter slots. It uses quite naive parsing that differs from the spec at least in IRI handling, by disallowing IRIs starting with a digit, ? or " to avoid mixing operators with IRIs. I don't see this as a grave limitation. It implements most of the PreparedStatement.set*(int, *) methods.

Date/time handling is one thing to be aware of: for simplicity I process/normalize all time-related values using the UTC time zone. It would probably be better to use the provided Calendar or the JVM default to process the time zones, but this is a bit more complex to implement since the XSD time (zone) format is not directly supported by SimpleDateFormat. Any opinions on this: should setDate/Time/Timestamp retain the given or implicit time zone?

What more is needed for M1? System/integration tests? A ResultSet handler for RDF/XML?

The tests include one system/integration test that uses the server URL defined in test/conf/connection.properties (note: test/conf needs to be on the classpath). It just issues a (?s ?p ?o) query, checks the headers and scrolls through the results. We should discuss how to actually implement the system/integration tests. How do we ensure that the server contains some particular data set to query over and validate the results?

Br Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
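A rough sketch of the UTC normalization described above - formatting a java.sql.Timestamp as an xsd:dateTime literal in the UTC zone. The class and method names are illustrative, not the committed implementation.

    import java.sql.Timestamp;
    import java.text.SimpleDateFormat;
    import java.util.TimeZone;

    final class XsdDateTimeFormatter {
        // SimpleDateFormat has no pattern for the xsd:dateTime "Z"/"+hh:mm" zone
        // suffix, so normalize to UTC and append "Z" by hand.
        static String toXsdDateTime(Timestamp ts) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            return fmt.format(ts) + "Z";
        }
    }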
From: Samppa S. <sam...@pr...> - 2006-01-10 13:01:10
|
Hi all,

Timo and I did some major refactoring of the project for clearer CVS usage and naming: generated files like logs and reports now go into test_build instead of the same directory as the source files (i.e. test), AND the JDBC-specific implementation classes are now prefixed with Sparql (e.g. SparqlStatement).

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Samppa S. <sam...@pr...> - 2006-01-10 07:53:44
|
Hi

As Andy pointed out, using a logging facade is a better alternative than log4j (or any other concrete implementation), since that way the logging system adapts to the client's environment. Due to known problems with commons-logging, I'd like to try SLF4J instead. See

http://www.qos.ch/logging/classloader.jsp
http://www.slf4j.org/

Opinions?

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
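For reference, a minimal sketch of what SLF4J usage inside the driver could look like; the class and method shown are illustrative, not committed code.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SparqlStatement /* implements java.sql.Statement */ {
        // The facade binds to whatever logging backend the client application ships.
        private static final Logger log = LoggerFactory.getLogger(SparqlStatement.class);

        public void executeQueryLogged(String query) {
            log.debug("Executing SPARQL query: {}", query);
            // ... send the query over the SPARQL protocol ...
        }
    }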
From: Samppa S. <sam...@pr...> - 2006-01-09 09:57:04
|
>>*) triple-per-row: >> >>// JDBC-specific access exposed via ResultSetMetaData >>String subject = rs.getString("subject"); >>int subjectType = rs.getInt("subject$type"); // URI | BLANK_NODE >>String predicate = ns.getString("predicate"); >>String object = rs.getString("object"); >>int objectType = rs.getInt("object$type"); // URI | BLANK_NODE | >>PLAIN_LITERAL | TYPED_LITERAL >>String lang = rs.getString("object$lang"); >>String datatype = rs.getString("object$datatype"); >> >> > >That looks just like like > >SELECT ?subject ?predicate ?object { ... pattern ... } > > >+ the accessors for type and lang > > That's the point: that it *looks and behaves like* all other Sparql queries :-) However, it's strictly not the same (not same columns) and with a pattern containing more than one triple matcher, the result is not even close. The type, lang and datatype accessors are needed also in SELECT queries. >No, not at all. Entailment is defined on models. > >Open world says there may be other statements out there in other models. > >An application displaying the results of a query wants to know when it >has seen all the results for its query on its choosen dada source. e.g. >listing people by name makes an ordering assumption. When you see the >"K"'s, there are no more "J"'s in this result. > > You couldn't use CONSTRUCT for that anyway. >>So what's the point of sparql4j then? >> >> > >SELECT queries for application that wish to access RDF information. >For example, a RDF repository as the core of a 3-tier web site. No need >for the business logic to have an RDF toolkit if it just wants to get >people's name. > >If an application is going to do RDF processing on graphs, it would not >want to use JDBC's view of row-by-row. Why not just get the constructed >graph in whatever way its toolkit wants? Because there is a paradigm >shift, the value of JDBC to moving graphs around seems limited to me. > >But SELECT queries, to get information out of RDF graphs, are the most >important kind of query. Feed this into JDBC environments in the JDBC >pardigm and we may even be able to reuse JDBC client tools. > > So in essence sparql4j should support only SELECT (and ASK?) queries? >>Firstly there's a difference in building application specific (domain) >>model from a stream (of triples) or building first a generic RDF model >>and only then building the actual target model. >> >>Secondly, even though in general case the order of triples isn't >>guaranteed, it's quite common to group the statements by the subject. >> >> > >I disagree - you're relying on the server having processed the entire >model to make the statements come out nicely. Jena uses hashes all over >the place - things do not come out in any sort of order, nor is it >consistent. > > I know how Jena works and I make no assumptions about the nature of given toolkit or sparql engine. Triples may come out in random order and this is something the user addressing CONSTRUCT queries has to take into account. The aspect (that triples don't come out in any particular order) is build into RDF and has to be taken into account in practically every RDF-based program. I dont see this as a problem. It's just something to be aware of. >>There also isn't a way of requiring that a certain type of resources >>should be URI's... 
>> >>I used to build (when using Jena) RDQL queries prorammatically using >> >> >the > > >>query API directly since that way one can (or at least could) also use >>the bnodes in queries, thus avoiding failing queries in case bnodes >> >> >were > > >>used. Of course I was told that I shouldn't do this and to use >>models/resources accessors instead... however when using RDB model >> >> >with > > >>fast path, I was able to achieve magnitudes of better performance this >>way. If I recall right I even used (at least at some point) the >>ResultBinding#getTriples() to process the results. >> >> > >But it would not have worked with Joseki :-) bNodes not preserved across >the network. > > So, I coudn't have used Joseki then ;-) bNodes are used a lot in e.g. OWL ontologies and for an ontology browser addressing {this rdfs:subClassOf ?superclass} running into a bNode (e.g. Restriction) needs to be able to get more information about it. >You might like to see the ARQ configuration options I just put in so >bnode ids are passed transparently across the network. It's not the >default mode (makes the XML results look ugly). > > RDF/XML itself looks ugly and is not ment for human consumption anyway ;-) >>>That is XSLT on the XML results? >>> >>> >>> >>> >>Yes. >> >> > >So it's a SELECT query. > > No. One may use XSLT also with RDF/XML. It might not be very convenient, but hardly impossible and most importantly a fully valid use case. I actually would have preferred RDF/XML -based format for SELECT queries too - preferably a canonical RDF/XML format - to be able to process the results reliably with an RDF toolkit. Having canonical RDF/XM format for CONSTRUCT queries would certainly help XSLT use cases also... >>All the more reason why we should provide also RDF-parsing :-) >>Succeeding to do this in a toolkit independent way might also be >> >> >usable > > >>to anyone building toolkits... >> >> > >Then the objective of the project is now providing another API to RDF >(wrapping all toolkits is just like a new toolkit that uses others as >its implementation. I have a toolkit, and a SPARQL-protocol interface >that (I think) is easiler to work with than warping to the JDBC paradign >and and waroping back again - it's cognitive bruden on the app writer. >I wanted an approach of doing something clearcut, minimal and distinct >with a clear value. But this now seems to be growing into a general >purpose RDF framework. > No. Just a meaningfull JDBC binding for all types of valid Sparql queries. Coding reusable components is a good practice as such. >Can we find some limits please? > > Should this (triple-per-row) be desired approach, it can be scheduled for later and throw "not implemented yet, use Statement.executeSparql() instead" -exception for now. We both seem to have pretty firm opininon about how to handle CONSTRUCT queries. Since these to approaches are complementary I'd suggest voting on it. Use cases might also help in making the decision, but I guess we'd just end up arguing about the relevance of them ;-) Br Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
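A sketch of how the triple-per-row handling of CONSTRUCT results debated in this thread might look from the application side. The column names and the $type/$lang/$datatype pseudo-columns follow the proposal above; the type constants and the overall API are still under discussion, so this is illustrative only.

    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TriplePerRowExample {
        // Hypothetical node-type constants from the proposal in this thread.
        static final int URI = 1, BLANK_NODE = 2, PLAIN_LITERAL = 3, TYPED_LITERAL = 4;

        static void dump(Statement stmt) throws Exception {
            ResultSet rs = stmt.executeQuery("CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}");
            while (rs.next()) {
                String subject   = rs.getString("subject");
                String predicate = rs.getString("predicate");
                String object    = rs.getString("object");
                int objectType   = rs.getInt("object$type");        // URI | BLANK_NODE | ...
                String lang      = rs.getString("object$lang");     // only for plain literals
                String datatype  = rs.getString("object$datatype"); // only for typed literals
                System.out.println(subject + " " + predicate + " " + object
                        + " (type " + objectType + ", lang " + lang + ", datatype " + datatype + ")");
            }
        }
    }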
From: Seaborne, A. <and...@hp...> - 2006-01-06 16:34:05
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 5 January 2006 09:50 >=20 . . . . > > An important value of SPARQl4j (for me) is that an application, which > > is=20 > > not very RDF aware, to be able to get information out of remote RDF > > repository by SPARQL query. > >=20 > >=20 > However, relying on external RDF API (or InputStream) to handle certain > kind of queries makes it highly RDF dependent and thus requires > ultimatly that the user is not only aware of RDF but also aware of the > (configuration dependent) RDF toolkit. >=20 > The design I'm proposing aimes >=20 > 1) at providing as-natural-as-it-gets jdbc (row/column -based) behaviour > in all cases >=20 > 2) not blocking more RDF/Sparql aware use cases / applications and >=20 > 3) making the decision between the desired approach explicit (i.e. an > application that is aware of the result forms may access the results > directly by being aware of the Sparql-specific jdbc api extensions), >=20 > 4) providing the user all the necessary information and control (e.g. > ability to define accept preferences when executing a query and access > to the actual content type of the result) needed to process the results > directly, >=20 > 5) providing factory-based / RDF toolkit dependent getObject() accessors > (only) as convenience accessors for RDF aware application >=20 > > JDBC is not a good match to getting RDF graphs (as we are finding!) > > and=20 > > choosing one processing model (streams of something) over another > > makes=20 > > assumptions about the nature and approach of the toolkit. > >=20 > >=20 > That's why I'd try to avoid any direct/required dependency to the > factory, providing it only (and possibly optionally) for convenience to > more RDF aware users. In my opinion the triple-per-row* is > as-good-as-it-gets alternative for row based (i.e. jdbc style) handling > of RDF. It actually resembles one of RDF's serialization forms, namely > NTRIPLES. Also the W3C's RDF Validator provides tabular form of the > parsed graph, which I have found quite usefull. >=20 > *) triple-per-row: >=20 > // JDBC-specific access exposed via ResultSetMetaData > String subject =3D rs.getString("subject"); > int subjectType =3D rs.getInt("subject$type"); // URI | BLANK_NODE > String predicate =3D ns.getString("predicate"); > String object =3D rs.getString("object"); > int objectType =3D rs.getInt("object$type"); // URI | BLANK_NODE | > PLAIN_LITERAL | TYPED_LITERAL > String lang =3D rs.getString("object$lang"); > String datatype =3D rs.getString("object$datatype"); That looks just like like SELECT ?subject ?predicate ?object { ... pattern ... } + the accessors for type and lang >=20 > // RDF toolkit specific convenience/hidden access > Resource subject =3D (Resource)rs.getObject("subject"); > Predicate predicate =3D (Predicate)rs.getObject("predicate"); > RDFNode object =3D (RDFNode)rs.getObject("object"); >=20 > Note that the JDBC-specific access can easily be used to provide a > configuration (i.e. factory) independent access to any RDF toolkit > specific objects. It's also by far more robust way of accessing the > toolkit specific resources than the factory approach which would end up > in ClassCastExceptions if configuration changed. Actually the factory > approach could aesily be replaced with simple (and robust) ResultSet > wrappers / handlers. 
>=20 > > If the application wants to do listStatements AND wants triples in the > > local toolkit format, then the value of a JDBC interface (which is > > row-oriented) is pretty slim. So I do not see a high value for > > SPARQl4j=20 > > as a general connection over the SPARQL protocol in the initial > > releases. > >=20 > >=20 > Triple-per-row on the other hand offers even something usefull for a not > so RDF/Sparql/toolkit aware application. >=20 > The value of providing only sparql result form parsing is also pretty > slim - I have actually already implemented it (just not yet committed it > into CVS). As parsing RDF/XML (and/or N3) is much more difficult, > providing just that in a toolkit independent way would actually provide > extra value. >=20 > BTW Isn't it a bit contrary to the open-world view of Semantic Web, when > you argued in your previous email that a model isn't usable unless all > of it's statements are known? No, not at all. Entailment is defined on models. Open world says there may be other statements out there in other models. An application displaying the results of a query wants to know when it has seen all the results for its query on its choosen dada source. e.g. listing people by name makes an ordering assumption. When you see the "K"'s, there are no more "J"'s in this result. >=20 > > It would seem likely that every RDF toolkit will have a built-in > > SPARQL=20 > > client so if the application is doing RDF processing, it is much > > better=20 > > to use that that trying to fit around the JDBC row-oritneted paradigm. > > It's pretty easy to write (that part ARQ is quite small - some rather > > tedious HTTP connection bashing). > >=20 > >=20 > So what's the point of sparql4j then? SELECT queries for application that wish to access RDF information. For example, a RDF repository as the core of a 3-tier web site. No need for the business logic to have an RDF toolkit if it just wants to get people's name. If an application is going to do RDF processing on graphs, it would not want to use JDBC's view of row-by-row. Why not just get the constructed graph in whatever way its toolkit wants? Because there is a paradigm shift, the value of JDBC to moving graphs around seems limited to me. But SELECT queries, to get information out of RDF graphs, are the most important kind of query. Feed this into JDBC environments in the JDBC pardigm and we may even be able to reuse JDBC client tools. >=20 > > Also, given triples doesn't come back in any particular order from > > CONSTRUCT then I find it hard to see many processing models that can > > achieve streamability. Maybe you could sketch your use case in a > > little more detail? It's that bit I'm puzzled by.=20 > >=20 > >=20 > Firstly there's a difference in building application specific (domain) > model from a stream (of triples) or building first a generic RDF model > and only then building the actual target model. >=20 > Secondly, even though in general case the order of triples isn't > guaranteed, it's quite common to group the statements by the subject. I disagree - you're relying on the server having processed the entire model to make the statements come out nicely. Jena uses hashes all over the place - things do not come out in any sort of order, nor is it consistent. > In > case the contstruct matches are streamed directly, one could assume that > triples of a single template match would be some how grouped. Hardly in > any case the order of returned triples is fully random. 
>=20 > The simplest and most obvious use case is to visualize the triples > returned in a tabular form directly. Many GUI/WUI table widgets provide > sorting of the rows by columns. >=20 > > [[And there aren't any told bnodes in general (but ARQ you can get > > them=20 > > by setting the right config options :-) Not sure who will support > > told=20 > > bNodes. 3Store maybe.]] > >=20 > >=20 > There also isn't a way of requiring that a certain type of resources > should be URI's... >=20 > I used to build (when using Jena) RDQL queries prorammatically using the > query API directly since that way one can (or at least could) also use > the bnodes in queries, thus avoiding failing queries in case bnodes were > used. Of course I was told that I shouldn't do this and to use > models/resources accessors instead... however when using RDB model with > fast path, I was able to achieve magnitudes of better performance this > way. If I recall right I even used (at least at some point) the > ResultBinding#getTriples() to process the results. But it would not have worked with Joseki :-) bNodes not preserved across the network. You might like to see the ARQ configuration options I just put in so bnode ids are passed transparently across the network. It's not the default mode (makes the XML results look ugly). >=20 > > The key is that it minimises the requirements on the client. If we > > assume there is a complete RDF system in the client, why force > > ourselves=20 > > through the JDBC paradigm when we could just as easily have a > > SPARQL-protocol specific API? The value of SPARQL4j to me is to > > connect=20 > > to applications that don't want a full RDF processing system but do > > want=20 > > to get some information out of an RDF repository. > >=20 > >=20 > Exactly(!) and in my opinion this should apply also to the > CONSTRUCT/DESCRIBE queries. Such an application could do hardly anything > with a byte stream of RDF/XML not to mention N3. >=20 > > > A graph may not be, but triples are also usable as such. > > >=20 > > > Also I find the stream based access to the results quite usable > > > regardles of the result form - at least if it's XML and not N3 > > > (e.g. XSLT).=20 > > >=20 > > >=20 > >=20 > > That is XSLT on the XML results? > >=20 > >=20 > Yes. So it's a SELECT query. >=20 > > If you mean RDF/XML, the instability of the encoding is why DAWG had > > to=20 > > do a fixed schema XML results format. > >=20 > >=20 > All the more reason why we should provide also RDF-parsing :-) > Succeeding to do this in a toolkit independent way might also be usable > to anyone building toolkits... Then the objective of the project is now providing another API to RDF (wrapping all toolkits is just like a new toolkit that uses others as its implementation. I have a toolkit, and a SPARQL-protocol interface that (I think) is easiler to work with than warping to the JDBC paradign and and waroping back again - it's cognitive bruden on the app writer. I wanted an approach of doing something clearcut, minimal and distinct with a clear value. But this now seems to be growing into a general purpose RDF framework. Can we find some limits please? Andy >=20 > > > Perhaps we should discuss and document what kind of use cases we wan > > >=20 > > >=20 > > to > >=20 > >=20 > > > support with sparql4j? > > >=20 > > >=20 > >=20 > > Cool - good idea. 
> >=20 > >=20 > Let's start a separate thread for this and copy-paste results into the > document :-) >=20 > -Samppa >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Samppa S. <sam...@pr...> - 2006-01-05 09:52:03
|
>ResourceFactory (it creates them relative to a hidden model but all >operations, if passed a resource from one model convert to a resource in >the model where the operation is to be performed automatically. >Resources need to know where they come from so > resource.getProperty() >works. > >Otherwsie, an application can quite easily work at the Jena Graph level. >It is stable and more pure (less convenience) and triples are, well, >triples (3-tuples). > >Which per-model caches are you referring to? (and all Jena's caches are >just that - caches - things work if they get bypassed). > > EnhGraph.enhNodes... Things work, but using cache produces less gargage and creating resources directly to the target model is better than having multiple caches (one internal to the factory and one to the target model). Anyway these are very subtle differences and the important thing is that things work. >Caching partial query results is v hard unless you can deduce one query >is a sub-query os another. > > True - and that's not what I ment. What I was really questioning that is it really enough to have the factory defined at driver -level, or do we need the ability to set/override the factory at statement level AND that what's the best alternative for this depends on the toolkit used. >>But there's no standard way in jdbc for user to access this >> >> >information. > > >>If the user is provided with an access to InputStream of the result, >> >> >he > > >>needs to get access to the content type also. >> >> > >The driver would access the information and so know how to parse the >incoming RDF graph. In fact, it needs a factory interface > >interface GraphFactory >{ > Object parse(InputStream, String httpContentTypeAndCharset) ; >} > >then a CONSTRUCT returns a 1-row,1-col result set : getObject() is the >graph. getCharacterStream() or getBinaryStream() would give a more >direct access if needed but (see below) I don't see these are common. > > I see. getCharacterStream and getBinaryStream are definitely better alternatives than clob/blob. This approach may have the drawback that the sequence at which column-accessor-methods are called makes the result set behave differently, e.g. if getBinaryStream(1) is called first, getObject(1) is no longer available and vice versa (i.e. unless the binary stream is cached). Also this kind of dual type -column cannot be defined via ResultSetMetaData, can it? If this same approach would be applied select/ask results (i.e. accessing getBinaryStream(1) of the first row would return whole result as a stream, instead of rows) the result would be even more confusing. >>>Could you give the use case you have in mind here? (why is it more >>>convenient to have a stream of triples?) >>> >>> >>> >>> >>I use frequently Model.listStatements variants - and have used in >> >> >every > > >>RDF based applications I've ever made using Jena or SIR ;-) I wouldn't >>like the performance penalty nor increased memory requirements of >>having to read the results first into a model just for iterating over >>them. One could also argue that every (reading) RDF operation involves >>ultimately a stream/iteration of triples. Sure there's convenience >>accesses filtering objects of the statements or select-type query >>returning bindings, but these operations in turn rely on statement >>iterations. [When building a generic program that doesn't have full >>control of all input, the select-query- access is strictly speaking >> >> >not > > >>usable if "told bnodes" are not supported.] 
>> >> > >We need to go back to use cases and the role of SPARQL4j. > > > Yes :-) >An important value of SPARQl4j (for me) is that an application, which is >not very RDF aware, to be able to get information out of remote RDF >repository by SPARQL query. > > However, relying on external RDF API (or InputStream) to handle certain kind of queries makes it highly RDF dependent and thus requires ultimatly that the user is not only aware of RDF but also aware of the (configuration dependent) RDF toolkit. The design I'm proposing aimes 1) at providing as-natural-as-it-gets jdbc (row/column -based) behaviour in all cases 2) not blocking more RDF/Sparql aware use cases / applications and 3) making the decision between the desired approach explicit (i.e. an application that is aware of the result forms may access the results directly by being aware of the Sparql-specific jdbc api extensions), 4) providing the user all the necessary information and control (e.g. ability to define accept preferences when executing a query and access to the actual content type of the result) needed to process the results directly, 5) providing factory-based / RDF toolkit dependent getObject() accessors (only) as convenience accessors for RDF aware application >JDBC is not a good match to getting RDF graphs (as we are finding!) and >choosing one processing model (streams of something) over another makes >assumptions about the nature and approach of the toolkit. > > That's why I'd try to avoid any direct/required dependency to the factory, providing it only (and possibly optionally) for convenience to more RDF aware users. In my opinion the triple-per-row* is as-good-as-it-gets alternative for row based (i.e. jdbc style) handling of RDF. It actually resembles one of RDF's serialization forms, namely NTRIPLES. Also the W3C's RDF Validator provides tabular form of the parsed graph, which I have found quite usefull. *) triple-per-row: // JDBC-specific access exposed via ResultSetMetaData String subject = rs.getString("subject"); int subjectType = rs.getInt("subject$type"); // URI | BLANK_NODE String predicate = ns.getString("predicate"); String object = rs.getString("object"); int objectType = rs.getInt("object$type"); // URI | BLANK_NODE | PLAIN_LITERAL | TYPED_LITERAL String lang = rs.getString("object$lang"); String datatype = rs.getString("object$datatype"); // RDF toolkit specific convenience/hidden access Resource subject = (Resource)rs.getObject("subject"); Predicate predicate = (Predicate)rs.getObject("predicate"); RDFNode object = (RDFNode)rs.getObject("object"); Note that the JDBC-specific access can easily be used to provide a configuration (i.e. factory) independent access to any RDF toolkit specific objects. It's also by far more robust way of accessing the toolkit specific resources than the factory approach which would end up in ClassCastExceptions if configuration changed. Actually the factory approach could aesily be replaced with simple (and robust) ResultSet wrappers / handlers. >If the application wants to do listStatements AND wants triples in the >local toolkit format, then the value of a JDBC interface (which is >row-oriented) is pretty slim. So I do not see a high value for SPARQl4j >as a general connection over the SPARQL protocol in the initial >releases. > > Triple-per-row on the other hand offers even something usefull for a not so RDF/Sparql/toolkit aware application. 
The value of providing only sparql result form parsing is also pretty slim - I have actually already implemented it (just not yet committed it into CVS). As parsing RDF/XML (and/or N3) is much more difficult, providing just that in a toolkit independent way would actually provide extra value. BTW Isn't it a bit contrary to the open-world view of Semantic Web, when you argued in your previous email that a model isn't usable unless all of it's statements are known? >It would seem likely that every RDF toolkit will have a built-in SPARQL >client so if the application is doing RDF processing, it is much better >to use that that trying to fit around the JDBC row-oritneted paradigm. >It's pretty easy to write (that part ARQ is quite small - some rather >tedious HTTP connection bashing). > > So what's the point of sparql4j then? >Also, given triples doesn't come back in any particular order from >CONSTRUCT then I find it hard to see many processing models that can >achieve streamability. Maybe you could sketch your use case in a little more detail? It's that bit I'm puzzled by. > > Firstly there's a difference in building application specific (domain) model from a stream (of triples) or building first a generic RDF model and only then building the actual target model. Secondly, even though in general case the order of triples isn't guaranteed, it's quite common to group the statements by the subject. In case the contstruct matches are streamed directly, one could assume that triples of a single template match would be some how grouped. Hardly in any case the order of returned triples is fully random. The simplest and most obvious use case is to visualize the triples returned in a tabular form directly. Many GUI/WUI table widgets provide sorting of the rows by columns. >[[And there aren't any told bnodes in general (but ARQ you can get them >by setting the right config options :-) Not sure who will support told >bNodes. 3Store maybe.]] > > There also isn't a way of requiring that a certain type of resources should be URI's... I used to build (when using Jena) RDQL queries prorammatically using the query API directly since that way one can (or at least could) also use the bnodes in queries, thus avoiding failing queries in case bnodes were used. Of course I was told that I shouldn't do this and to use models/resources accessors instead... however when using RDB model with fast path, I was able to achieve magnitudes of better performance this way. If I recall right I even used (at least at some point) the ResultBinding#getTriples() to process the results. >The key is that it minimises the requirements on the client. If we >assume there is a complete RDF system in the client, why force ourselves >through the JDBC paradigm when we could just as easily have a >SPARQL-protocol specific API? The value of SPARQL4j to me is to connect >to applications that don't want a full RDF processing system but do want >to get some information out of an RDF repository. > > Exactly(!) and in my opinion this should apply also to the CONSTRUCT/DESCRIBE queries. Such an application could do hardly anything with a byte stream of RDF/XML not to mention N3. >>A graph may not be, but triples are also usable as such. >> >>Also I find the stream based access to the results quite usable >>regardles of the result form - at least if it's XML and not N3 (e.g. >>XSLT). >> >> > >That is XSLT on the XML results? > > Yes. >If you mean RDF/XML, the instability of the encoding is why DAWG had to >do a fixed schema XML results format. 
> > All the more reason why we should provide also RDF-parsing :-) Succeeding to do this in a toolkit independent way might also be usable to anyone building toolkits... >>Perhaps we should discuss and document what kind of use cases we wan >> >> >to > > >>support with sparql4j? >> >> > >Cool - good idea. > > Let's start a separate thread for this and copy-paste results into the document :-) -Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Seaborne, A. <and...@hp...> - 2006-01-04 16:39:31
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 4 January 2006 08:26 >=20 > > > If we have a factory that can handle RDF terms, adding support for > > > triples is trivial.=20 > > >=20 > > >=20 > >=20 > > In the sense that creating a triple is just a 3-slot object, yes. But > > the factory idea means that objects specific to the local RDF tookit > > are retruned and it will have it's own idea of a triple (e..g. in > > Jena, the application API object "Statement" is not a plain 3 slot > > triple - it knows which model it comes from). > >=20 > >=20 >=20 > There's no public API for constructing RDFNodes directly either in > Jena, so that too might be a problem too. Wouldn't it at least bypass > all (per-model) node caches? =20 ResourceFactory (it creates them relative to a hidden model but all operations, if passed a resource from one model convert to a resource in the model where the operation is to be performed automatically. Resources need to know where they come from so resource.getProperty() works. Otherwsie, an application can quite easily work at the Jena Graph level. It is stable and more pure (less convenience) and triples are, well, triples (3-tuples). Which per-model caches are you referring to? (and all Jena's caches are just that - caches - things work if they get bypassed). Caching partial query results is v hard unless you can deduce one query is a sub-query os another. >=20 > > The XML Results format does not return triples - it's only CONSTRUCT > > and DESCRIBE.=20 > >=20 > >=20 > Good point. >=20 > > > I would find it more convenient to get the triples of the graph > > > returned as triples (i.e. triple per row) using the factory along > > >=20 > > >=20 > > with > >=20 > >=20 > > > speudo column accessors. This way we would (first of all) avoid > > >=20 > > >=20 > > special > >=20 > >=20 > > > content-type handling. > > >=20 > > >=20 > >=20 > > I don't follow this - the HTTP reply header has to be correctly > > parsed. Such content handling is easy. > >=20 > >=20 > But there's no standard way in jdbc for user to access this information. > If the user is provided with an access to InputStream of the result, he > needs to get access to the content type also.=20 The driver would access the information and so know how to parse the incoming RDF graph. In fact, it needs a factory interface interface GraphFactory { Object parse(InputStream, String httpContentTypeAndCharset) ; } then a CONSTRUCT returns a 1-row,1-col result set : getObject() is the graph. getCharacterStream() or getBinaryStream() would give a more direct access if needed but (see below) I don't see these are common. >=20 > > Could you give the use case you have in mind here? (why is it more > > convenient to have a stream of triples?) > >=20 > >=20 > I use frequently Model.listStatements variants - and have used in every > RDF based applications I've ever made using Jena or SIR ;-) I wouldn't > like the performance penalty nor increased memory requirements of > having to read the results first into a model just for iterating over > them. One could also argue that every (reading) RDF operation involves > ultimately a stream/iteration of triples. Sure there's convenience > accesses filtering objects of the statements or select-type query > returning bindings, but these operations in turn rely on statement > iterations. [When building a generic program that doesn't have full > control of all input, the select-query- access is strictly speaking not > usable if "told bnodes" are not supported.] 
We need to go back to use cases and the role of SPARQL4j. An important value of SPARQl4j (for me) is that an application, which is not very RDF aware, to be able to get information out of remote RDF repository by SPARQL query. JDBC is not a good match to getting RDF graphs (as we are finding!) and choosing one processing model (streams of something) over another makes assumptions about the nature and approach of the toolkit. If the application wants to do listStatements AND wants triples in the local toolkit format, then the value of a JDBC interface (which is row-oriented) is pretty slim. So I do not see a high value for SPARQl4j as a general connection over the SPARQL protocol in the initial releases. It would seem likely that every RDF toolkit will have a built-in SPARQL client so if the application is doing RDF processing, it is much better to use that that trying to fit around the JDBC row-oritneted paradigm. It's pretty easy to write (that part ARQ is quite small - some rather tedious HTTP connection bashing). Also, given triples doesn't come back in any particular order from CONSTRUCT then I find it hard to see many processing models that can achieve streamability. Maybe you could sketch your use case in a little more detail? It's that bit I'm puzzled by. [[And there aren't any told bnodes in general (but ARQ you can get them by setting the right config options :-) Not sure who will support told bNodes. 3Store maybe.]] >=20 > One (internal to Jena/SIR) use case for this could be a (read-only) > graph wrapping some sparql-enabled repository, addressing queries when > necessary and possibly caching the results. =20 >=20 > One (possibly quite far fetched) use case is using this driver in a > generic SQL browser allowing user to address queries and showing > results in a tabular form. Having tabular form for RDF triples (exposed > vie =20 > ResultSetMetaData) instead of clob/blob might be more convenient. >=20 > > Toolkits have their own APIs to their parsers to generate triples - I > > guess most have a stream interface (ours do) but it would be more > > normal to parse directly into the graph and return a graph (yes - > > streaming is not possible).=20 > >=20 > >=20 > > To get a stream of triples, do the query: > >=20 > > SELECT * { ?s ?p ?o } > >=20 > > maybe with an ORDER BY > >=20 > >=20 > That's certainly one alternative, but gets pretty difficult when using > just a bit more complex templates. The key is that it minimises the requirements on the client. If we assume there is a complete RDF system in the client, why force ourselves through the JDBC paradigm when we could just as easily have a SPARQL-protocol specific API? The value of SPARQL4j to me is to connect to applications that don't want a full RDF processing system but do want to get some information out of an RDF repository. =20 >=20 > > As a graph isn't usable until all the triples are known (a triple can > > turn up at any point in the stream), an application would need to do > > the SELECT query to process results before the last is seen. > >=20 > >=20 > A graph may not be, but triples are also usable as such. >=20 > Also I find the stream based access to the results quite usable > regardles of the result form - at least if it's XML and not N3 (e.g. > XSLT). =20 That is XSLT on the XML results? If you mean RDF/XML, the instability of the encoding is why DAWG had to do a fixed schema XML results format. 
>=20 > Perhaps we should discuss and document what kind of use cases we wan to > support with sparql4j?=20 Cool - good idea. >=20 > Br, > Samppa Andy >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com >=20 >=20 >=20 >=20 > ------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. Do you grep through log > files=20 > for problems? Stop! Download the new AJAX search engine that makes > searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! > http://ads.osdn.com/?ad_id=3D7637&alloc_id=3D16865&op=3Dclick > _______________________________________________ > Sparql4j-devel mailing list > Spa...@li... > https://lists.sourceforge.net/lists/listinfo/sparql4j-devel |
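As a concrete illustration of the GraphFactory idea Andy sketches in the message above, a Jena-backed implementation might look roughly like this; the interface itself is only a proposal in this thread, and the content-type mapping here is an assumption.

    import java.io.InputStream;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Proposed driver-side interface (from the message above).
    interface GraphFactory {
        Object parse(InputStream in, String httpContentTypeAndCharset);
    }

    // Hypothetical Jena 2.x implementation.
    class JenaGraphFactory implements GraphFactory {
        public Object parse(InputStream in, String httpContentTypeAndCharset) {
            Model model = ModelFactory.createDefaultModel();
            // Crude content negotiation: N3 if the server said so, RDF/XML otherwise.
            String lang = (httpContentTypeAndCharset != null
                    && httpContentTypeAndCharset.startsWith("text/rdf+n3")) ? "N3" : "RDF/XML";
            model.read(in, null, lang);  // base URI null; fine when all URIs are absolute
            return model;
        }
    }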
From: Samppa S. <sam...@pr...> - 2006-01-04 08:28:08
|
>>If we have a factory that can handle RDF terms, adding support for >>triples is trivial. >> >> > >In the sense that creating a triple is just a 3-slot object, yes. But >the factory idea means that objects specific to the local RDF tookit are >retruned and it will have it's own idea of a triple (e..g. in Jena, the >application API object "Statement" is not a plain 3 slot triple - it >knows which model it comes from). > > There's no public API for constructing RDFNodes directly either in Jena, so that too might be a problem too. Wouldn't it at least bypass all (per-model) node caches? >The XML Results format does not return triples - it's only CONSTRUCT and >DESCRIBE. > > Good point. >>I would find it more convenient to get the triples of the graph >>returned as triples (i.e. triple per row) using the factory along >> >> >with > > >>speudo column accessors. This way we would (first of all) avoid >> >> >special > > >>content-type handling. >> >> > >I don't follow this - the HTTP reply header has to be correctly parsed. >Such content handling is easy. > > But there's no standard way in jdbc for user to access this information. If the user is provided with an access to InputStream of the result, he needs to get access to the content type also. >Could you give the use case you have in mind here? (why is it more >convenient to have a stream of triples?) > > I use frequently Model.listStatements variants - and have used in every RDF based applications I've ever made using Jena or SIR ;-) I wouldn't like the performance penalty nor increased memory requirements of having to read the results first into a model just for iterating over them. One could also argue that every (reading) RDF operation involves ultimately a stream/iteration of triples. Sure there's convenience accesses filtering objects of the statements or select-type query returning bindings, but these operations in turn rely on statement iterations. [When building a generic program that doesn't have full control of all input, the select-query- access is strictly speaking not usable if "told bnodes" are not supported.] One (internal to Jena/SIR) use case for this could be a (read-only) graph wrapping some sparql-enabled repository, addressing queries when necessary and possibly caching the results. One (possibly quite far fetched) use case is using this driver in a generic SQL browser allowing user to address queries and showing results in a tabular form. Having tabular form for RDF triples (exposed vie ResultSetMetaData) instead of clob/blob might be more convenient. >Toolkits have their own APIs to their parsers to generate triples - I >guess most have a stream interface (ours do) but it would be more normal >to parse directly into the graph >and return a graph (yes - streaming is not possible). > > >To get a stream of triples, do the query: > >SELECT * { ?s ?p ?o } > >maybe with an ORDER BY > > That's certainly one alternative, but gets pretty difficult when using just a bit more complex templates. >As a graph isn't usable until all the triples are known (a triple can >turn up at any point in the stream), an application would need to do the >SELECT query to process results before the last is seen. > > A graph may not be, but triples are also usable as such. Also I find the stream based access to the results quite usable regardles of the result form - at least if it's XML and not N3 (e.g. XSLT). Perhaps we should discuss and document what kind of use cases we wan to support with sparql4j? 
Br, Samppa -- Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
From: Seaborne, A. <and...@hp...> - 2006-01-03 14:05:15
|
-------- Original Message -------- > From: Samppa Saarela <> > Date: 29 December 2005 08:40 >=20 > > There is a mismatch between the JDBC paradigm and the SAX paradigm. > > JDBC is purely driven by the client application and there are no > > mandator call backs. SAX is driven by the rate of arrival as given > > by the parser so it migh have to accumulate results until the client > > is ready.=20 > >=20 > > There could be a bounded pipe between application and SAX code but I > > found it simpler to use a StAX parser (Woodstox) because the whole > > results-consuming process is then determined by the application. It > > is as easy to write StAX code as to write SAX > code. >=20 > StAX seems like a good choice. >=20 > > For handling SELECT queries, we don't need a full API. We need to be > > > able to handle RDF terms, but not triples. >=20 > If we have a factory that can handle RDF terms, adding support for > triples is trivial.=20 In the sense that creating a triple is just a 3-slot object, yes. But the factory idea means that objects specific to the local RDF tookit are retruned and it will have it's own idea of a triple (e..g. in Jena, the application API object "Statement" is not a plain 3 slot triple - it knows which model it comes from). The XML Results format does not return triples - it's only CONSTRUCT and DESCRIBE. >=20 > > For CONSTRUCT, etc, it might be better to properly link the result to > > > the local RDf toolkit of choice (e.g. via an InputStream). c.f. > > > SQL Blobs/Clobs.=20 >=20 >=20 > I would find it more convenient to get the triples of the graph > returned as triples (i.e. triple per row) using the factory along with > speudo column accessors. This way we would (first of all) avoid special > content-type handling. I don't follow this - the HTTP reply header has to be correctly parsed. Such content handling is easy. Could you give the use case you have in mind here? (why is it more convenient to have a stream of triples?) > When using the InputStream-approach the user > should be in control of the requested content-type. However, since > InputStreams are more convenient in some situations (e.g. when using > XSLT to process the results) maybe the best alternative would be to > provice both... and since at least some of the use cases for > InputStream access to the results are the same regardless of the type > of the query, I see no reason to limit the usage of InputStream results > only to construct and describe. To support these use cases we could > overload Statement's execute with one that returns an InputStream. E.g. >=20 > Connection c =3D datasource.getConnection(); >=20 > Statement select =3D c.createStatement(); > InputStream results =3D select.executeRaw("<any sparql query>"); >=20 >=20 > The preferred RDF serialization could be provided to the statement via > select.setRdfLang("N3") similarily to other hints (e.g. fetch size, > escape processing...).=20 >=20 >=20 > Br, > Samppa Toolkits have their own APIs to their parsers to generate triples - I guess most have a stream interface (ours do) but it would be more normal to parse directly into the graph=20 and return a graph (yes - streaming is not possible). =20 To get a stream of triples, do the query: SELECT * { ?s ?p ?o } maybe with an ORDER BY=20 As a graph isn't usable until all the triples are known (a triple can turn up at any point in the stream), an application would need to do the SELECT query to process results before the last is seen. 
Andy >=20 >=20 >=20 > -- > Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin > kaari 12, 02600 Espoo, Finland Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 > 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com=20 >=20 |
From: Seaborne, A. <and...@hp...> - 2006-01-03 13:49:47
|
Hi Timo,

-------- Original Message --------
> From: Timo Westkamper <>
> Date: 22 December 2005 15:00
>
> Hello.
>
> Concerning SELECT results access it would be possible to plug RDF APIs
> into the driver via a factory interface.
>
> We could provide a neutral SPARQL4J-dependent minimal RDF API and in
> addition to this the possibility to plug external APIs into the driver.

Sounds like a good idea. Overall the interface looks OK - a few minor comments.

> RDF nodes (or any other objects returned by the factory) could be
> accessed via the untyped accessors of the JDBC result set.
>
> e.g.
>
> interface NodeFactory{
>
>   public Object createLiteral(String lex, String lang, String datatype);

Minor: as lang and datatype can't both be set,

  createTypedLiteral(String lex, String datatype);
  createPlainLiteral(String lex, String lang);

is possible. As an internal factory API, it isn't significant - your one-call version is fine.

>   public Object createResource(String uri);
>
>   public Object createResource(String ns, String local);

Not sure about this one. Namespaces are purely syntactic in RDF and the XML results format does not provide them. Did you have a specific usage in mind? Wouldn't only RDF/XML-format results need this, and then it is a matter of a full RDF parser anyway?

>   public Object createBlankNode(String internalID);
>
> }

One observation - RDF resources cover all graph labels (URIs, blank nodes and literals). There isn't an official term for a URI-labelled node or arc, so your naming is probably as good as it gets.

Andy

> The name of the implementation class could be given to the Driver via
> the property map.
>
> Br,
> Timo. |
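A sketch of the split-method variant Andy suggests above (typed vs. plain literals), shown alongside the rest of the proposed factory; all of it is still a proposal at this stage.

    // Proposed pluggable factory, with the single createLiteral call split into
    // the two mutually exclusive cases (a literal carries either a language tag
    // or a datatype, never both).
    interface NodeFactory {
        Object createTypedLiteral(String lex, String datatype);
        Object createPlainLiteral(String lex, String lang);
        Object createResource(String uri);
        Object createBlankNode(String internalId);
    }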
From: Samppa S. <sam...@pr...> - 2006-01-03 10:50:19
|
Problem: How to provide both row-based and InputStream-based access to all types of query results. Row-based access fits the JDBC semantics better and should thus be emphasized. However, InputStream-based access is useful, for example, for XSLT post-processing and for using an external RDF parser to process CONSTRUCT and DESCRIBE results.

Constraints:

1) Branching should be clear and consistent regardless of the type of the query, i.e. not dependent on execution order and independent of the query type (for ad hoc query processing).

2) JDBC semantics should not be rewritten. Future support for PreparedStatements, updates and stored procedures should not be blocked.

3) The user should be in control of the accept preferences, i.e. whether N3 or RDF/XML is preferred for RDF-based results.

4) The user should be able to access the content type of the results (when using InputStream access), as it is ultimately the server which decides the type of the result (accepts are only preferences and may include both N3 and RDF/XML).

We suggest that triple-per-row should be used as the basic approach for DESCRIBE and CONSTRUCT (i.e. RDF form) results, with subject, predicate and object as the basic columns (exposed via ResultSetMetaData, in addition to pseudo-columns for the subject and object types as well as a literal object's additional properties).

Here are a couple of alternatives for InputStream-based access to results.

  /* Target method for input stream processing */
  public void processStream(InputStream stream, String contentType) {...}

1) SparqlStatement

  public interface SparqlStatement extends Statement {
    ...
    public ResultStream executeSparql(String query, String[] accept);
    ...
  }

  public class ResultStream extends InputStream {
    public String getContentType() {...}
    public String getEncoding() {...}
  }

Usage:

  SparqlStatement stmt = (SparqlStatement) connection.createStatement();
  ResultStream stream = stmt.executeSparql("CONSTRUCT ...", new String[]{"text/rdf+n3"});
  String contentType = stream.getContentType();
  processStream(stream, contentType);

Pros/cons:
+ Provides a SPARQL-specific extension to JDBC while preserving JDBC semantics
+ Clear way of providing accept preferences per query
+ Direct InputStream access independent of ResultSet and clob/blob access
- Needs explicit casting
- Is dependent on the sparql4j API (for the SparqlStatement and ResultStream interfaces)

2) Specifying the result set type when creating the statement

  Statement stmt = connection.createStatement(SparqlResultSet.TYPE_STREAM, ResultSet.CONCUR_READ_ONLY);
  ResultSet rs = stmt.executeQuery("CONSTRUCT ...");
  if (rs.next()) {
    Blob blob = rs.getBlob(1);
    String contentType = rs.getString(2);
    processStream(blob.getBinaryStream(), contentType);
  }

Pros/cons:
+ Doesn't need explicit casting
+ No dependency on the Sparql4J API except for the special result set type constant
- Requires buffering of the whole result (the length of the result might not be available and there is no limit on how many times getBinaryStream can be called)
- Clob/blob is under-specified (e.g. its relation to transactions) and implementations/usage are vendor-specific

Br, Samppa & Timo
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
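For context, the XSLT post-processing use case mentioned above is straightforward once an InputStream is available; a sketch using the standard JAXP transformer API, with placeholder file names.

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class XsltPostProcessor {
        // Applies a stylesheet to the raw result stream; file names are placeholders.
        public static void render(InputStream sparqlResults) throws Exception {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource("results-to-html.xsl"));
            t.transform(new StreamSource(sparqlResults),
                        new StreamResult(new FileOutputStream("results.html")));
        }
    }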
From: Samppa S. <sam...@pr...> - 2006-01-02 13:24:07
|
Hi all,

What is the target Java version of sparql4j? And what about source and .class file compatibility? My Eclipse is set to use 1.4 with the default compliance settings (source=1.3 and .class files=1.2). Any reason why we should use 1.5? I don't see any. What about 1.4 and asserts?

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
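For reference, assert statements need source level 1.4 at compile time and -ea at run time, so the default source=1.3 setting mentioned above would reject them; a trivial illustration (class name and query are placeholders).

    // Compile with: javac -source 1.4 AssertDemo.java
    // Run with:     java -ea AssertDemo
    public class AssertDemo {
        public static void main(String[] args) {
            String query = "select * where {?s ?p ?o}";
            assert query.trim().length() > 0 : "query must not be empty";
            System.out.println("query ok");
        }
    }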
From: Samppa S. <sam...@pr...> - 2005-12-29 08:42:26
|
> There is a mismatch between the JDBC paradigm and the SAX paradigm.
> JDBC is purely driven by the client application and there are no
> mandatory callbacks. SAX is driven by the rate of arrival as given by
> the parser, so it might have to accumulate results until the client is
> ready.
>
> There could be a bounded pipe between application and SAX code, but I
> found it simpler to use a StAX parser (Woodstox) because the whole
> results-consuming process is then determined by the application. It is
> as easy to write StAX code as to write SAX code.

StAX seems like a good choice.

> For handling SELECT queries, we don't need a full API. We need to be
> able to handle RDF terms, but not triples.

If we have a factory that can handle RDF terms, adding support for triples is trivial.

> For CONSTRUCT, etc., it might be better to properly link the result to
> the local RDF toolkit of choice (e.g. via an InputStream). Cf. SQL
> Blobs/Clobs.

I would find it more convenient to get the triples of the graph returned as triples (i.e. triple per row) using the factory along with pseudo-column accessors. This way we would (first of all) avoid special content-type handling. When using the InputStream approach the user should be in control of the requested content type. However, since InputStreams are more convenient in some situations (e.g. when using XSLT to process the results), maybe the best alternative would be to provide both... and since at least some of the use cases for InputStream access to the results are the same regardless of the type of the query, I see no reason to limit InputStream results only to CONSTRUCT and DESCRIBE. To support these use cases we could overload Statement's execute with one that returns an InputStream. E.g.

  Connection c = datasource.getConnection();
  Statement select = c.createStatement();
  InputStream results = select.executeRaw("<any sparql query>");

The preferred RDF serialization could be provided to the statement via select.setRdfLang("N3"), similarly to other hints (e.g. fetch size, escape processing...).

Br, Samppa
--
Samppa Saarela <samppa.saarela at profium.com> Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002 Mob. +358 (0)41 515 1412 Internet: http://www.profium.com |
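A minimal sketch of pull-parsing the SPARQL XML results format with StAX, as Andy describes above; the javax.xml.stream API shown here would be supplied by Woodstox on the JVMs of that era, and the element handling is deliberately simplified.

    import java.io.InputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class BindingPrinter {
        // Prints each variable binding of a SPARQL XML results document as "name = value".
        public static void printBindings(InputStream in) throws Exception {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String currentVar = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "binding".equals(r.getLocalName())) {
                    currentVar = r.getAttributeValue(null, "name");
                } else if (event == XMLStreamConstants.CHARACTERS && currentVar != null) {
                    String text = r.getText().trim();
                    if (text.length() > 0) {
                        System.out.println(currentVar + " = " + text);
                        currentVar = null;
                    }
                }
            }
            r.close();
        }
    }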
From: Samppa S. <sam...@pr...> - 2005-12-29 07:47:20
|
In addition to getObject returning RDF nodes via a configurable factory, we could provide pseudo-columns to access properties of nodes (type, datatype, lang), e.g.

ResultSet rs = executeQuery("select * where {?s ?p ?o}");
while (rs.next()) {
    String label = rs.getString("o");
    int type = rs.getInt("o$type");
    switch (type) {
        case RESOURCE: ...
        case BLANK_NODE: ...
        case PLAIN_LITERAL:
            String lang = rs.getString("o$lang");
            ...
        case TYPED_LITERAL:
            String datatype = rs.getString("o$datatype");
            ...
    }
}

Should XMLLiteral be handled as a separate type or as a typed literal with datatype rdf:XMLLiteral?

Br,
Samppa

> Hello.
>
> Concerning SELECT results access it would be possible to plug RDF APIs
> into the driver via a factory interface.
>
> We could provide a neutral SPARQL4J dependent minimal RDF API and in
> addition to this the possibility to plug external APIs into the driver.
>
> RDF nodes (or any other objects returned by the factory) could be
> accessed via the untyped accessors of the JDBC result set.
>
> e.g.
>
> interface NodeFactory{
>
> public Object createLiteral(String lex, String lang, String datatype);
>
> public Object createResource(String uri);
>
> public Object createResource(String ns, String local);
>
> public Object createBlankNode(String internalID);
>
> }
>
> The name of the implementation class could be given to the Driver via
> the property map.
>
> Br,
> Timo.
>
--
Samppa Saarela <samppa.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)41 515 1412
Internet: http://www.profium.com |
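For completeness, the node-type constants used in the switch above could be collected somewhere like the sketch below. The constant names come from the example; the numeric values are placeholders only.

public interface NodeType {
    int RESOURCE      = 1;
    int BLANK_NODE    = 2;
    int PLAIN_LITERAL = 3;
    int TYPED_LITERAL = 4;
    // rdf:XMLLiteral could either get its own constant or be reported as
    // TYPED_LITERAL with datatype rdf:XMLLiteral (the open question above).
}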
From: Timo W. <tim...@pr...> - 2005-12-22 15:00:29
|
Hello.

Concerning SELECT results access, it would be possible to plug RDF APIs into the driver via a factory interface.

We could provide a neutral, SPARQL4J-dependent minimal RDF API and, in addition to this, the possibility to plug external APIs into the driver.

RDF nodes (or any other objects returned by the factory) could be accessed via the untyped accessors of the JDBC result set, e.g.

interface NodeFactory {

    public Object createLiteral(String lex, String lang, String datatype);

    public Object createResource(String uri);

    public Object createResource(String ns, String local);

    public Object createBlankNode(String internalID);
}

The name of the implementation class could be given to the Driver via the property map.

Br,
Timo.
--
Timo Westkämper <timo.westkamper at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)40 591 2172
Internet: http://www.profium.com |
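To make the factory idea concrete, a trivial implementation and its registration might look like the sketch below. The "nodeFactory" property key and the jdbc:sparql URL form are assumptions for illustration; a real factory would create Jena or Sesame nodes rather than plain Strings.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class StringNodeFactory implements NodeFactory {
    public Object createLiteral(String lex, String lang, String datatype) { return lex; }
    public Object createResource(String uri) { return uri; }
    public Object createResource(String ns, String local) { return ns + local; }
    public Object createBlankNode(String internalID) { return "_:" + internalID; }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // property key and URL are illustrative, not fixed driver API
        props.setProperty("nodeFactory", StringNodeFactory.class.getName());
        Connection con = DriverManager.getConnection("jdbc:sparql://example.org/sparql", props);
        con.close();
    }
}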
From: Seaborne, A. <and...@hp...> - 2005-12-21 16:34:12
|
-------- Original Message --------
> From: Janne Saarela <>
> Date: 20 December 2005 15:38
>
> Just to let you know: I've committed some skeleton files to cvs. I've
> mostly developed Driver.java and DriverTest.java which include some
> property management functionality and pass simple unit tests.
>

Aside: Just checked these out - it would be better to use Commons Logging rather than hardwire log4j, because sparql4j is a library that has to co-exist with other subsystems. Using org.apache.commons.logging is just as easy. It uses Log4J if it can find an implementation on the classpath, else uses Java logging, else uses a simple built-in logger.

> The Connection.java class needs XML parsing to be implemented with
> ResultSet.java implementation to hold and serve query results -
> volunteers?
>
> How do you feel like testing the driver against SPARQL implementations?
> Perhaps the server URI should be made configurable via some
> test.properties file which is picked up by build.xml? Volunteers?

Will do. What data and implementations are we going to test against? It would be useful for us all to be able to recreate the same tests.

I have a small server at http://sparql.org/sparql which is backed by Joseki and ARQ. See http://sparql.org/query.html
Just don't overload it by loading large graphs (it should give an error if the graph is over 10K triples).

What's the state of Profium's engine?

I'd like to try my client library in parallel to compare SPARQL4j with it. It's in the ARQ codebase - and the command line arq.sparql will access remote services with the --service argument (over HTTP).

	Andy

>
> I plan to throw exceptions either by messages 'not supported' and 'TODO:
> not implemented' in all of the methods required by java.sql interfaces.
> They will eventually disappear as our work progresses.

Good way forward.

>
> Regards,
> Janne
> --
> Janne Saarela <janne.saarela at profium.com> Profium, Lars Sonckin
> kaari 12, 02600 Espoo, Finland
> Internet: http://www.profium.com |
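A minimal sketch of the Commons Logging usage Andy suggests (the class name is illustrative): the library obtains a Log through LogFactory and the actual backend is resolved at runtime from the classpath.

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ConnectionLogging {
    // Commons Logging delegates to Log4J, java.util.logging or its own
    // SimpleLog depending on what is found on the classpath.
    private static final Log log = LogFactory.getLog(ConnectionLogging.class);

    public void connect(String url) {
        log.debug("Opening connection to " + url);
    }
}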
From: Seaborne, A. <and...@hp...> - 2005-12-21 16:23:33
|
-------- Original Message --------
> From: Timo Westkamper <>
> Date: 21 December 2005 10:13
>
> Hello.
>
> I have added basic SAX based result parsing (SELECT results) into CVS.

There is a mismatch between the JDBC paradigm and the SAX paradigm. JDBC is purely driven by the client application and there are no mandatory callbacks. SAX is driven by the rate of arrival as given by the parser, so it might have to accumulate results until the client is ready.

There could be a bounded pipe between the application and the SAX code, but I found it simpler to use a StAX parser (Woodstox) because the whole results-consuming process is then determined by the application. It is as easy to write StAX code as to write SAX code.

ARQ does this - the code is in ARQ CVS:
:pserver:ano...@cv...:/cvsroot/jena/
Module ARQ
Package com.hp.hpl.jena.query.resultset

> What is your opinion on including a lightweight RDF API for passing RDF
> nodes around?

For handling SELECT queries, we don't need a full API. We need to be able to handle RDF terms, but not triples.

For CONSTRUCT, etc., it might be better to properly link the result to the local RDF toolkit of choice (e.g. via an InputStream), c.f. SQL Blobs/Clobs.

	Andy

>
> Br,
> Timo Westkämper. |
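For orientation, a minimal StAX sketch of pulling SELECT bindings off the wire might look as follows, assuming the standard javax.xml.stream API (which Woodstox implements); the element names follow the SPARQL XML results format, and error handling is omitted.

import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SelectResultsReader {
    // The application pulls events at its own pace; nothing is buffered
    // beyond what the parser needs for the current event.
    public static void read(InputStream in) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "binding".equals(r.getLocalName())) {
                String var = r.getAttributeValue(null, "name");
                // advance to the term element (uri, bnode or literal)
                while (r.next() != XMLStreamConstants.START_ELEMENT) { /* skip whitespace */ }
                System.out.println(var + " = " + r.getLocalName() + " " + r.getElementText());
            }
        }
        r.close();
    }
}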
From: Timo W. <tim...@pr...> - 2005-12-21 10:13:21
|
Hello.

I have added basic SAX based result parsing (SELECT results) into CVS.

What is your opinion on including a lightweight RDF API for passing RDF nodes around?

Br,
Timo Westkämper.
--
Timo Westkämper <timo.westkamper at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Tel. +358 (0)9 855 98 000 Fax. +358 (0)9 855 98 002
Mob. +358 (0)40 591 2172
Internet: http://www.profium.com |
From: Janne S. <jan...@pr...> - 2005-12-20 15:38:28
|
Just to let you know: I've committed some skeleton files to cvs. I've mostly developed Driver.java and DriverTest.java, which include some property management functionality and pass simple unit tests.

The Connection.java class needs XML parsing to be implemented, with a ResultSet.java implementation to hold and serve query results - volunteers?

How do you feel about testing the driver against SPARQL implementations? Perhaps the server URI should be made configurable via some test.properties file which is picked up by build.xml? Volunteers?

I plan to throw exceptions with the messages 'not supported' and 'TODO: not implemented' in all of the methods required by the java.sql interfaces. They will eventually disappear as our work progresses.

Regards,
Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |
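One way the test.properties idea could be wired up from the Java side is sketched below; the file name, the "sparql.server.url" key and the default value (Andy's endpoint, mentioned later in the thread) are assumptions, not anything fixed in the project yet.

import java.io.InputStream;
import java.util.Properties;

public class TestConfig {
    // Reads the SPARQL endpoint URL for integration tests from a
    // test.properties resource on the classpath.
    public static String serverUrl() throws Exception {
        Properties p = new Properties();
        InputStream in = TestConfig.class.getResourceAsStream("/test.properties");
        p.load(in);
        in.close();
        return p.getProperty("sparql.server.url", "http://sparql.org/sparql");
    }
}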
From: Seaborne, A. <and...@hp...> - 2005-11-16 18:05:58
|
-------- Original Message --------
> From: Janne Saarela <>
> Date: 15 November 2005 12:28
>
> Hi all
>
> Can I have some of your cpu time for thinking how we should manage
> connection objects in the sparql4j project?
>
> As sparql currently does not support state management (cursors or
> anything of the sort), there is no need to manage state in the
> Connection object internally.
>
> How the connections are passed between Driver, Connection and Statement
> (or PreparedStatement) classes is something we should design right so
> that eventual state management would be easier to add in the future.
>
> I am thinking that the Driver.getConnection() method could return
> nothing but a Connection object which internally manages HTTP
> connections to SPARQL servers.
>
> Once a (Prepared)Statement.execute() method (or variants of it) is
> executed, the actual protocol call is created within the Connection
> object. This would centralize HTTP code management to the Connection
> class leaving room for internal connection pool management if HTTP
> connections were e.g. persisted and left open for future calls.

Good idea - we *might* be able to get this for free by having swappable HTTP clients. Maybe HttpClient does the necessary work for us (although this should not be a system prerequisite).

	Andy

>
> I don't see a reason to pool Connection objects in the Driver level at
> this time.
>
> Janne
> --
> Janne Saarela <janne.saarela at profium.com> Profium, Lars Sonckin
> kaari 12, 02600 Espoo, Finland
> Internet: http://www.profium.com |
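A rough sketch of what centralizing HTTP access in the Connection could look like with Jakarta Commons HttpClient (assuming the 3.x API; as Andy notes, the library should not be a hard prerequisite). The class and the "query" parameter follow the SPARQL protocol-over-GET idea; everything else is illustrative.

import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.GetMethod;

public class HttpQueryExecutor {
    // One pooled client per JDBC Connection: HTTP connections are kept
    // alive and reused across Statement executions.
    private final HttpClient client = new HttpClient(new MultiThreadedHttpConnectionManager());

    public InputStream execute(String serviceUrl, String query) throws Exception {
        GetMethod get = new GetMethod(serviceUrl);
        get.setQueryString(new NameValuePair[] { new NameValuePair("query", query) });
        int status = client.executeMethod(get);
        if (status != 200) {
            get.releaseConnection();
            throw new Exception("SPARQL protocol request failed: HTTP " + status);
        }
        // caller is responsible for consuming the stream and releasing the connection
        return get.getResponseBodyAsStream();
    }
}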
From: Seaborne, A. <and...@hp...> - 2005-11-15 14:27:02
|
-------- Original Message --------
> From: Janne Saarela <mailto:jan...@pr...>
> Date: 11 November 2005 11:19
>
> > Your remark on the timezone was a good one - what I don't know (yet)
> > is whether the javax.xml.datatypes could be returned from
> > getTimestamp() methods (i.e. do these types inherit any of the java
> > basic data types) or if we need a method signature?
>
> The last question should have included the word 'new', i.e. 'do we need a new
> method signature?'
>
> Janne

A new method signature would mean that the app needs to use a new interface to get the functionality. Doable, but maybe not what we want.

How about returning java.sql.Timestamp (which extends java.util.Date, not Calendar) as usual, with a mapping of the XSD datatype to the string form that DateFormat accepts. This is OK for query - but for update, things might not round-trip in lexical form (the time is right but the way it is written may change).

	Andy |
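One possible shape of that mapping is sketched below, assuming pre-1.5 Java where SimpleDateFormat does not understand the xsd:dateTime "+02:00" / "Z" zone syntax, so the offset is normalized first. It assumes an explicit time zone is present and ignores fractional seconds; it is only an illustration, not driver code.

import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class XsdDateTime {
    // Converts an xsd:dateTime lexical form into a java.sql.Timestamp.
    public static Timestamp parse(String lexical) throws Exception {
        String s = lexical;
        if (s.endsWith("Z")) {
            s = s.substring(0, s.length() - 1) + "+0000";
        } else {
            int zone = Math.max(s.lastIndexOf('+'), s.lastIndexOf('-'));
            if (zone > 10 && s.charAt(zone + 3) == ':') {
                // strip the colon in "+hh:mm" so the RFC 822 'Z' pattern can parse it
                s = s.substring(0, zone + 3) + s.substring(zone + 4);
            }
        }
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
        return new Timestamp(f.parse(s).getTime());
    }
}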
From: Seaborne, A. <and...@hp...> - 2005-11-15 14:16:36
|
-------- Original Message --------
> From: Janne Saarela <mailto:jan...@pr...>
> Date: 11 November 2005 11:13
>
> > You're right - email would be best for discussion of the document.
>
> Let me switch to email from the word doc for a while.
>
> 1. I would now agree that let's not override the existing semantics of
> the execute() method, i.e. let's manage boolean return values via a
> ResultSet with 1 single row.

Aside: that seems to be the approach of MS SQLServer for XML results - a one-row, one-column result set with the XML in it.

> 2. In order to make work proceed step-by-step, should we define the
> development via milestones where M1 would include boolean results as
> well as result sets but not graphs.

Fine - a quite minimal M0 to get the whole project process sorted out may be a good idea.

> M2, once planned later, would then
> include graphs, too. We have some use cases for their eventual inclusion
> but in order to validate sparql protocol practical aspects we could
> target M1 first.
>
> Your remark on the timezone was a good one - what I don't know (yet) is
> whether the javax.xml.datatypes could be returned from getTimestamp()
> methods (i.e. do these types inherit any of the java basic data types)
> or if we need a method signature?

Next message ...

>
> Janne

	Andy |
From: Janne S. <jan...@pr...> - 2005-11-15 12:28:12
|
Hi all

Can I have some of your cpu time for thinking about how we should manage connection objects in the sparql4j project?

As sparql currently does not support state management (cursors or anything of the sort), there is no need to manage state in the Connection object internally.

How the connections are passed between the Driver, Connection and Statement (or PreparedStatement) classes is something we should design right so that eventual state management would be easier to add in the future.

I am thinking that the Driver.getConnection() method could return nothing but a Connection object which internally manages HTTP connections to SPARQL servers.

Once a (Prepared)Statement.execute() method (or variants of it) is executed, the actual protocol call is created within the Connection object. This would centralize HTTP code management in the Connection class, leaving room for internal connection pool management if HTTP connections were e.g. persisted and left open for future calls.

I don't see a reason to pool Connection objects at the Driver level at this time.

Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |
From: Janne S. <jan...@pr...> - 2005-11-11 11:19:45
|
> Your remark on the timezone was a good one - what I don't know (yet) is
> whether the javax.xml.datatypes could be returned from getTimestamp()
> methods (i.e. do these types inherit any of the java basic data types)
> or if we need a method signature?

The last question should have included the word 'new', i.e. 'do we need a new method signature?'

Janne
--
Janne Saarela <janne.saarela at profium.com>
Profium, Lars Sonckin kaari 12, 02600 Espoo, Finland
Internet: http://www.profium.com |