From: Christian W. <cw...@cw...> - 2007-01-23 12:44:29
Attachments:
signature.asc
Hello all,

I face a very big problem with my SparqlEngineDb implementation: sorting.

In SPARQL, one can sort by any variable of any type. The datatype of each value is stored in the l_datatype column of the statements table and can hardly be used directly in an SQL query.

Imagine the following query:
----------------------
SELECT ?name ?emp
WHERE { ?x foaf:name ?name ;
           ex:empId ?emp
}
ORDER BY ASC(?emp)
----------------------

Here ?emp is the object of an "ex:empId" predicate, which makes its datatype xsd:integer. Generating the SQL query in this case is easy: just use "ORDER BY CAST(emp AS INTEGER) ASC" and it's done.

The main problem is that I don't know in advance which datatype the column will have (there could be patterns like "?a ?b ?c ." ordered by ?c), and, even more serious, the data could contain several different datatypes.

I don't know what to do here. The only ideas I have come up with are:
1) Use a stored procedure that does the comparison. This would make the engine unusable on older database systems such as MySQL < 5. I would also need to get into stored procedures, but I think it is possible to hook one into sorting.
2) Sort on the client side. I would like to avoid this since it would decrease performance greatly. Furthermore, I couldn't use server-side LIMITs, which makes performance even worse.
3) Provide datatype hints in the query. The SPARQL implementation would then have to understand proprietary queries, and it still wouldn't work with standard queries.

Does anybody have an idea what I could do?

Btw, RDQL doesn't even provide ORDER support. I think I know why..

--
Regards/Mit freundlichen Grüßen
Christian Weiske
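To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database (an assumption; the thread is about MySQL). Because literal values sit in a text column, a plain ORDER BY compares them as strings, and only an explicit CAST gives numeric order. Note that MySQL spells integer casts as CAST(x AS SIGNED) rather than AS INTEGER.

```python
import sqlite3

# Literal values are stored as text, like the object column of the statements table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (object TEXT, l_datatype TEXT)")
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
conn.executemany(
    "INSERT INTO statements VALUES (?, ?)",
    [("9", XSD_INT), ("10", XSD_INT), ("100", XSD_INT), ("2", XSD_INT)],
)

# Sorting the raw text column compares strings character by character.
lexical = [r[0] for r in conn.execute(
    "SELECT object FROM statements ORDER BY object ASC")]

# Casting first sorts by numeric value, which is what ORDER BY ASC(?emp) needs.
numeric = [r[0] for r in conn.execute(
    "SELECT object FROM statements ORDER BY CAST(object AS INTEGER) ASC")]

print(lexical)  # ['10', '100', '2', '9']
print(numeric)  # ['2', '9', '10', '100']
```

The cast only works because every row here shares one known datatype, which is exactly the assumption that breaks for patterns like "?a ?b ?c .".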
From: Richard C. <ri...@cy...> - 2007-01-23 13:53:23
Christian,

On 22 Jan 2007, at 19:23, Christian Weiske wrote:
> I face a very big problem regarding my SparqlEngineDb implementation:
> Sorting.
[...]
> Anybody an idea what I could do?

I think you can write something like this in SPARQL:

ORDER BY ASC(xsd:int(?emp))

Query authors could use this as a hint that allows the engine to use the right cast in the SQL translation. In the absence of such a hint, I'd just do the sorting client-side. This would be a relatively simple and pragmatic solution.
Richard
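Richard's suggestion can be sketched as a small translation step: if the ORDER BY expression carries a cast such as xsd:int(?emp), map that datatype to an SQL cast; otherwise return nothing and let the engine sort client-side. The function name, the mapping table and the SQL type names below are hypothetical and not part of rdfapi-php; real type names differ between databases (MySQL, for example, uses SIGNED for integer casts).

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table: datatype IRI of a SPARQL cast function -> SQL cast type.
# The SQL type names are assumptions and vary between database systems.
CAST_HINTS = {
    XSD + "int": "INTEGER",
    XSD + "integer": "INTEGER",
    XSD + "decimal": "DECIMAL",
}


def order_by_clause(column: str, hint: Optional[str],
                    descending: bool = False) -> Optional[str]:
    """Build an SQL ORDER BY clause from a cast hint such as xsd:int(?emp).

    Returns None when there is no usable hint; the caller would then fetch
    the rows unsorted (and without a server-side LIMIT) and sort client-side.
    """
    cast_type = CAST_HINTS.get(hint) if hint is not None else None
    if cast_type is None:
        return None
    direction = "DESC" if descending else "ASC"
    return f"ORDER BY CAST({column} AS {cast_type}) {direction}"


print(order_by_clause("emp", XSD + "int"))  # ORDER BY CAST(emp AS INTEGER) ASC
print(order_by_clause("emp", None))         # None
```

Returning None rather than guessing a cast keeps the fallback explicit: without a hint, the engine cannot know which cast is safe.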
From: Christian W. <cw...@cw...> - 2007-01-23 15:00:13
Attachments:
signature.asc
Richard,

> I think you can write something like this in SPARQL:
>
> ORDER BY ASC(xsd:int(?emp))

This is true for the SPARQL recommendation, but our query parser does not support it :)

--
Regards/Mit freundlichen Grüßen
Christian Weiske
From: Seaborne, A. <and...@hp...> - 2007-01-23 15:26:23
-------- Original Message --------
> From: Christian Weiske <>
> Date: 22 January 2007 18:23
>
> Hello all,
>
> I face a very big problem regarding my SparqlEngineDb implementation:
> Sorting.
[...]
> Anybody an idea what I could do?

The ordering of unlike things is partially defined in SPARQL. I'm not sure what your DB schema is, but if the type of the node (URI, bnode, datatype of literal) is stored as some carefully chosen integer, then

ORDER BY ?emp

is (nearly)

ORDER BY emp.type, emp.lexicalform
That is, the kind of type is more significant. The numbers need to order the conditions:
[[
1. (Lowest) no value assigned to the variable or expression in this solution.
2. Blank nodes
3. IRIs
4. RDF literals
5. A plain literal is lower than an RDF literal with type xsd:string of the same lexical form.
]]

Sorting by lexical form is too weak for sorting numbers by value. This approach might be extended to cover that.

And what is the SQL schema, by the way?

> Btw, RDQL doesn't even provide ORDER support. I think I know why..

:-)

Andy
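The ordering Andy quotes can be expressed as a sort key of the form (kind rank, lexical form, tie-breaker). The sketch below is a client-side illustration only; the Term representation is hypothetical, and, as Andy points out, numeric literals are still compared by lexical form rather than by value.

```python
from typing import NamedTuple, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Term(NamedTuple):
    kind: str                       # hypothetical encoding: "bnode", "iri" or "literal"
    value: str                      # blank node label, IRI or lexical form
    datatype: Optional[str] = None  # None for plain literals


def sparql_sort_key(term: Optional[Term]) -> Tuple[int, str, int]:
    """Approximate the SPARQL ORDER BY rules: node kind first, then lexical form."""
    if term is None:                # 1. unbound variable sorts lowest
        return (0, "", 0)
    rank = {"bnode": 1, "iri": 2, "literal": 3}[term.kind]  # 2.-4.
    # 5. a plain literal sorts before a typed literal with the same lexical form
    typed = 0 if term.datatype is None else 1
    return (rank, term.value, typed)


solutions = [
    Term("literal", "b"),
    None,
    Term("iri", "http://example.org/a"),
    Term("bnode", "b0"),
    Term("literal", "a", XSD_STRING),
    Term("literal", "a"),
]
for t in sorted(solutions, key=sparql_sort_key):
    print(t)
```

The tuple key mirrors Andy's "ORDER BY emp.type, emp.lexicalform": the rank plays the role of the carefully chosen type integer.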
From: Christian W. <cw...@cw...> - 2007-01-23 15:41:11
Attachments:
signature.asc
Andy,

> The ordering of unlike things is partially defined in SPARQL. I'm not
> sure what your DB schema is but if the type of the node (URI, bnode,
> datatype of literal) is some carefully chosen integer then
>
> ORDER BY ?emp
> is (nearly)
> ORDER BY emp.type, emp.lexicalform.

I doubt that I can change the schema, since many other parts of rdfapi use it.

DB schema:

Field        Type
--------------------------
modelID      bigint
subject      varchar(255)
predicate    varchar(255)
object       text
l_language   varchar(255)
l_datatype   varchar(255)
subject_is   varchar(1)
object_is    varchar(1)

Subject, predicate and object values are stored in the corresponding fields. subject_is and object_is determine the general kind of the value: literal, resource or blank node. l_datatype is the actual datatype of the object, e.g. "http://www.w3.org/2001/XMLSchema#integer".

By the way, how does Jena do it (assuming you are coming from there)?

--
Regards/Mit freundlichen Grüßen
Christian Weiske
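Without changing the schema, Andy's type-code idea can be approximated at query time with a CASE expression over object_is. The sketch below runs against SQLite with the table above; the codes 'b', 'r' and 'l' for blank node, resource and literal are assumptions about what object_is contains, and lexical-form sorting still orders numbers as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statements (
        modelID    BIGINT,
        subject    VARCHAR(255),
        predicate  VARCHAR(255),
        object     TEXT,
        l_language VARCHAR(255),
        l_datatype VARCHAR(255),
        subject_is VARCHAR(1),
        object_is  VARCHAR(1)
    )""")

# object_is codes are assumed: 'b' = blank node, 'r' = resource, 'l' = literal.
rows = [
    (1, "http://example.org/s", "http://example.org/p", "zeta", "", "", "r", "l"),
    (1, "http://example.org/s", "http://example.org/p", "http://example.org/x", "", "", "r", "r"),
    (1, "http://example.org/s", "http://example.org/p", "bNode1", "", "", "r", "b"),
    (1, "http://example.org/s", "http://example.org/p", "alpha", "", "", "r", "l"),
]
conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

ORDER_SQL = """
    SELECT object FROM statements
    ORDER BY
        -- SPARQL kind order: unbound (NULL after an outer join) < bnode < IRI < literal
        CASE WHEN object_is IS NULL THEN 0
             WHEN object_is = 'b' THEN 1
             WHEN object_is = 'r' THEN 2
             WHEN object_is = 'l' THEN 3
             ELSE 4 END,
        -- then the lexical form (numbers still compare as strings here)
        object,
        -- plain literal before typed literal with the same lexical form
        CASE WHEN l_datatype IS NULL OR l_datatype = '' THEN 0 ELSE 1 END
"""
ordered = [r[0] for r in conn.execute(ORDER_SQL)]
print(ordered)  # ['bNode1', 'http://example.org/x', 'alpha', 'zeta']
```

Computing the rank in the query avoids a schema change, at the cost of an ORDER BY that the database cannot serve from an index.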
From: Seaborne, A. <and...@hp...> - 2007-01-24 15:18:58
-------- Original Message --------
> From: Christian Weiske <mailto:cw...@cw...>
> Date: 23 January 2007 15:41
>
> Andy,
[...]
> I doubt that I can change the schema since many other parts of rdfapi
> use it.

That makes it tricky. Isn't there a single abstraction layer that handles all RDF->DB mapping? If so, maybe adding a field to help SPARQL is an option.

> DB schema:
[...]
> By the way, how does jena do it (assuming you are coming from there)?

Only sort-of coming from there. I was recalling what 3Store did as well. The new Jena DB layer has a schema (actually, a choice of schemas) specifically for SPARQL, so we can store the sort type code.

ARQ currently does it all client-side because the objective is to be correct, corner cases of XSD datatypes and all.

Andy
From: Christian W. <cw...@cw...> - 2007-01-24 20:02:19
Attachments:
signature.asc
Andy,

>> I doubt that I can change the schema since many other parts of rdfapi
>> use it.
> That makes it tricky - isn't there a single abstraction layer that handles all RDF->DB mapping?

There is a working one, but it is really slow when handling thousands or millions of statements, since all the work is done on the client side instead of in the database. My task is to create a SPARQL-to-DB engine that does as much as possible directly on the database server.

It seems that sorting is something that can only be done on the client side.

I also came up with another idea: when a query has an ORDER BY clause, first query the database for a DISTINCT list of the datatypes in the result set. If there is only one datatype, use ORDER BY CAST(...) in the main query. If there are multiple datatypes, or no suitable cast type exists, sorting has to be done on the client side.

I don't know how expensive two queries are compared to one, but on millions of rows this should still be a lot faster than sorting client-side.

--
Regards/Mit freundlichen Grüßen
Christian Weiske
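The two-query idea can be sketched as follows, again with SQLite standing in for MySQL. The function name and the datatype-to-cast table are hypothetical, and filtering on a single predicate is a simplification standing in for the full translated graph pattern. The client-side branch uses a plain string sort only as a placeholder; a real engine would apply the SPARQL ordering rules there.

```python
import sqlite3
from typing import List

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical datatype -> SQL cast type table (SQLite type names).
# Datatypes without an entry cannot be cast, so they fall back to client-side sorting.
SQL_CAST = {XSD + "integer": "INTEGER", XSD + "decimal": "REAL", XSD + "double": "REAL"}


def sorted_objects(conn: sqlite3.Connection, predicate: str) -> List[str]:
    # Query 1: which datatypes occur among the values to be sorted?
    types = [r[0] for r in conn.execute(
        "SELECT DISTINCT l_datatype FROM statements WHERE predicate = ?", (predicate,))]
    cast = SQL_CAST.get(types[0]) if len(types) == 1 else None
    if cast is not None:
        # Query 2: exactly one castable datatype, so the database sorts
        # (and a server-side LIMIT could still be appended).
        # The cast name comes from SQL_CAST, never from user input.
        sql = ("SELECT object FROM statements WHERE predicate = ? "
               f"ORDER BY CAST(object AS {cast}) ASC")
        return [r[0] for r in conn.execute(sql, (predicate,))]
    # Mixed or unknown datatypes: fetch unsorted and sort on the client.
    values = [r[0] for r in conn.execute(
        "SELECT object FROM statements WHERE predicate = ?", (predicate,))]
    return sorted(values)  # placeholder for a proper SPARQL-aware comparison


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (predicate TEXT, object TEXT, l_datatype TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:empId", "9", XSD + "integer"),
    ("ex:empId", "10", XSD + "integer"),
    ("ex:empId", "2", XSD + "integer"),
    ("ex:label", "beta", XSD + "string"),
    ("ex:label", "10", XSD + "integer"),
])
print(sorted_objects(conn, "ex:empId"))  # ['2', '9', '10']
print(sorted_objects(conn, "ex:label"))  # ['10', 'beta']
```

The first query is cheap when l_datatype is indexed; the trade-off is one extra round trip in exchange for keeping ORDER BY and LIMIT on the server in the common single-datatype case.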