[dotNetRDF-bugs] Problems with SPARQL queries

Hi Rob

We've been developing an ORM solution, complete with LINQ support, for some
time now. It will be open-sourced at some point. Currently we are experiencing
problems with query speed and reliability. Let me explain how things work.

Each resource is contained within its own named graph, and additionally
there is a meta-graph which connects the graphs to the entities they describe
(there could be many graphs for one resource). For example:

# meta graph
<http://foo.com/productList/>
{
  ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
}

# wrench
ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . }

The problem is with the following query:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s ?p ?o ?Gp0 ?p0
WHERE
{
  GRAPH ?Gp0
  {
    ?s ?p ?o .
    ?p0_sub schema:name ?name0_sub .
    FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
    ?p0 rdf:type schema:Product .
    {
      SELECT DISTINCT ?p0_sub
      WHERE
      {
        GRAPH ?Gp0_sub
        {
          ?p0_sub rdf:type schema:Product .
          ?p0_sub schema:name ?name0_sub .
          FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
        }
        GRAPH <http://foo.com/productList/>
        {
          ?Gp0_sub foaf:primaryTopic ?p0_sub .
        }
      }
      #ORDER BY ?p0_sub
      LIMIT 2
    }
    FILTER(?p0_sub=?p0)
  }

  GRAPH <http://foo.com/productList/>
  {
    ?Gp0 foaf:primaryTopic ?p0 .
  }
}

which is generated from the following LINQ expression:

Query<IProduct>()
    .Where(p => p.Name.ToUpper().Contains(name.ToUpper()))
    .Take(2)

There are two problems here: the query returns different results on
subsequent runs against the same dataset, and it runs very slowly.
Uncommenting the ORDER BY helps with the varying result count, though I'm
not exactly sure why it should be necessary. The performance, however, I
can't explain. Obviously it has something to do with the subquery, but I
was unable to alter this SELECT so that it executes quickly. Even a dataset
as small as 9 quads (3 resources * (2 triples + 1 meta-triple)) takes 1
second to complete, and the time seems to increase exponentially: at 90
quads/30 graphs it already takes close to 3 minutes.
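
To make the dataset sizes above concrete, here is a minimal Python sketch of how a dataset with this layout could be generated as N-Quads: two triples in each resource's own graph plus one triple in the meta-graph. It is only an illustration, not our actual test code. The expansion of the ex: prefix, the WrenchN naming scheme and the make_quads function are assumptions made for the example.

```python
# Namespaces taken from the query above.
SCHEMA = "http://schema.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
META_GRAPH = "http://foo.com/productList/"
# Assumption: the ex: prefix is not expanded in the example, so a placeholder is used.
EX = "http://example.com/"


def make_quads(resource_count):
    """Return N-Quads lines: 2 triples per resource graph + 1 meta-graph triple."""
    quads = []
    for i in range(1, resource_count + 1):
        resource = f"<{EX}Wrench{i}>"
        graph = resource  # as in the example, the graph is named after the resource
        quads.append(f"{resource} <{RDF_TYPE}> <{SCHEMA}Product> {graph} .")
        quads.append(f'{resource} <{SCHEMA}name> "Wrench {i}" {graph} .')
        # The meta-graph triple links the named graph to the entity it describes.
        quads.append(f"{graph} <{FOAF}primaryTopic> {resource} <{META_GRAPH}> .")
    return quads


if __name__ == "__main__":
    for n in (3, 30):
        print(n, "resources ->", len(make_quads(n)), "quads")
```

With 3 resources this yields 9 quads and with 30 resources 90 quads, matching the two sizes quoted above.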

We first observed the performance problems with version 1.0.4, but with a
synthetic dataset the same issues arise in previous releases and in 1.0.5+.

Hope you can help. Would you like any additional info?

Regards,
Tom