Re: [dotNetRDF-bugs] Problems with SPARQL queries

I tried 1.0.4 and 1.0.5-pre2 and both are equally slow.

Tom
On May 22, 2014 4:40 PM, "Rob Vesse" <rv...@do...> wrote:

> Tom
>
> Are you saying that performance is substantially worse with 1.0.4 versus
> 1.0.3 or the performance is just as bad across all recent releases?
>
> Rob
>
> From: Tomasz Pluskiewicz <tom...@gm...>
> Reply-To: dotNetRDF Bug Report tracking and resolution <
> dot...@li...>
> Date: Thursday, 22 May 2014 14:48
> To: dotNetRDF Bug Report tracking and resolution <
> dot...@li...>
> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>
> Rob, thanks for responding.
>
> Always +1 for additional diagnostic tools (I mean the ExplainProcessor
> enhancement).
>
> I've been fiddling with our query, and the ?s ?p ?o pattern seems to have
> a small but noticeable impact on the synthetic dataset. But moving
> the subquery as-is outside the first GRAPH ?var does indeed speed the query
> up by an order of magnitude. I've also tried removing the duplicate triple
> patterns from both GRAPH ?v patterns, but that doesn't help much either.
> Interestingly, a query that combines all three changes (subquery moved,
> ?s ?p ?o extracted and duplicate triple patterns removed) is significantly
> slower than the one with just the subquery moved outside the GRAPH ?var.
>
> I've run all kinds of queries against our real-life data (20k quads in
> over 900 graphs) and the conclusions are the same. Moving the subquery and
> the ?s ?p ?o graph pattern gives the best results.
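>
> A rough sketch of that shape, reconstructed from the original query quoted
> further down (an illustration of the restructuring described above, not
> necessarily the exact query text that was run). The subquery comes first,
> outside any GRAPH ?var, and the ?s ?p ?o pattern gets its own GRAPH clause
> at the end:
>
> ```sparql
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX schema: <http://schema.org/>
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>
> SELECT ?s ?p ?o ?Gp0 ?p0
> WHERE
> {
>   # Subquery moved as-is outside the first GRAPH ?Gp0
>   {
>     SELECT DISTINCT ?p0_sub
>     WHERE
>     {
>       GRAPH ?Gp0_sub
>       {
>         ?p0_sub rdf:type schema:Product .
>         ?p0_sub schema:name ?name0_sub .
>         FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>       }
>       GRAPH <http://foo.com/productList/>
>       {
>         ?Gp0_sub foaf:primaryTopic ?p0_sub .
>       }
>     }
>     #ORDER BY ?p0_sub
>     LIMIT 2
>   }
>
>   # Remaining patterns of the first GRAPH clause, duplicates left in place
>   GRAPH ?Gp0
>   {
>     ?p0_sub schema:name ?name0_sub .
>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>     ?p0 rdf:type schema:Product .
>     FILTER(?p0_sub=?p0)
>   }
>
>   GRAPH <http://foo.com/productList/>
>   {
>     ?Gp0 foaf:primaryTopic ?p0 .
>   }
>
>   # ?s ?p ?o extracted into its own GRAPH clause
>   GRAPH ?Gp0
>   {
>     ?s ?p ?o .
>   }
> }
> ```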
>
> Regarding the ORDER BY, it still seems like a bug. I wanted to blame the
> inconsistent results on the fact that the subquery is nested inside the
> GRAPH ?var, but with the subquery moved I observe the same behaviour.
>
> All of the above is true for 1.0.3. Regarding 1.0.4+, there are additional
> problems, as I wrote yesterday. With the real-life data the original query
> takes over 2.5 minutes to complete, while in the previous version only about
> a quarter of a second is needed! The optimized queries took so long
> that I never saw them finish.
>
> Tom
>
>
> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...> wrote:
>
>> Tom
>>
>> Thanks for the report, I haven't done any debugging yet but I have a few
>> thoughts based on what you've described
>>
>> ORDER BY causing indeterminate results could be a bug but it also could
>> just be an artefact of two things:
>>
>>    1. SPARQL only defines a partial ordering so there are some
>>    combinations of terms for which ordering is left to the implementation
>>    though since we're just talking about dotNetRDF such indeterminate
>>    orderings should be defined consistently
>>    2. That you have multiple terms in the data that compare to be
>>    equivalent, in this case we're at the mercy of .Net's sort implementation
>>    for which items float to the top and so are returned each time
>>
>> GRAPH ?var can be quite expensive because it evaluates the
>> inner operations over each individual named graph in the dataset in turn.
>> Where ?var is already bound this might be a small subset, but given the
>> structure of your query I suspect there are at least some places where it
>> is evaluated over every graph. So with two points in your query where
>> GRAPH ?var is potentially unbound (or bound to a large number of possible
>> values) you would get O(n^2) scaling in the number of graphs, which is
>> quadratic rather than strictly exponential but would match the rapid
>> growth you describe.
>>
>> Also the ?s ?p ?o in the start of your first GRAPH clause may be causing
>> a substantial increase in intermediate results early on in the query.  It
>> might be better to have a separate GRAPH clause after the first GRAPH
>> clause to pull out all the triples once you've determined the graphs you
>> actually care about.
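>>
>> As an untested sketch of that restructuring, using the query from your
>> original mail with everything else left unchanged: the ?s ?p ?o pattern
>> moves out of the first GRAPH ?Gp0 clause into a new GRAPH ?Gp0 clause at
>> the end, so it only runs once ?Gp0 has been narrowed down.
>>
>> ```sparql
>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>> PREFIX schema: <http://schema.org/>
>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>
>> SELECT ?s ?p ?o ?Gp0 ?p0
>> WHERE
>> {
>>   # First work out which graphs and products are of interest
>>   GRAPH ?Gp0
>>   {
>>     ?p0_sub schema:name ?name0_sub .
>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>     ?p0 rdf:type schema:Product .
>>     {
>>       SELECT DISTINCT ?p0_sub
>>       WHERE
>>       {
>>         GRAPH ?Gp0_sub
>>         {
>>           ?p0_sub rdf:type schema:Product .
>>           ?p0_sub schema:name ?name0_sub .
>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>         }
>>         GRAPH <http://foo.com/productList/>
>>         {
>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>         }
>>       }
>>       #ORDER BY ?p0_sub
>>       LIMIT 2
>>     }
>>     FILTER(?p0_sub=?p0)
>>   }
>>
>>   GRAPH <http://foo.com/productList/>
>>   {
>>     ?Gp0 foaf:primaryTopic ?p0 .
>>   }
>>
>>   # Only now pull out all the triples from the selected graphs
>>   GRAPH ?Gp0
>>   {
>>     ?s ?p ?o .
>>   }
>> }
>> ```
>>
>> Because both GRAPH clauses join on the same ?Gp0, this should return the
>> same solutions as the original; whether it is actually faster depends on
>> how the engine orders the joins.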
>>
>> There is of course a possibility that dotNetRDF is optimising the query
>> badly but that will require some debugging to figure out if this is the
>> case.
>>
>> Using the ExplainQueryProcessor (
>> http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor)
>> with the ExplanationLevel turned up to Full as described at
>> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries might
>> be enlightening since it'll include things like intermediate result counts.
>> It doesn't currently report how many graphs a given GRAPH clause
>> has to consider, though, which will make it hard to spot that repeated
>> looping on GRAPH ?var if that is the culprit. That would certainly be
>> interesting information, so I may try to add it in the future.
>>
>> Let me know if you guys figure anything more out, I'll aim to take a
>> proper look and debug this later in the week
>>
>> Cheers,
>>
>> Rob
>>
>> From: Tomek Pluskiewicz <to...@pl...>
>> Reply-To: dotNetRDF Bug Report tracking and resolution <
>> dot...@li...>
>> Date: Wednesday, 21 May 2014 13:46
>> To: dotNetRDF Bug Report tracking and resolution <
>> dot...@li...>
>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>
>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test
>>
>>
>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz <to...@pl...> wrote:
>>
>>> Hi Rob
>>>
>>> We've been developing an ORM solution, complete with LINQ support, for
>>> some time now. It will be open-sourced at some point. Currently we're
>>> experiencing problems with query speed and reliability. Let me acquaint
>>> you with how things work.
>>>
>>> Each resource is contained within its own named graph and additionally
>>> there is a meta-graph, which connects graphs and the described entities
>>> (there could be many graphs for one resource). For example
>>>
>>> # meta graph
>>> <http://foo.com/productList/>
>>> {
>>>   ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
>>> }
>>>
>>> # wrench
>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . }
>>>
>>> The problem is with a query
>>>
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX schema: <http://schema.org/>
>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>
>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>> WHERE
>>> {
>>>   GRAPH ?Gp0
>>>   {
>>>     ?s ?p ?o .
>>>     ?p0_sub schema:name ?name0_sub .
>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>     ?p0 rdf:type schema:Product .
>>>     {
>>>       SELECT DISTINCT ?p0_sub
>>>       WHERE
>>>       {
>>>         GRAPH ?Gp0_sub
>>>         {
>>>           ?p0_sub rdf:type schema:Product .
>>>           ?p0_sub schema:name ?name0_sub .
>>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>         }
>>>         GRAPH <http://foo.com/productList/>
>>>         {
>>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>         }
>>>       }
>>>       #ORDER BY ?p0_sub
>>>       LIMIT 2
>>>     }
>>>     FILTER(?p0_sub=?p0)
>>>   }
>>>
>>>   GRAPH <http://foo.com/productList/>
>>>   {
>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>   }
>>> }
>>>
>>> transformed from the following Linq
>>>
>>> Query<IProduct>().Where(p =>
>>> p.Name.ToUpper().Contains(name.ToUpper())).Take(2)
>>>
>>> There are two problems here. The query returns different results on
>>> subsequent runs against the same dataset, and it runs very slowly.
>>> Uncommenting the ORDER BY helps with the varying result count, though
>>> I'm not exactly sure why it should be necessary. As for performance, I'm
>>> not sure what's wrong. Obviously it has something to do with the subquery,
>>> but I was unable to alter this SELECT so that it executed quickly. Even
>>> a dataset as small as 9 quads (3 resources * (2 triples + 1 meta-triple))
>>> takes 1 second to complete, and the time seems to increase exponentially.
>>> At 90 quads/30 graphs it is already taking close to 3 minutes.
>>>
>>> We first observed the performance problems with version 1.0.4, but
>>> with a synthetic dataset the same issues arise in previous releases and
>>> in 1.0.5+.
>>>
>>> Hope you can help. Would you like any additional info?
>>>
>>> Regards,
>>> Tom
>>>
>>
>> ------------------------------------------------------------------------------
>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>> Instantly run your Selenium tests across 300+ browser/OS combos. Get
>> unparalleled scalability from the best Selenium testing platform available
>> Simple to use. Nothing to install. Get started now for free."
>> http://p.sf.net/sfu/SauceLabs
>> _______________________________________________
>> dotNetRDF-bugs mailing list
>> dot...@li...
>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs