Re: [dotNetRDF-bugs] Problems with SPARQL queries

No, I didn't look at that at all. The other bugs with GRAPH and sub-query
execution meant the results were incorrect anyway, so I didn't attempt to
look into what effect the ORDER BY has as well, since ORDER BY wasn't
necessary to reproduce the poor performance.

I cut a 1.0.5 release on Friday, so if you still experience issues with
ORDER BY please file a new bug that describes that issue.

Note that 1.0.5 will cause results for some queries to change because of
the fixes to execution of GRAPH clauses and sub-queries.

Rob

On 23/05/2014 13:49, "Tomek Pluskiewicz" <to...@pl...> wrote:

>Thanks. I'm always equally impressed with the speed and efficiency!
>
>Any idea though why the ORDER BY is required for the query to return
>correct results reliably?
>
>We're good with 1.0.3 for now so you need not rush.
>
>Cheers,
>Tom
>
>On May 22, 2014 5:53 PM, "Rob Vesse" <rv...@do...> wrote:
>>
>> Ah, I think I see what the problem is (well, there are two in fact).
>>
>> One is that the sub-query is getting scheduled too early in the query,
>>which I have fixed.
>>
>> The other, which I have just found, was likely introduced by a commit that
>>went into 1.0.4, hence why I was asking if this was a regression from
>>1.0.3. It relates to algebra generation and means we're potentially
>>executing the graph clause too many times. This is probably gonna be a
>>little trickier to fix, but I will aim to have it fixed for 1.0.5 and try
>>to get you a pre-release build with a fix as soon as I can.
>>
>> Rob
>>
>> From: Tomek Pluskiewicz <to...@pl...>
>> Reply-To: dotNetRDF Bug Report tracking and resolution
>><dot...@li...>
>> Date: Thursday, 22 May 2014 16:18
>> To: dotNetRDF Bug Report tracking and resolution
>><dot...@li...>
>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>
>> I tried 1.0.4 and 1.0.5-pre2 and both are equally slow.
>>
>> Tom
>>
>> On May 22, 2014 4:40 PM, "Rob Vesse" <rv...@do...> wrote:
>>>
>>> Tom
>>>
>>> Are you saying that performance is substantially worse with 1.0.4
>>>versus 1.0.3 or the performance is just as bad across all recent
>>>releases?
>>>
>>> Rob
>>>
>>> From: Tomasz Pluskiewicz <tom...@gm...>
>>> Reply-To: dotNetRDF Bug Report tracking and resolution
>>><dot...@li...>
>>> Date: Thursday, 22 May 2014 14:48
>>> To: dotNetRDF Bug Report tracking and resolution
>>><dot...@li...>
>>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>>
>>> Rob, thanks for responding.
>>>
>>> Always +1 for additional diagnostic tools (I mean the ExplainProcessor
>>>enhancement).
>>>
>>> I've been fiddling with our query and the ?s ?p ?o pattern seems to
>>>have a small but noticeable impact on the synthetic dataset. But indeed
>>>moving the subquery as-is outside the first GRAPH ?var speeds the query
>>>up by an order of magnitude. I've also tried removing the duplicate
>>>triple patterns from both GRAPH ?var patterns, but that doesn't help
>>>much either. Interestingly, a query which combines all three changes
>>>(subquery moved, ?s ?p ?o extracted and duplicate triple patterns
>>>removed) is significantly slower than the one with just the subquery
>>>moved outside the GRAPH ?var.
>>>
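>>> For illustration, the variant with the subquery moved outside the first
>>> GRAPH ?var might look roughly like the query below. It is a sketch
>>> derived from the original query quoted further down, with the ORDER BY
>>> enabled, and not necessarily the exact text that was timed.
>>>
>>> ```sparql
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX schema: <http://schema.org/>
>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>
>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>> WHERE
>>> {
>>>   # Subquery now sits at the top level, outside any GRAPH ?var clause
>>>   {
>>>     SELECT DISTINCT ?p0_sub
>>>     WHERE
>>>     {
>>>       GRAPH ?Gp0_sub
>>>       {
>>>         ?p0_sub rdf:type schema:Product .
>>>         ?p0_sub schema:name ?name0_sub .
>>>         FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>       }
>>>       GRAPH <http://foo.com/productList/>
>>>       {
>>>         ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>       }
>>>     }
>>>     ORDER BY ?p0_sub
>>>     LIMIT 2
>>>   }
>>>
>>>   # Remaining patterns of the first GRAPH clause, left unchanged
>>>   GRAPH ?Gp0
>>>   {
>>>     ?s ?p ?o .
>>>     ?p0_sub schema:name ?name0_sub .
>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>     ?p0 rdf:type schema:Product .
>>>     FILTER(?p0_sub=?p0)
>>>   }
>>>
>>>   GRAPH <http://foo.com/productList/>
>>>   {
>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>   }
>>> }
>>> ```
>>>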
>>> I've run all kinds of queries against our real-life data (20k quads in
>>>over 900 graphs) and the conclusions are the same. Moving the subquery
>>>and the ?s ?p ?o graph pattern gives the best results.
>>>
>>> Regarding the ORDER BY, it still seems like a bug. I wanted to blame the
>>>inconsistent results on the fact that the subquery is nested inside the
>>>GRAPH ?var, but with the subquery moved I observe the same behaviour.
>>>
>>> All the above is true for 1.0.3. Now regarding 1.0.4+, there are
>>>additional problems, as I wrote yesterday. With the real-life data the
>>>original query takes over 2.5 minutes to complete, while in the previous
>>>version only about a quarter of a second is needed! The optimized
>>>queries actually took so long that I never let them finish.
>>>
>>> Tom
>>>
>>>
>>> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...>
>>>wrote:
>>>>
>>>> Tom
>>>>
>>>> Thanks for the report, I haven't done any debugging yet but I have a
>>>>few thoughts based on what you've described
>>>>
>>>> ORDER BY causing indeterminate results could be a bug but it also
>>>>could just be an artefact of two things:
>>>>
>>>> 1. SPARQL only defines a partial ordering, so there are some
>>>>combinations of terms for which ordering is left to the implementation,
>>>>though since we're just talking about dotNetRDF such indeterminate
>>>>orderings should be defined consistently.
>>>> 2. You have multiple terms in the data that compare as equivalent. In
>>>>this case we're at the mercy of .Net's sort implementation for which
>>>>items float to the top and so are returned each time.
>>>>
>>>> GRAPH ?var can be quite expensive because what it does is evaluate
>>>>the inner operations over each individual named graph in the dataset
>>>>in turn.  Where ?var is already bound this might be a small subset, but
>>>>given the structure of your query I suspect there are at least some
>>>>places where ?var is unbound and every graph has to be considered.  So
>>>>with two points in your query where you have GRAPH ?var being
>>>>potentially unbound (or bound to a large number of possible values) you
>>>>would get O(n^2) scaling, i.e. quadratic rather than strictly
>>>>exponential, which fits the rapid slowdown you describe.
>>>>
>>>> Also the ?s ?p ?o at the start of your first GRAPH clause may be
>>>>causing a substantial increase in intermediate results early on in the
>>>>query.  It might be better to have a separate GRAPH clause after the
>>>>first GRAPH clause to pull out all the triples once you've determined
>>>>the graphs you actually care about.
>>>>
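>>>> A rough sketch of that idea, simplified to the graph-selection part of
>>>> the query (the name filter and sub-query are left out, and whether it
>>>> is actually faster depends on how the engine orders and evaluates the
>>>> clauses):
>>>>
>>>> ```sparql
>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>> PREFIX schema: <http://schema.org/>
>>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>>
>>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>>> WHERE
>>>> {
>>>>   # First GRAPH clause: only the selective pattern
>>>>   GRAPH ?Gp0
>>>>   {
>>>>     ?p0 rdf:type schema:Product .
>>>>   }
>>>>   # Meta-graph lookup tying each graph to the resource it describes
>>>>   GRAPH <http://foo.com/productList/>
>>>>   {
>>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>>   }
>>>>   # Separate GRAPH clause afterwards: pull out all triples, but only
>>>>   # for the graphs that matched the patterns above
>>>>   GRAPH ?Gp0
>>>>   {
>>>>     ?s ?p ?o .
>>>>   }
>>>> }
>>>> ```
>>>>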
>>>> There is of course a possibility that dotNetRDF is optimising the
>>>>query badly but that will require some debugging to figure out if this
>>>>is the case.
>>>>
>>>> Using the ExplainQueryProcessor
>>>>(http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor)
>>>>with the ExplanationLevel turned up to Full, as described at
>>>>https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries
>>>>might be enlightening, since it'll include things like intermediate
>>>>result counts.  It doesn't currently analyse how many graphs a given
>>>>GRAPH clause has to consider, though, which will make it hard to spot
>>>>that repeated looping on GRAPH ?var if that is the culprit. That would
>>>>certainly be interesting information, so I may try to add that in the
>>>>future.
>>>>
>>>> Let me know if you guys figure anything more out, I'll aim to take a
>>>>proper look and debug this later in the week
>>>>
>>>> Cheers,
>>>>
>>>> Rob
>>>>
>>>> From: Tomek Pluskiewicz <to...@pl...>
>>>> Reply-To: dotNetRDF Bug Report tracking and resolution
>>>><dot...@li...>
>>>> Date: Wednesday, 21 May 2014 13:46
>>>> To: dotNetRDF Bug Report tracking and resolution
>>>><dot...@li...>
>>>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>>>
>>>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test
>>>>
>>>>
>>>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz
>>>><to...@pl...> wrote:
>>>>>
>>>>> Hi Rob
>>>>>
>>>>> We've been developing an ORM solution complete with Linq for some
>>>>>time now. It will be open-sourced at some point. Currently we're
>>>>>experiencing problems with query speed and reliability. Let me
>>>>>acquaint you with how things work.
>>>>>
>>>>> Each resource is contained within its own named graph and
>>>>>additionally there is a meta-graph, which connects graphs and the
>>>>>described entities (there could be many graphs for one resource). For
>>>>>example
>>>>>
>>>>> # meta graph
>>>>> <http://foo.com/productList/>
>>>>> {
>>>>>   ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
>>>>> }
>>>>>
>>>>> # wrench
>>>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . }
>>>>>
>>>>> The problem is with a query
>>>>>
>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>> PREFIX schema: <http://schema.org/>
>>>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>>>
>>>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>>>> WHERE
>>>>> {
>>>>>   GRAPH ?Gp0
>>>>>   {
>>>>>     ?s ?p ?o .
>>>>>     ?p0_sub schema:name ?name0_sub .
>>>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>>     ?p0 rdf:type schema:Product .
>>>>>     {
>>>>>       SELECT DISTINCT ?p0_sub
>>>>>       WHERE
>>>>>       {
>>>>>         GRAPH ?Gp0_sub
>>>>>         {
>>>>>           ?p0_sub rdf:type schema:Product .
>>>>>           ?p0_sub schema:name ?name0_sub .
>>>>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>>         }
>>>>>         GRAPH <http://foo.com/productList/>
>>>>>         {
>>>>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>>>         }
>>>>>       }
>>>>>       #ORDER BY ?p0_sub
>>>>>       LIMIT 2
>>>>>     }
>>>>>     FILTER(?p0_sub=?p0)
>>>>>   }
>>>>>
>>>>>   GRAPH <http://foo.com/productList/>
>>>>>   {
>>>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>>>   }
>>>>> }
>>>>>
>>>>> transformed from the following Linq
>>>>>
>>>>> Query<IProduct>().Where(p => p.Name.ToUpper().Contains(name.ToUpper())).Take(2)
>>>>>
>>>>> There are two problems here. The query returns different results on
>>>>>subsequent runs against the same dataset and it runs very slowly.
>>>>>Uncommenting the ORDER BY helps with the varying result count, though
>>>>>I'm not exactly sure why it should be necessary. However, I'm not sure
>>>>>what's wrong with performance. Obviously it has something to do with
>>>>>the subquery, but I was unable to alter this SELECT so that it executed
>>>>>quickly. Even a dataset as small as 9 quads (3 resources * (2 triples
>>>>>+ 1 meta-triple)) takes 1 second to complete and the time seems to
>>>>>increase exponentially. At 90 quads/30 graphs it is already taking
>>>>>close to 3 minutes.
>>>>>
>>>>> We first observed the performance problems with version 1.0.4, but
>>>>>with a synthetic dataset the same issues arise in previous releases
>>>>>and in 1.0.5+.
>>>>>
>>>>> Hope you can help. Would you like any additional info?
>>>>>
>>>>> Regards,
>>>>> Tom
>>>>
>>>>
>>>> _______________________________________________
>>>> dotNetRDF-bugs mailing list
>>>> dot...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs