From: Rob V. <rv...@do...> - 2014-05-27 08:44:30
|
No, I didn't look at that at all. The other bugs with GRAPH and sub-query
execution meant the results were incorrect anyway, so I didn't attempt to
look into what effect the ORDER BY has as well, since ORDER BY wasn't
necessary to reproduce the poor performance.

I cut a 1.0.5 release on Friday, so if you still experience issues with
ORDER BY please file a new bug that describes that issue.

Note that 1.0.5 will cause results for some queries to change because of
the fixes to execution of GRAPH clauses and sub-queries.

Rob

On 23/05/2014 13:49, "Tomek Pluskiewicz" <to...@pl...> wrote:

>Thanks. I'm always equally impressed with the speed and efficiency!
>
>Any idea though why the ORDER BY is required for the query to return
>correct results reliably?
>
>We're good with 1.0.3 for now so you need not rush.
>
>Cheers,
>Tom
>
>On May 22, 2014 5:53 PM, "Rob Vesse" <rv...@do...> wrote:
>>
>> Ah, I think I see what the problem is (well, there's two in fact).
>>
>> One is that the sub-query is getting scheduled too early in the query,
>> which I have fixed.
>>
>> The other, which I have just found, was likely introduced by a commit
>> that went into 1.0.4, hence why I was asking if this was a regression
>> from 1.0.3. It relates to algebra generation and means we're potentially
>> executing the GRAPH clause too many times. This is probably gonna be a
>> little trickier to fix, but I will aim to have it fixed for 1.0.5 and try
>> and get you a pre-release build with a fix as soon as I can.
>>
>> Rob
>>
>> From: Tomek Pluskiewicz <to...@pl...>
>> Reply-To: dotNetRDF Bug Report tracking and resolution
>> <dot...@li...>
>> Date: Thursday, 22 May 2014 16:18
>> To: dotNetRDF Bug Report tracking and resolution
>> <dot...@li...>
>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>
>> I tried 1.0.4 and 1.0.5-pre2 and both are equally slow.
>>
>> Tom
>>
>> On May 22, 2014 4:40 PM, "Rob Vesse" <rv...@do...> wrote:
>>>
>>> Tom
>>>
>>> Are you saying that performance is substantially worse with 1.0.4
>>> versus 1.0.3, or that the performance is just as bad across all recent
>>> releases?
>>>
>>> Rob
>>>
>>> From: Tomasz Pluskiewicz <tom...@gm...>
>>> Reply-To: dotNetRDF Bug Report tracking and resolution
>>> <dot...@li...>
>>> Date: Thursday, 22 May 2014 14:48
>>> To: dotNetRDF Bug Report tracking and resolution
>>> <dot...@li...>
>>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>>
>>> Rob, thanks for responding.
>>>
>>> Always +1 for additional diagnostic tools (I mean the
>>> ExplainQueryProcessor enhancement).
>>>
>>> I've been fiddling with our query, and the ?s ?p ?o pattern seems to
>>> have a small but noticeable impact on the synthetic dataset. But indeed
>>> moving the subquery as-is outside the first GRAPH ?var boosts the query
>>> by an order of magnitude. I've also tried to remove the duplicate
>>> triple patterns on both GRAPH ?v patterns, but it doesn't help much
>>> either. Interestingly, a query which combines subquery moved, ?s ?p ?o
>>> extracted and duplicate triple patterns removed is significantly slower
>>> than the one with just the subquery moved outside the GRAPH ?var.
>>>
>>> I've run all kinds of queries against our real-life data (20k quads in
>>> over 900 graphs) and the conclusions are the same. Moving the subquery
>>> and the ?s ?p ?o graph pattern gives the best results.
>>>
>>> Regarding the ORDER BY, it still seems like a bug. I wanted to blame
>>> the inconsistent results on the fact that the subquery is nested inside
>>> the GRAPH ?var, but with the subquery moved I observe the same behaviour.
>>>
>>> All the above is true for 1.0.3. Now regarding 1.0.4+, there are
>>> additional problems, as I wrote yesterday. With the real-life data the
>>> original query takes over 2.5 minutes to complete, while in the previous
>>> version only about a quarter of a second is needed!
>>> The optimized queries actually took so long that I never had them
>>> finished.
>>>
>>> Tom
>>>
>>>
>>> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...> wrote:
>>>>
>>>> Tom
>>>>
>>>> Thanks for the report. I haven't done any debugging yet, but I have a
>>>> few thoughts based on what you've described.
>>>>
>>>> ORDER BY causing indeterminate results could be a bug, but it also
>>>> could just be an artefact of two things:
>>>>
>>>>  - SPARQL only defines a partial ordering, so there are some
>>>>    combinations of terms for which ordering is left to the
>>>>    implementation, though since we're just talking about dotNetRDF
>>>>    such indeterminate orderings should be defined consistently
>>>>  - That you have multiple terms in the data that compare to be
>>>>    equivalent; in this case we're at the mercy of .Net's sort
>>>>    implementation for which items float to the top and so are
>>>>    returned each time
>>>>
>>>> GRAPH ?var can be quite expensive, because what it does is evaluate
>>>> the inner operations over each individual named graph in the dataset
>>>> in turn. Where ?var is already bound this might be a small subset, but
>>>> given the structure of your query I suspect there are at least some
>>>> places where it is not. So with two points in your query where you
>>>> have GRAPH ?var being potentially unbound (or bound to a large number
>>>> of possible values) you would get the O(n^2) scaling behaviour you
>>>> describe.
>>>>
>>>> Also, the ?s ?p ?o at the start of your first GRAPH clause may be
>>>> causing a substantial increase in intermediate results early on in the
>>>> query. It might be better to have a separate GRAPH clause after the
>>>> first GRAPH clause to pull out all the triples once you've determined
>>>> the graphs you actually care about.
>>>>
>>>> There is of course a possibility that dotNetRDF is optimising the
>>>> query badly, but that will require some debugging to figure out if
>>>> this is the case.
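The cost of two unbound GRAPH ?var clauses described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is plain Python, not dotNetRDF's engine, and `make_dataset`, `count_graph_scans` and the `ex:g…`/`ex:p…` names are hypothetical. It counts how often a named graph gets scanned when an inner GRAPH clause is re-evaluated for every graph visited by an outer one, versus when the inner clause is evaluated once and reused.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, List[Triple]]  # graph name -> triples in that graph


def make_dataset(n_graphs: int) -> Dataset:
    """Hypothetical data: one product resource per named graph."""
    return {
        f"ex:g{i}": [
            (f"ex:p{i}", "rdf:type", "schema:Product"),
            (f"ex:p{i}", "schema:name", f"Wrench {i}"),
        ]
        for i in range(n_graphs)
    }


def count_graph_scans(dataset: Dataset, hoist_inner: bool) -> int:
    """Count named-graph scans for an outer GRAPH ?a containing an inner GRAPH ?b."""
    scans = 0

    def products_per_graph() -> List[str]:
        # Inner GRAPH ?b with ?b unbound: every named graph is scanned.
        nonlocal scans
        found = []
        for triples in dataset.values():
            scans += 1
            found.extend(s for s, p, o in triples
                         if p == "rdf:type" and o == "schema:Product")
        return found

    # hoist_inner=True models evaluating the inner clause once, up front.
    hoisted = products_per_graph() if hoist_inner else None
    for _triples in dataset.values():  # outer GRAPH ?a, also unbound
        scans += 1
        inner = hoisted if hoisted is not None else products_per_graph()
        # Joining `inner` with the outer graph's triples is omitted here.
        assert len(inner) == len(dataset)
    return scans


if __name__ == "__main__":
    for n in (3, 30):
        data = make_dataset(n)
        print(n, count_graph_scans(data, False), count_graph_scans(data, True))
```

In this model the nested form costs n + n^2 scans (930 for 30 graphs) while the hoisted form costs 2n (60 for 30 graphs). That is the shape of the difference one would expect from moving a sub-query out of a GRAPH ?var clause, though the real timings depend on how the engine actually schedules the query.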
>>>>
>>>> Using the ExplainQueryProcessor
>>>> (http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor)
>>>> with the ExplanationLevel turned up to Full, as described at
>>>> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries
>>>> might be enlightening, since it'll include things like intermediate
>>>> result count. Though it doesn't currently analyse how many graphs a
>>>> given GRAPH clause has to consider, which will make it hard to spot
>>>> that nested looping on GRAPH ?var if that is the culprit. That would
>>>> certainly be interesting information, so I may try and add that in the
>>>> future.
>>>>
>>>> Let me know if you guys figure anything more out. I'll aim to take a
>>>> proper look and debug this later in the week.
>>>>
>>>> Cheers,
>>>>
>>>> Rob
>>>>
>>>> From: Tomek Pluskiewicz <to...@pl...>
>>>> Reply-To: dotNetRDF Bug Report tracking and resolution
>>>> <dot...@li...>
>>>> Date: Wednesday, 21 May 2014 13:46
>>>> To: dotNetRDF Bug Report tracking and resolution
>>>> <dot...@li...>
>>>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>>>
>>>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test
>>>>
>>>>
>>>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz
>>>> <to...@pl...> wrote:
>>>>>
>>>>> Hi Rob
>>>>>
>>>>> We've been developing an ORM solution, complete with Linq, for some
>>>>> time now. It will be open-sourced at some point. Currently we've been
>>>>> experiencing problems with query speed and reliability. Let me
>>>>> acquaint you with how things work.
>>>>>
>>>>> Each resource is contained within its own named graph, and
>>>>> additionally there is a meta-graph, which connects graphs and the
>>>>> described entities (there could be many graphs for one resource). For
>>>>> example
>>>>>
>>>>> # meta graph
>>>>> <http://foo.com/productList/>
>>>>> {
>>>>>   ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
>>>>> }
>>>>>
>>>>> # wrench
>>>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . }
>>>>>
>>>>> The problem is with a query
>>>>>
>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>> PREFIX schema: <http://schema.org/>
>>>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>>>
>>>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>>>> WHERE
>>>>> {
>>>>>   GRAPH ?Gp0
>>>>>   {
>>>>>     ?s ?p ?o .
>>>>>     ?p0_sub schema:name ?name0_sub .
>>>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>>     ?p0 rdf:type schema:Product .
>>>>>     {
>>>>>       SELECT DISTINCT ?p0_sub
>>>>>       WHERE
>>>>>       {
>>>>>         GRAPH ?Gp0_sub
>>>>>         {
>>>>>           ?p0_sub rdf:type schema:Product .
>>>>>           ?p0_sub schema:name ?name0_sub .
>>>>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>>         }
>>>>>         GRAPH <http://foo.com/productList/>
>>>>>         {
>>>>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>>>         }
>>>>>       }
>>>>>       #ORDER BY ?p0_sub
>>>>>       LIMIT 2
>>>>>     }
>>>>>     FILTER(?p0_sub=?p0)
>>>>>   }
>>>>>
>>>>>   GRAPH <http://foo.com/productList/>
>>>>>   {
>>>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>>>   }
>>>>> }
>>>>>
>>>>> transformed from the following Linq
>>>>>
>>>>> Query<IProduct>().Where(p =>
>>>>>     p.Name.ToUpper().Contains(name.ToUpper())).Take(2)
>>>>>
>>>>> There are two problems here. The query returns different results on
>>>>> subsequent runs against the same dataset, and it runs very slowly.
>>>>> Uncommenting the ORDER BY helps with the varying result count, though
>>>>> I'm not exactly sure why it should be necessary. However, I'm not sure
>>>>> what's going on with performance. Obviously it has something to do
>>>>> with the subquery, but I was unable to alter this SELECT so that it
>>>>> executed quickly. Even as small a dataset as 9 quads (3 resources *
>>>>> (2 triples + 1 meta-triple)) takes 1 second to complete, and the time
>>>>> seems to increase exponentially. At 90 quads/30 graphs it is already
>>>>> taking close to 3 minutes.
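As a general SPARQL point, a sub-query that applies LIMIT with no ORDER BY may legitimately return any of the matching rows, because the order of an unsorted solution sequence is left to the implementation. The sketch below models that in plain Python. It is an assumption-laden illustration, not a reproduction of what dotNetRDF did in this case: `possible_results` and the `ex:Wrench…` values are hypothetical, and every permutation of the input stands in for one order the engine might happen to produce.

```python
from itertools import permutations
from typing import Iterable, Set, Tuple


def possible_results(solutions: Iterable[str], limit: int,
                     order_by: bool) -> Set[Tuple[str, ...]]:
    """Collect every result LIMIT could return over all arrival orders."""
    results = set()
    for arrival in permutations(solutions):
        # ORDER BY is modelled as a sort applied before LIMIT slices the rows.
        rows = sorted(arrival) if order_by else list(arrival)
        results.add(tuple(rows[:limit]))
    return results


if __name__ == "__main__":
    products = ["ex:Wrench1", "ex:Wrench2", "ex:Wrench3"]
    print(len(possible_results(products, 2, order_by=False)))
    print(possible_results(products, 2, order_by=True))
```

With three matching products there are six different two-row answers without the sort and exactly one with it. Since the outer query joins on whichever products the sub-query picked, a different pick can also change the number of rows returned overall.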
>>>>>
>>>>> We first observed the performance problems with version 1.0.4, but
>>>>> with a synthetic dataset the same issues arise in previous releases
>>>>> and 1.0.5+.
>>>>>
>>>>> Hope you can help. Would you like any additional info?
>>>>>
>>>>> Regards,
>>>>> Tom
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>>> Instantly run your Selenium tests across 300+ browser/OS combos.
>>>> Get unparalleled scalability from the best Selenium testing platform
>>>> available
>>>> Simple to use. Nothing to install. Get started now for free."
>>>> http://p.sf.net/sfu/SauceLabs
>>>> _______________________________________________
>>>> dotNetRDF-bugs mailing list
>>>> dot...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs
|