From: Tomek P. <to...@pl...> - 2014-05-22 15:18:54
I tried 1.0.4 and 1.0.5-pre2 and both are equally slow.

Tom

On May 22, 2014 4:40 PM, "Rob Vesse" <rv...@do...> wrote:

> Tom
>
> Are you saying that performance is substantially worse with 1.0.4 versus
> 1.0.3 or the performance is just as bad across all recent releases?
>
> Rob
>
> From: Tomasz Pluskiewicz <tom...@gm...>
> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
> Date: Thursday, 22 May 2014 14:48
> To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>
> Rob, thanks for responding.
>
> Always +1 for additional diagnostic tools (I mean the
> ExplainQueryProcessor enhancement).
>
> I've been fiddling with our query and the ?s ?p ?o pattern seems to have
> a small but noticeable impact on the synthetic dataset. But indeed moving
> the subquery as-is outside the first GRAPH ?var speeds the query up by an
> order of magnitude. I've also tried to remove the duplicate triple patterns
> on both GRAPH ?var patterns but it doesn't help much either. Interestingly,
> a query which combines all three changes (subquery moved, ?s ?p ?o
> extracted and duplicate triple patterns removed) is significantly slower
> than the one with just the subquery moved outside the GRAPH ?var.
>
> I've run all kinds of queries against our real-life data (20k quads in
> over 900 graphs) and the conclusions are the same. Moving the subquery and
> the ?s ?p ?o graph pattern gives the best results.
>
> Regarding the ORDER BY, it still seems like a bug. I wanted to blame the
> inconsistent results on the fact that the subquery is nested inside the
> GRAPH ?var, but with the subquery moved I observe the same behaviour.
>
> All the above is true for 1.0.3. Now regarding 1.0.4+, there are additional
> problems, as I wrote yesterday. With the real-life data the original query
> takes over 2.5 minutes to complete, while in the previous version only
> about a quarter of a second is needed!
> The optimized queries actually took so long that I never had them
> finished.
>
> Tom
>
>
> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...> wrote:
>
>> Tom
>>
>> Thanks for the report. I haven't done any debugging yet but I have a few
>> thoughts based on what you've described.
>>
>> ORDER BY causing indeterminate results could be a bug but it also could
>> just be an artefact of two things:
>>
>> 1. SPARQL only defines a partial ordering, so there are some
>>    combinations of terms for which ordering is left to the
>>    implementation, though since we're just talking about dotNetRDF such
>>    indeterminate orderings should be defined consistently
>> 2. You have multiple terms in the data that compare as equivalent, in
>>    which case we're at the mercy of .Net's sort implementation for which
>>    items float to the top and so are returned each time
>>
>> GRAPH ?var can be quite expensive because what it does is evaluate the
>> inner operations over each individual named graph in the dataset in turn.
>> Where ?var is already bound this might be a small subset but given the
>> structure of your query I suspect there are at least some places where this
>> is happening. So with two points in your query where you have GRAPH ?var
>> being potentially unbound (or bound to a large number of possible values)
>> you would get O(n^2) scaling, which would fit the steep growth you
>> describe.
>>
>> Also the ?s ?p ?o at the start of your first GRAPH clause may be causing
>> a substantial increase in intermediate results early on in the query. It
>> might be better to have a separate GRAPH clause after the first GRAPH
>> clause to pull out all the triples once you've determined the graphs you
>> actually care about.
>>
>> There is of course a possibility that dotNetRDF is optimising the query
>> badly but that will require some debugging to figure out if this is the
>> case.
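As a rough illustration of the rearrangement discussed above (the subquery evaluated outside any GRAPH ?var, and ?s ?p ?o pulled into its own GRAPH clause once the graphs of interest are known), the reported query might be restructured along these lines. This is only a sketch: it reuses the prefixes, variable names and the http://foo.com/productList/ meta-graph from the original report, it has not been run against any dotNetRDF release, and whether it is actually faster will depend on the version and the data.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <http://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?s ?p ?o ?Gp0 ?p0
WHERE
{
  # 1. Pick the (at most two) matching products first, outside any GRAPH ?var.
  {
    SELECT DISTINCT ?p0_sub
    WHERE
    {
      GRAPH ?Gp0_sub
      {
        ?p0_sub rdf:type schema:Product .
        ?p0_sub schema:name ?name0_sub .
        FILTER (CONTAINS(UCASE(?name0_sub), "W"^^xsd:string))
      }
      GRAPH <http://foo.com/productList/>
      {
        ?Gp0_sub foaf:primaryTopic ?p0_sub .
      }
    }
    # Without ORDER BY, SPARQL does not define which rows LIMIT keeps.
    ORDER BY ?p0_sub
    LIMIT 2
  }

  # 2. Resolve the named graphs of those products via the meta-graph.
  GRAPH <http://foo.com/productList/>
  {
    ?Gp0 foaf:primaryTopic ?p0 .
  }

  # 3. Same checks as the original outer GRAPH clause, minus ?s ?p ?o.
  GRAPH ?Gp0
  {
    ?p0 rdf:type schema:Product .
    ?p0_sub schema:name ?name0_sub .
    FILTER (CONTAINS(UCASE(?name0_sub), "W"^^xsd:string))
  }

  # 4. Only now pull every triple out of the graphs that survived.
  GRAPH ?Gp0
  {
    ?s ?p ?o .
  }

  FILTER (?p0_sub = ?p0)
}
```

The duplicate type and name patterns are kept on purpose, since the thread reports that removing them as well made things slower rather than faster on 1.0.3.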
>>
>> Using the ExplainQueryProcessor (
>> http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor)
>> with the ExplanationLevel turned up to Full as described at
>> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries
>> might be enlightening since it'll include things like intermediate result
>> counts. It doesn't currently analyse how many graphs a given GRAPH clause
>> has to consider, though, which will make it hard to spot that repeated
>> looping on GRAPH ?var if that is the culprit. That would certainly be
>> interesting information, so I may try and add that in the future.
>>
>> Let me know if you guys figure anything more out. I'll aim to take a
>> proper look and debug this later in the week.
>>
>> Cheers,
>>
>> Rob
>>
>> From: Tomek Pluskiewicz <to...@pl...>
>> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
>> Date: Wednesday, 21 May 2014 13:46
>> To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>
>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test
>>
>>
>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz <to...@pl...> wrote:
>>
>>> Hi Rob
>>>
>>> We've been developing an ORM solution complete with Linq for some time
>>> now. It will be open-sourced at some point. Currently we've been
>>> experiencing problems with query speed and reliability. Let me acquaint
>>> you with how things work.
>>>
>>> Each resource is contained within its own named graph and additionally
>>> there is a meta-graph, which connects graphs and the described entities
>>> (there could be many graphs for one resource). For example
>>>
>>> # meta graph
>>> <http://foo.com/productList/>
>>> {
>>> ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
>>> }
>>>
>>> # wrench
>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" .
>>> }
>>>
>>> The problem is with a query
>>>
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX schema: <http://schema.org/>
>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>
>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>> WHERE
>>> {
>>>   GRAPH ?Gp0
>>>   {
>>>     ?s ?p ?o .
>>>     ?p0_sub schema:name ?name0_sub .
>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>     ?p0 rdf:type schema:Product .
>>>     {
>>>       SELECT DISTINCT ?p0_sub
>>>       WHERE
>>>       {
>>>         GRAPH ?Gp0_sub
>>>         {
>>>           ?p0_sub rdf:type schema:Product .
>>>           ?p0_sub schema:name ?name0_sub .
>>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>         }
>>>         GRAPH <http://foo.com/productList/>
>>>         {
>>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>         }
>>>       }
>>>       #ORDER BY ?p0_sub
>>>       LIMIT 2
>>>     }
>>>     FILTER(?p0_sub=?p0)
>>>   }
>>>
>>>   GRAPH <http://foo.com/productList/>
>>>   {
>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>   }
>>> }
>>>
>>> transformed from the following Linq
>>>
>>> Query<IProduct>().Where(p =>
>>>     p.Name.ToUpper().Contains(name.ToUpper())).Take(2)
>>>
>>> There are two problems here. The query returns different results on
>>> subsequent runs against the same dataset and it runs very slowly.
>>> Uncommenting the ORDER BY helps with the varying result count, though
>>> I'm not exactly sure why it should be necessary. However, I'm not sure
>>> what's going on with performance. Obviously it has something to do with
>>> the subquery but I was unable to alter this SELECT so that it executed
>>> quickly. Even as small a dataset as 9 quads (3 resources * (2 triples +
>>> 1 meta-triple)) takes 1 second to complete and the time seems to increase
>>> exponentially. At 90 quads/30 graphs it is already taking close to 3
>>> minutes.
>>>
>>> We first observed the performance problems with version 1.0.4 but
>>> with a synthetic dataset the same issues arise in previous releases and
>>> 1.0.5+.
>>>
>>> Hope you can help. Would you like any additional info?
>>>
>>> Regards,
>>> Tom
>>>
>>
>> ------------------------------------------------------------------------------
>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>> Instantly run your Selenium tests across 300+ browser/OS combos. Get
>> unparalleled scalability from the best Selenium testing platform available
>> Simple to use. Nothing to install. Get started now for free."
>> http://p.sf.net/sfu/SauceLabs
>> _______________________________________________
>> dotNetRDF-bugs mailing list
>> dot...@li...
>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs
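
The thread notes that the explain output does not report how many named graphs a GRAPH ?var clause has to consider. One rough way to get a feel for that number, independent of any dotNetRDF tooling, is to ask the dataset directly. The query below is a sketch in standard SPARQL 1.1; it is not part of the original report and its variable names are purely illustrative.

```sparql
# Count the triples in every named graph. The number of result rows is the
# number of non-empty named graphs an unbound GRAPH ?var has to visit.
SELECT ?g (COUNT(*) AS ?triples)
WHERE
{
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY DESC(?triples)
```

If the O(n^2) reading suggested in the thread is right, then with the "over 900 graphs" quoted for the real-life data, two nested unbound GRAPH clauses could mean on the order of 900 x 900 = 810,000 inner evaluations.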