From: <tr...@do...> - 2013-07-10 19:50:33
|
<p>The following issue has been added to a project that you are monitoring.</p> <table border="0"> <tr> <td width="90px" valign="top"><b>Title:</b></td> <td>Add some "lazyness" to algebra evaluation</td> </tr> <tr> <td><b>Project:</b></td> <td>Core Library (dotNetRDF.dll)</td> </tr> <tr> <td><b>Created By:</b></td> <td>Max</td> </tr> <tr> <td><b>Milestone:</b></td> <td>Unassigned</td> </tr> <tr> <td><b>Category:</b></td> <td>Query</td> </tr> <tr> <td><b>Priority:</b></td> <td>Low</td> </tr> <tr> <td><b>Type:</b></td> <td>Improvement</td> </tr> <tr> <td><b>Description:</b></td> </tr> <tr> <td colspan="2"><p> Hi Rob,<br /> <br /> I just encountered this use-case on a store I'm testing.</p> <p> The store currently amounts to ~140K triples and comprise 3000 organisations<br /> I made this query to get pages of 10 organisations, and added bindings to check when to activate tabs depending on whether or not the organisation authored a specific item type:<br /> <br /> SELECT DISTINCT ?org ?hasBlogPost ?hasEvent ?hasJobOffer ?hasProduct ?hasGoodDeal ?isWebSite {<br /> FILTER (NOT EXISTS{ ?org a my:PortalSite }) .<br /> BIND ( EXISTS { ?item a my:BlogPost . ?item my:authoredBy ?org} as ?hasBlogPost) .<br /> BIND ( EXISTS { ?item a my:Event. ?item my:authoredBy ?org} as ?hasEvent) .<br /> BIND ( EXISTS { ?item a my:JobOffer. ?item my:authoredBy ?org} as ?hasJobOffer) .<br /> BIND ( EXISTS { ?item a my:ProductOrService . ?item my:authoredBy ?org} as ?hasProduct) .<br /> BIND ( EXISTS { ?item a my:GoodDeal . ?item my:authoredBy ?org} as ?hasGoodDeal) .<br /> BIND (EXISTS{?org a my:PersonalSite } as ?isWebSite)<br /> . ?org rdfs:label ?lbl . ?org a my:Organization<br /> } ORDER BY DESC(?isWebSite) ASC(?lbl) LIMIT 10<br /> <br /> As it is the optimizer converts the query to this algebra:<br /> <br /> Slice(Distinct(Select(OrderBy(Extend(Extend(Extend(Extend(Extend(Join(Extend(Filter(BGP(?result <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/schema#Organization>), NOT EXISTS { ?result <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/schema#PortalSite> . }))), BGP(?result <http://www.w3.org/2000/01/rdf-schema#label> ?var12)))))))))), LIMIT 10, OFFSET 0)<br /> <br /> So of course, all of the Extend algebras get evaluated for the whole dataset before ordering/slicing/distincting.<br /> <br /> I was wondering if it would not be too difficult to make the engine evaluate these algebras later as they are really needed since they do not take any part in filtering the result set ?<br /> <br /> In my case the query "plan" would become something like :<br /> <br /> Select(Extend(Extend(Extend(Extend(Extend(Slice(Distinct(OrderBy(Join(Extend(Filter(BGP(?result <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/schema#Organization>), NOT EXISTS { ?result <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/schema#PortalSite> . }))), BGP(?result <http://www.w3.org/2000/01/rdf-schema#label> ?var12)))), LIMIT 10, OFFSET 0)))))))<br /> <br /> Meaning that all the bindings (except for the isWebSite var that is required by the OrderBy clause) are avaluate once the resultset is computed, resulting in much less evaluations if I'm right.<br /> <br /> I took the Extend algebra example since it is the main pattern in my query but I believe it could (perhaps already is?) be extended to some other algebra ?<br /> <br /> I'm not sure whether or not the performance gain is worth the effort, but I think it worth the consideration anyway ;)<br /> <br /> Max.</p></td> </tr> </table> <p> More information on this issue can be found at <a href="http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=369" target="_blank">http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=369</a></p> <p style="text-align:center;font-size:8pt;padding:5px;"> If you no longer wish to receive notifications, please visit <a href="http://www.dotnetrdf.org/tracker/Account/UserProfile.aspx" target="_blank">your profile</a> and change your notifications options. </p> |