From: <tr...@do...> - 2013-03-27 23:48:36
<p>A new comment has been added to the following issue.</p>
<table border="0">
<tr> <td width="90px" valign="top"><b>Title:</b></td> <td>Multi-variable joins can yield incorrect results</td> </tr>
<tr> <td><b>Project:</b></td> <td>Core Library (dotNetRDF.dll)</td> </tr>
<tr> <td><b>Created By:</b></td> <td>Rob Vesse</td> </tr>
<tr> <td><b>Date:</b></td> <td>2013-03-27 11:47 PM</td> </tr>
<tr> <td><b>Comment:</b></td> </tr>
<tr> <td colspan="2"><p> Here's a better explanation of what the problem is. The title suggests that the problem was with the join, but as it turns out the join code is entirely correct. The problem was that the join code was being fed duplicate data due to a bug in the code that scans the graph.</p>
<p> </p>
<p> The graph scanning code tries to make the most restrictive lookup possible, so it takes the solutions seen up to the current point in the BGP it is evaluating and uses them to make multiple more specific lookups rather than a single more general lookup. This is done to improve performance and avoid scanning irrelevant data. Unfortunately, there was a bug in the code for the case where the triple pattern has two variables and both of those variables are already bound. In that case the scan was supposed to look up each unique pair of values for the variables, but the uniqueness constraint wasn't correctly implemented, so it was looking up every possible pair some arbitrary number of times. Due to the structure of the data and the query, this caused the user's query to explode the solution space. Because there are two patterns in the BGP that experience this issue, the solution space explodes twice, leading to the vastly incorrect number of solutions.</p>
<p> </p>
<p> Adding a DISTINCT, as described in the user's question, allowed the engine to eliminate the spurious solutions, because the lack of distinctness in the scan code was introducing many unnecessary duplicates.</p></td> </tr>
</table>
<p> More information on this issue can be found at <a href="http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=343" target="_blank">http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=343</a></p>
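<p>The failure mode described in the comment can be illustrated with a small sketch. This is not the dotNetRDF code: it is written in Python purely for illustration, and every name in it (<code>scan_bound_pairs</code>, <code>join</code>, the <code>dedupe</code> flag, the sample triple and solutions) is hypothetical. It assumes a triple pattern with two already-bound variables and shows how looking up the same pair of values more than once inflates the join result, and why applying DISTINCT afterwards hides the problem.</p>

```python
def scan_bound_pairs(triples, solutions, subj_var, pred, obj_var, dedupe):
    """Look up the pattern (subj_var pred obj_var) once per value pair
    taken from the solutions found so far (hypothetical helper)."""
    pairs = [(sol[subj_var], sol[obj_var]) for sol in solutions]
    if dedupe:
        # dict.fromkeys drops repeated pairs while keeping first-seen order
        pairs = list(dict.fromkeys(pairs))
    return [(s, pred, o) for (s, o) in pairs if (s, pred, o) in triples]


def join(solutions, matches, subj_var, obj_var):
    """Join prior solutions with the scanned triples on both variables."""
    return [
        sol
        for sol in solutions
        for (s, _pred, o) in matches
        if sol[subj_var] == s and sol[obj_var] == o
    ]


# Illustrative data: one matching triple, three prior solutions that all
# share the same (x, y) pair but differ in z.
triples = {("a", "knows", "b")}
solutions = [
    {"x": "a", "y": "b", "z": "1"},
    {"x": "a", "y": "b", "z": "2"},
    {"x": "a", "y": "b", "z": "3"},
]

# Without the uniqueness check the pair (a, b) is looked up three times,
# so the join sees three copies of the same triple.
buggy = join(solutions,
             scan_bound_pairs(triples, solutions, "x", "knows", "y", dedupe=False),
             "x", "y")

# With the uniqueness check each pair is looked up exactly once.
fixed = join(solutions,
             scan_bound_pairs(triples, solutions, "x", "knows", "y", dedupe=True),
             "x", "y")

# DISTINCT over the buggy result collapses the duplicates again.
distinct_buggy = {tuple(sorted(sol.items())) for sol in buggy}

print(len(buggy), len(fixed), len(distinct_buggy))  # 9 3 3
```

<p>In this sketch the join itself is identical in both runs; only the scan differs, which matches the observation that the join code was correct and was simply being fed duplicate data. A second pattern with the same flaw would multiply the result again.</p>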