From: Bryan T. <br...@sy...> - 2014-11-06 17:01:12
What happens if you replace that last line with:

ORDER BY ?string_label

rather than

ORDER BY STR(?string_label)

Remember, it is assuming that the ORDER BY is using simple variables.

Bryan

On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> wrote:

> Here is the exact query (with or without DISTINCT) for the linked results:
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
>
> SELECT DISTINCT ?term ?string_label
> WHERE
> {
>   ?term rdf:type owl:Class .
>   ?term rdfs:label ?term_label .
>   BIND (STR(?term_label) AS ?string_label)
> }
> ORDER BY STR(?string_label)
>
> Results (same number of rows either way):
>
> SELECT DISTINCT:
> explain:
> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> result:
> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
>
> SELECT:
> explain:
> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> result:
> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
>
> You can diff the two result files to see the out-of-order blocks.
>
> I suppose it does look like the DISTINCT query plan has ORDER BY applied
> before DISTINCT, if I am reading it right.
>
> Thanks,
> Jim
>
> On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> wrote:
>
> > Jim,
> >
> > 502 is about support for expressions (other than simple variables) in
> > ORDER_BY.
> >
> > If there is an issue with DISTINCT + ORDER_BY, then this would be a new
> > ticket.
> >
> > Just post the EXPLAIN (attach it to the email) for the moment. I want to
> > see how this is being generated. We should then check the specification
> > and make sure that the correct behavior is DISTINCT followed by ORDER BY,
> > with any limit applied after the ORDER BY. I can then check the code for
> > how we are handling this.
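For comparison with the behaviour described above: the SPARQL 1.1 Query Language recommendation applies the solution modifiers in the order ORDER BY, projection, DISTINCT, REDUCED, OFFSET, LIMIT, and requires the output of DISTINCT to preserve any ordering given by ORDER BY. What follows is a minimal Java sketch of that contract, not bigdata code. `OrderThenDistinct`, `Solution` and the method names are hypothetical. A solution is modelled as a record of the two projected values (`?term`, `?string_label`), and records require Java 16 or later. `orderThenDistinct` sorts on the label and then removes duplicates with a `LinkedHashSet`, which keeps first-seen order. `orderThenHashDistinct` removes the same duplicates with a plain `HashSet`, whose iteration order is not tied to insertion order. The sketch only illustrates one way a DISTINCT placed after the sort can lose the ordering. Whether that is what happens in the posted plans would have to be confirmed from the EXPLAIN output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch (Java 16+ for records). Solution and the method names
// are hypothetical stand-ins, not bigdata classes.
public class OrderThenDistinct {

    // One projected solution: (?term, ?string_label).
    record Solution(String term, String label) {}

    // ORDER BY ?string_label, then an order-preserving DISTINCT.
    static List<Solution> orderThenDistinct(List<Solution> in) {
        List<Solution> sorted = new ArrayList<>(in);
        sorted.sort(Comparator.comparing(Solution::label));
        // LinkedHashSet drops duplicates but iterates in first-insertion
        // order, so the sorted order survives.
        return new ArrayList<>(new LinkedHashSet<>(sorted));
    }

    // Same pipeline with a plain HashSet: duplicates are removed, but the
    // iteration order follows hash buckets, so the sort is not guaranteed
    // to survive.
    static List<Solution> orderThenHashDistinct(List<Solution> in) {
        List<Solution> sorted = new ArrayList<>(in);
        sorted.sort(Comparator.comparing(Solution::label));
        Set<Solution> distinct = new HashSet<>(sorted);
        return new ArrayList<>(distinct);
    }

    // True if every row's label is <= the next row's label.
    static boolean isSortedByLabel(List<Solution> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i - 1).label().compareTo(rows.get(i).label()) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Solution> in = List.of(
                new Solution("t3", "gamma"),
                new Solution("t1", "alpha"),
                new Solution("t2", "beta"),
                new Solution("t1", "alpha"));
        List<Solution> rows = orderThenDistinct(in);
        System.out.println(rows.size() + " " + isSortedByLabel(rows)); // prints "3 true"
    }
}
```

Running `main` prints `3 true`: the duplicate `(t1, alpha)` row is dropped and the rows stay sorted by label.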
> >
> > The relevant logic is in AST2BOpUtility at line 451. You can see that it
> > is already attempting to handle this and that there was a historical
> > ticket for this issue (#563).
> >
> >     /*
> >      * Note: The DISTINCT operators also enforce the projection.
> >      *
> >      * Note: REDUCED allows, but does not require, either complete or
> >      * partial filtering of duplicates. It is part of what openrdf does
> >      * for a DESCRIBE query.
> >      *
> >      * Note: We do not currently have special operator for REDUCED. One
> >      * could be created using chunk wise DISTINCT. Note that REDUCED may
> >      * not change the order in which the solutions appear (but we are
> >      * evaluating it before ORDER BY so that is Ok.)
> >      *
> >      * TODO If there is an ORDER BY and a DISTINCT then the sort can be
> >      * used to impose the distinct without the overhead of a hash index
> >      * by filtering out the duplicate solutions after the sort.
> >      */
> >
> >     // When true, DISTINCT must preserve ORDER BY ordering.
> >     final boolean preserveOrder;
> >
> >     if (orderBy != null && !orderBy.isEmpty()) {
> >
> >         /*
> >          * Note: ORDER BY before DISTINCT, so DISTINCT must preserve
> >          * order.
> >          *
> >          * @see https://sourceforge.net/apps/trac/bigdata/ticket/563
> >          * (ORDER BY + DISTINCT)
> >          */
> >
> >         preserveOrder = true;
> >
> >         left = addOrderBy(left, queryBase, orderBy, ctx);
> >
> >     } else {
> >
> >         preserveOrder = false;
> >
> >     }
> >
> >     if (projection.isDistinct() || projection.isReduced()) {
> >
> >         left = addDistinct(left, queryBase, preserveOrder, ctx);
> >
> >     }
> >
> > } else {
> >
> >     /*
> >      * TODO Under what circumstances can the projection be [null]?
> >      */
> >
> >     if (orderBy != null && !orderBy.isEmpty()) {
> >
> >         left = addOrderBy(left, queryBase, orderBy, ctx);
> >
> >     }
> >
> > }
> >
> > Bryan
> >
> > ----
> > Bryan Thompson
> > Chief Scientist & Founder
> > SYSTAP, LLC
> > 4501 Tower Road
> > Greensboro, NC 27410
> > br...@sy...
> > http://bigdata.com
> > http://mapgraph.io
> > CONFIDENTIALITY NOTICE: This email and its contents and attachments are
> > for the sole use of the intended recipient(s) and are confidential or
> > proprietary to SYSTAP. Any unauthorized review, use, disclosure,
> > dissemination or copying of this email or its contents or attachments is
> > prohibited. If you have received this communication in error, please
> > notify the sender by reply email and permanently delete all copies of the
> > email and its contents and attachments.
> >
> > On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> wrote:
> >
> > > Hi Bryan,
> > >
> > > Just to clarify, would you like me to attach the info to ticket 502, or
> > > continue posting to the developer list?
> > >
> > > Thanks,
> > > Jim
> > >
> > > On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> wrote:
> > >
> > > > The ticket for allowing aggregates in ORDER BY is:
> > > >
> > > > - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY
> > > >   clause)
> > > >
> > > > Can you attach the EXPLAIN of the query with and without DISTINCT?
> > > > The issue may be that the DISTINCT is being applied after the ORDER
> > > > BY. I seem to remember some issue historically with operations being
> > > > performed before/after the ORDER BY, but I do not have any distinct
> > > > recollection of a problematic interaction between DISTINCT and
> > > > ORDER BY.
> > > >
> > > > Bryan
> > > >
> > > > On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> wrote:
> > > >
> > > > > On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> wrote:
> > > > >
> > > > > > On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> wrote:
> > > > > >
> > > > > > > There could be an issue with ORDER BY operating on an
> > > > > > > anonymous and non-projected variable. Try declaring and
> > > > > > > binding a variable for STR(?label) inside of the query and
> > > > > > > then using that variable in the ORDER BY clause.
> > > > > >
> > > > > > Yes, I tend to find the results of ORDER BY are more what I
> > > > > > expect if I do not include an expression in the ORDER BY but
> > > > > > simply variables. I BIND any expression before the ORDER BY.
> > > > > >
> > > > > > I believe there is a trac item for this, but since the
> > > > > > workaround is easy, I have never seen it as high priority.
> > > > >
> > > > > As suggested, I tried binding a variable as
> > > > > `BIND (STR(?term_label) AS ?string_label)` and using that to sort.
> > > > > Still incorrect ordering. But I tried removing DISTINCT, and then
> > > > > the ordering is correct. Even going back to the anonymous
> > > > > `ORDER BY STR(?term_label)`, ordering is still correct if I remove
> > > > > DISTINCT. For this specific query DISTINCT is not needed, but I do
> > > > > need it for my application. Is there a reason to not expect
> > > > > DISTINCT to work correctly with ORDER BY?
> > > > >
> > > > > Thanks both of you for all of your help,
> > > > > Jim
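The TODO in the AST2BOpUtility comment quoted above suggests using the sort to impose DISTINCT, filtering out duplicate solutions after the sort instead of paying for a hash index. Below is a minimal Java sketch of that idea, with a hypothetical `Solution` record standing in for a projected solution. It is not a bigdata class, and the record requires Java 16+. One caveat the sketch makes visible: comparing each row only with its predecessor removes a duplicate only when identical solutions are adjacent. Sorting on `?string_label` alone does not guarantee that when different solutions share a label. The `tieBreak` flag, an assumption added for illustration, extends the comparator with the remaining projected value `?term`. The query's ORDER BY key says nothing about the relative order of rows with equal labels, so the tie-break does not contradict the requested ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a sort-based DISTINCT: after sorting, drop each row
// that equals its predecessor instead of probing a hash index.
// Not bigdata code; requires Java 16+ for records.
public class SortBasedDistinct {

    // One projected solution: (?term, ?string_label).
    record Solution(String term, String label) {}

    static List<Solution> sortThenDistinct(List<Solution> in, boolean tieBreak) {
        // ORDER BY ?string_label.
        Comparator<Solution> byLabel = Comparator.comparing(Solution::label);
        // Optional tie-break on ?term (an assumption, not part of the query)
        // so that identical solutions become adjacent after the sort.
        Comparator<Solution> cmp =
                tieBreak ? byLabel.thenComparing(Solution::term) : byLabel;
        List<Solution> sorted = new ArrayList<>(in);
        sorted.sort(cmp);

        List<Solution> out = new ArrayList<>();
        Solution prev = null;
        for (Solution s : sorted) {
            // One equality check against the previous row replaces a hash lookup.
            if (!Objects.equals(s, prev)) {
                out.add(s);
            }
            prev = s;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Solution> in = List.of(
                new Solution("t2", "alpha"),
                new Solution("t1", "alpha"),
                new Solution("t3", "beta"),
                new Solution("t2", "alpha"));
        System.out.println(sortThenDistinct(in, true).size());  // prints 3
        System.out.println(sortThenDistinct(in, false).size()); // prints 4
    }
}
```

With the sample input in `main`, the tie-broken sort returns 3 rows, while the label-only sort returns 4. The label-only sort is stable, so the two copies of `(t2, alpha)` end up separated by `(t1, alpha)`.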