This list is closed, nobody may subscribe to it.
From: Rose B. <ros...@gm...> - 2014-11-13 11:35:28
|
I need the total query evaluation time (i.e. till the last result is obtained by the client). |
From: Bryan T. <br...@sy...> - 2014-11-13 11:33:42
|
Benchmarks normally report the total time for the client to obtain the result, or the time until the first result (for unusual cases such as open web queries), rather than the time for the database to execute the query.

If I recall, you are attempting to find:

- the query optimizer time
- the query evaluation time (total, or until first result?)
- anything else?

What is the purpose of this benchmark?

Thanks,
Bryan

--
----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments. |
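A minimal sketch of how the two client-side figures mentioned in this message (time until the first bytes arrive, and total time until the response is fully read) might be collected for a batch of queries from a script instead of the workbench. It is written in Python rather than curl. The names `time_query` and `time_queries`, the `opener` parameter and the `application/sparql-results+xml` Accept header are assumptions made for illustration, not part of Bigdata, and the time to the first chunk of bytes is only an approximation of the time to the first result.

```python
import time
import urllib.parse
import urllib.request


def time_query(endpoint, query,
               accept="application/sparql-results+xml",  # assumed result format
               opener=urllib.request.urlopen):
    """POST one SPARQL query and time it from the client side.

    Returns (seconds_to_first_chunk, seconds_total, bytes_read).
    seconds_to_first_chunk is None if the response body is empty.
    """
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    # Supplying `data` makes urllib send a form-encoded POST request.
    request = urllib.request.Request(endpoint, data=body,
                                     headers={"Accept": accept})
    start = time.perf_counter()
    first = None
    size = 0
    with opener(request) as response:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            if first is None:
                first = time.perf_counter() - start
            size += len(chunk)
    total = time.perf_counter() - start
    return first, total, size


def time_queries(endpoint, queries, **kwargs):
    """Time each query in turn; returns one result tuple per query."""
    return [time_query(endpoint, q, **kwargs) for q in queries]
```

Called as `time_queries("http://localhost:8080/bigdata/sparql", queries)`, this would return one `(first, total, size)` tuple per query, which can be written to a file rather than copied by hand from the browser.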
From: Rose B. <ros...@gm...> - 2014-11-13 11:27:44
|
Yes, I understand. I need to run multiple queries and report their total query execution time. I know I can do it with the bigdata workbench, as it does give the total query execution time. But I am running 3000+ queries, so it is becoming impossible for me to note their query execution times individually through the bigdata workbench, as it runs within the browser. Is there some way to do the same with curl on the command line? An example would be really helpful. |
From: Bryan T. <br...@sy...> - 2014-11-13 11:20:06
|
Rose,

Those are URL query parameters. They are appended onto the end of the SPARQL endpoint URL.

http://.../sparql?explain

If there are multiple URL query parameters then the next one is introduced with an ampersand (&). If there is a value associated with a parameter then it is ?foo=bar or &foo=bar. This is standard HTTP.

The explain gives the time to run the query but not the time to send the results to the client. The client (the workbench) is probably reporting the total time until the query results are fully materialized on the client.

The explain output is explained on the performance optimization pages on the wiki.

Thanks,
Bryan |
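The rule described in this message (the first URL query parameter is introduced with `?`, later ones with `&`, and a parameter may or may not carry a value) can be sketched as a small Python helper. The function name `add_url_params` and its `flags`/`values` arguments are hypothetical and exist only for this illustration; the query itself would still be sent in the POST body as in the original curl command.

```python
from urllib.parse import quote


def add_url_params(endpoint, flags=(), values=None):
    """Append URL query parameters to an endpoint URL.

    flags  -- parameters without a value, e.g. "explain"
    values -- mapping of parameters that carry a value (foo=bar)
    """
    parts = [quote(flag, safe="") for flag in flags]
    parts += [
        quote(key, safe="") + "=" + quote(str(value), safe="")
        for key, value in (values or {}).items()
    ]
    if not parts:
        return endpoint
    # The first parameter follows "?", every later one follows "&".
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + "&".join(parts)


url = add_url_params("http://localhost:8080/bigdata/sparql", flags=["explain"])
print(url)  # http://localhost:8080/bigdata/sparql?explain
```

Passing `values={"foo": "bar"}` to a URL that already ends in `?explain` would yield `...?explain&foo=bar`.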
From: Rose B. <ros...@gm...> - 2014-11-13 10:04:23
|
As stated at http://wiki.bigdata.com/wiki/index.php/NanoSparqlServer, one can pass "explain" as a parameter to a SPARQL query sent with curl.

curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 1' -H 'Accept:application/rdf+xml'

However, if I use the following (which works perfectly well for Virtuoso), I get an error:

curl -X POST http://localhost:8080/bigdata/sparql --data-urlencode 'query=explain 'SELECT * { ?s ?p ?o } LIMIT 1'' -H 'Accept:application/rdf+xml'

Also, "explain" on the Bigdata workbench, when executed in a browser, gives an elapsed time such as the one below:

solutions=2, chunks=1, subqueries=0, elapsed=32ms.

Is the elapsed time the same as the query execution time? If not, what is the difference? I ask because in the browser bigdata reports an elapsed time as well as a query execution time, and the two differ. |
From: Bryan T. <br...@sy...> - 2014-11-12 18:42:27
|
Use dd to force the journal into the file system cache.

Bryan |
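For readers without dd at hand, the same idea (read the journal file sequentially and throw the bytes away, so that the operating system keeps the blocks in its file system cache) can be sketched in Python. This is an illustrative equivalent, not the exact command meant in the message above; the function name `warm_file_cache` and the 1 MiB block size are assumptions.

```python
def warm_file_cache(path, block_size=1024 * 1024):
    """Read `path` sequentially and discard the data.

    Reading the file end to end gives the OS the chance to keep its
    blocks in the file system cache. Returns the number of bytes read.
    """
    total = 0
    # buffering=0: read straight from the OS, no extra Python-level buffer.
    with open(path, "rb", buffering=0) as journal:
        while True:
            block = journal.read(block_size)
            if not block:
                break
            total += len(block)
    return total
```

It would be called with the path of the journal file once the server has been restarted, for example `warm_file_cache("/path/to/bigdata.jnl")`, where the path shown is a placeholder.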
From: Jeremy J C. <jj...@sy...> - 2014-11-12 18:38:51
|
I have a production server with a 65G journal file (AWS EBS SSD, encrypted) and a bigdata process of size 10G out of 16G total; c3.2xlarge (we have the large heap size for in-memory sort).

I need to upgrade bigdata to get the latest critical bug fixes, minimizing effective downtime.

I would prefer 15 minutes of downtime followed by being totally ready over 2 minutes of downtime followed by 5 minutes of sluggishness.

When we migrated to bigdata, we noticed that performance was sluggish immediately afterwards (verging on the unacceptably sluggish) but that things improved fairly quickly with use.

I am wondering whether to run a query that reads every triple (say, one that counts the total number of triples) to ‘warm things up’.

The argument in favor of a warm-up is that it sounds like the right thing to do. The argument against is that if the key caching is actually at the OS level, then the warm-up would actually mess it all up, because the current OS caches will be appropriate for the actual access patterns with real queries, and I would replace them with caches for my artificial query; i.e. stopping, upgrading and restarting bigdata will actually have very little impact on the immediate performance because the important caches are not in the bigdata process at all.

Please advise

Jeremy |
From: Ravi P. P. <rav...@fa...> - 2014-11-08 13:32:15
|
Thanks, that helps. I got stuck running NanoSparqlServer on Ubuntu. Once I changed the jetty.port to NSS_PORT and exported this env variable, it worked.

Ravi

On Monday 03 November 2014 05:12 PM, Bryan Thompson wrote:
> Those sample code examples are for the embedded version of bigdata.
> For HA, use the workbench and the SPARQL and SPARQL update endpoints.
>
> Bryan
>
> On Monday, November 3, 2014, Ravi Prakash Putchala <rav...@fa...> wrote:
>
> Hi,
>
> I am new to Bigdata and am trying to set up and use HAJournalServer. I
> hope this is the mailing list to seek help regarding the usage of
> Bigdata. Else please point me in the right direction.
>
> I configured 3 servers and installed zookeeper and HAJournalServer by
> following the "Basic Deployment" section in the wiki page HAJournalServer
> (http://wiki.bigdata.com/wiki/index.php/HAJournalServer). Now I would
> like to use this setup to load and query some data just like
> bigdata-sails/src/samples/com/bigdata/samples/SampleCode.java does.
> I just got stuck here and do not know how to connect to the cluster,
> load, query etc. Could you please help by providing some pointers?
>
> I am using version 1.3.3. Please let me know if I need to provide more
> information.
>
> Thank you.
>
> Regards,
>
> Ravi |
From: Bryan T. <br...@sy...> - 2014-11-07 21:44:54
|
I see these as alternative approaches. You can also submit a Callable variant to the HAJournalServer using its RMI interface. See the HAGlue interface. This would allow you to submit a job that would then run the ExportKB logic (or similar logic) inside of the database.

Bryan |
From: Jeremy J C. <jj...@sy...> - 2014-11-07 21:43:26
|
So IIRC the method to avoid downtime is:

- use HA to create online backup
- use ExportKB to convert to trig or trix

Jeremy

> On Nov 7, 2014, at 12:04 PM, Bryan Thompson <br...@sy...> wrote:
>
> There is an ExportKB utility. It is described on the following wiki page:
>
> DataMigration <http://wiki.bigdata.com/wiki/index.php/DataMigration> (A page dedicated to data migration.)
>
> See http://wiki.bigdata.com/wiki/index.php/DataMigration#Export
>
> The HAJournal supports online backup. This provides a compressed copy of the journal that is consistent with the state of the journal as of the last commit point when the backup was requested. This can be used in the HA1 mode (without replication) to have online backups and incremental transaction logs. The incremental transaction logs are the write set for each commit point together with the opening and closing root block. The HARestore utility may be used to decompress a given journal file and apply the incremental logs in order to roll forward to a given commit point.
>
> Thanks,
> Bryan
>
> On Fri, Nov 7, 2014 at 12:44 PM, Jeremy J Carroll <jj...@sy...> wrote:
>
> If we want a backup copy of the store, I can use various sparql construct calls to extract each named graph, but is there a simpler dump-to-trig option somehow?
>
> Jeremy |
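For comparison with ExportKB, the fallback mentioned in the quoted question (one SPARQL CONSTRUCT call per named graph) might look roughly like the following Python sketch. Only the query strings are built here; the names `LIST_GRAPHS_QUERY` and `construct_query_for_graph` are hypothetical. Since a CONSTRUCT query returns plain triples rather than quads, the caller would still have to wrap each graph's output itself to obtain TriG.

```python
# Standard SPARQL query that lists the named graphs holding at least one triple.
LIST_GRAPHS_QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"

# Characters that may not appear inside <...> in a SPARQL IRI reference.
_FORBIDDEN_IN_IRI = '<>"{}|^`\\'


def construct_query_for_graph(graph_iri):
    """Return a CONSTRUCT query that copies every triple of one named graph."""
    if any(c in _FORBIDDEN_IN_IRI or ord(c) <= 0x20 for c in graph_iri):
        raise ValueError("not a valid IRI for a SPARQL query: %r" % graph_iri)
    return ("CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <" + graph_iri
            + "> { ?s ?p ?o } }")
```

Each generated query would then be sent to the SPARQL endpoint in the usual way, one request per graph returned by `LIST_GRAPHS_QUERY`.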
From: Bryan T. <br...@sy...> - 2014-11-07 20:21:17
|
Excellent. I will close out the ticket. Let me know if you need any pointers to write up the test. The easiest place to start might be the QuadsTestCase class.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Fri, Nov 7, 2014 at 3:18 PM, Jim Balhoff <ba...@ne...> wrote:

> Yes, I am getting the exact same ordering with and without DISTINCT.
>
> Thanks!
> Jim
>
> > On Nov 7, 2014, at 3:09 PM, Bryan Thompson <br...@sy...> wrote:
> >
> > Jim,
> >
> > Did that file fix the issue for you?
> >
> > Thanks,
> > Bryan
> >
> > On Fri, Nov 7, 2014 at 10:00 AM, Jim Balhoff <ba...@ne...> wrote:
> > I can give this a try; it will be good experience since I am not really familiar with the Bigdata source code. So it may take me a little while.
> >
> > Thanks,
> > Jim
> >
> > > On Nov 7, 2014, at 8:44 AM, Bryan Thompson <br...@sy...> wrote:
> > >
> > > Jim,
> > >
> > > Can you put together a unit test for this so we can avoid regressions? It would need to have a sufficiently large data set to allow the problem to be demonstrated. You would need to run both queries and compare the resulting ordering. The data would have to be something that could be committed into SVN, so with appropriate data rights and not too large. But still large enough.
> > >
> > > Thanks,
> > > Bryan
> > >
> > > On Fri, Nov 7, 2014 at 8:37 AM, Bryan Thompson <br...@sy...> wrote:
> > > Jim.
> > >
> > > Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
> > >
> > > I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.
> > >
> > > I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.
> > >
> > > I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
> > >
> > > Thanks,
> > > Bryan
> > >
> > > On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote:
> > > I just realized my message may have been misleading. By "results are the same", I mean that the problem is still apparent. When using SELECT DISTINCT, ORDER BY does not work correctly and produces a different ordering compared to SELECT.
> > >
> > > > On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote:
> > > >
> > > > I updated the query to use the simple variable in ORDER BY, and the results are the same.
> > > >
> > > > Here is the exact query (with or without DISTINCT) for the linked results:
> > > >
> > > > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > > > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > > > PREFIX owl: <http://www.w3.org/2002/07/owl#>
> > > >
> > > > SELECT DISTINCT ?term ?string_label
> > > > WHERE
> > > > {
> > > >   ?term rdf:type owl:Class .
> > > >   ?term rdfs:label ?term_label .
> > > >   BIND (STR(?term_label) AS ?string_label)
> > > > }
> > > > ORDER BY ?string_label
> > > >
> > > > Results (same number of rows either way):
> > > > SELECT DISTINCT:
> > > > explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> > > > result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
> > > >
> > > > SELECT:
> > > > explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> > > > result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
> > > >
> > > > Thanks,
> > > > Jim
> > > >
> > > >> On Nov 6, 2014, at 12:01 PM, Bryan Thompson <br...@sy...> wrote:
> > > >>
> > > >> What happens if you replace that last line with:
> > > >>
> > > >> ORDER BY ?string_label
> > > >>
> > > >> rather than
> > > >>
> > > >> ORDER BY STR(?string_label)
> > > >>
> > > >> Remember, it is assuming that the ORDER BY is using simple variables.
> > > >>
> > > >> Bryan
> > > >>
> > > >> On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> wrote:
> > > >> Here is the exact query (with or without DISTINCT) for the linked results:
> > > >>
> > > >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > > >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > > >> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> > > >>
> > > >> SELECT DISTINCT ?term ?string_label
> > > >> WHERE
> > > >> {
> > > >>   ?term rdf:type owl:Class .
> > > >>   ?term rdfs:label ?term_label .
> > > >>   BIND (STR(?term_label) AS ?string_label)
> > > >> }
> > > >> ORDER BY STR(?string_label)
> > > >>
> > > >> Results (same number of rows either way):
> > > >> SELECT DISTINCT:
> > > >> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> > > >> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
> > > >>
> > > >> SELECT:
> > > >> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> > > >> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
> > > >>
> > > >> You can diff the two results files to see the out-of-order blocks.
> > > >>
> > > >> I suppose it does look like the DISTINCT query plan has ORDER BY applied before DISTINCT, if I am reading it right.
> > > >>
> > > >> Thanks,
> > > >> Jim
> > > >>
> > > >>> On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> wrote:
> > > >>>
> > > >>> Jim,
> > > >>>
> > > >>> 502 is about support for expressions (other than simple variables in ORDER_BY).
> > > >>>
> > > >>> If there is an issue with DISTINCT + ORDER_BY then this would be a new ticket.
> > > >>>
> > > >>> Just post the EXPLAIN (attach to the email) for the moment. I want to see how this is being generated. We should then check the specification and make sure that the correct behavior is DISTINCT followed by ORDER BY with any limit applied after the ORDER BY. I can then check the code for how we are handling this.
> > > >>>
> > > >>> The relevant logic is in AST2BOpUtility at line 451. You can see that it is already attempting to handle this and that there was a historical ticket for this issue (#563).
> > > >>>
> > > >>>             /*
> > > >>>              * Note: The DISTINCT operators also enforce the projection.
> > > >>>              *
> > > >>>              * Note: REDUCED allows, but does not require, either complete or
> > > >>>              * partial filtering of duplicates. It is part of what openrdf does
> > > >>>              * for a DESCRIBE query.
> > > >>>              *
> > > >>>              * Note: We do not currently have special operator for REDUCED. One
> > > >>>              * could be created using chunk wise DISTINCT. Note that REDUCED may
> > > >>>              * not change the order in which the solutions appear (but we are
> > > >>>              * evaluating it before ORDER BY so that is Ok.)
> > > >>>              *
> > > >>>              * TODO If there is an ORDER BY and a DISTINCT then the sort can be
> > > >>>              * used to impose the distinct without the overhead of a hash index
> > > >>>              * by filtering out the duplicate solutions after the sort.
> > > >>>              */
> > > >>>
> > > >>>             // When true, DISTINCT must preserve ORDER BY ordering.
> > > >>>             final boolean preserveOrder;
> > > >>>
> > > >>>             if (orderBy != null && !orderBy.isEmpty()) {
> > > >>>
> > > >>>                 /*
> > > >>>                  * Note: ORDER BY before DISTINCT, so DISTINCT must preserve
> > > >>>                  * order.
> > > >>>                  *
> > > >>>                  * @see https://sourceforge.net/apps/trac/bigdata/ticket/563
> > > >>>                  *      (ORDER BY + DISTINCT)
> > > >>>                  */
> > > >>>                 preserveOrder = true;
> > > >>>
> > > >>>                 left = addOrderBy(left, queryBase, orderBy, ctx);
> > > >>>
> > > >>>             } else {
> > > >>>
> > > >>>                 preserveOrder = false;
> > > >>>
> > > >>>             }
> > > >>>
> > > >>>             if (projection.isDistinct() || projection.isReduced()) {
> > > >>>
> > > >>>                 left = addDistinct(left, queryBase, preserveOrder, ctx);
> > > >>>
> > > >>>             }
> > > >>>
> > > >>>         } else {
> > > >>>
> > > >>>             /*
> > > >>>              * TODO Under what circumstances can the projection be [null]?
> > > >>>              */
> > > >>>             if (orderBy != null && !orderBy.isEmpty()) {
> > > >>>
> > > >>>                 left = addOrderBy(left, queryBase, orderBy, ctx);
> > > >>>
> > > >>>             }
> > > >>>
> > > >>>         }
> > > >>>
> > > >>> Bryan
> > > >>>
> > > >>> On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> wrote:
> > > >>> Hi Bryan,
> > > >>>
> > > >>> Just to clarify, would you like me to attach the info to ticket 502, or continue posting to the developer list?
> > > >>>
> > > >>> Thanks,
> > > >>> Jim
> > > >>>
> > > >>>> On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> wrote:
> > > >>>>
> > > >>>> The ticket for allowing aggregates in ORDER BY is:
> > > >>>>
> > > >>>> - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY clause)
> > > >>>>
> > > >>>> Can you attach the EXPLAIN of the query with and without DISTINCT. The issue may be that the DISTINCT is being applied after the ORDER BY. I seem to remember some issue historically with operations being performed before/after the ORDER BY, but I do not have any distinct recollection of a problematic interaction between DISTINCT and ORDER BY.
> > > >>>>
> > > >>>> Bryan
> > > >>>>
> > > >>>> On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> wrote:
> > > >>>>> On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> wrote:
> > > >>>>>
> > > >>>>>> On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> wrote:
> > > >>>>>>
> > > >>>>>> There could be an issue with ORDER BY operating on an anonymous and non-projected variable. Try declaring and binding a variable for STR(?label) inside of the query and then using that variable in the ORDER BY clause.
> > > >>>>>
> > > >>>>> Yes I tend to find the results of ORDER BY are more what I expect if I do not include an expression in the ORDER BY but simply variables. I BIND any expression before the ORDER BY.
> > > >>>>>
> > > >>>>> I believe there is a trac item for this, but since the workaround is easy, I have never seen it as high priority
> > > >>>>
> > > >>>> As suggested I tried binding a variable as `BIND (STR(?term_label) AS ?string_label)` and using that to sort. Still incorrect ordering. But, I tried removing DISTINCT, and then the ordering is correct. Even going back to the anonymous `ORDER BY STR(?term_label)`, ordering is still correct if I remove DISTINCT. For this specific query DISTINCT is not needed, but I do need it for my application. Is there a reason to not expect DISTINCT to work correctly with ORDER BY?
> > > >>>>
> > > >>>> Thanks both of you for all of your help,
> > > >>>> Jim
|
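As a rough illustration of the preserveOrder flag in the AST2BOpUtility excerpt quoted above: when ORDER BY runs first, the DISTINCT step that follows has to emit the surviving solutions in the order it received them. The sketch below is not Bigdata code, and the thread does not say what the actual root cause of ticket 1044 was. The class and method names are hypothetical, and it assumes solutions are plain strings and that duplicates are dropped in one sequential pass over the sorted input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not part of the Bigdata code base. */
class OrderedDistinctSketch {

    /**
     * Drops duplicates in a single sequential pass, keeping the first
     * occurrence of each element, so the input order (for example the
     * order produced by a preceding sort) is preserved in the output.
     */
    static <T> List<T> distinctPreservingOrder(List<T> sortedSolutions) {
        Set<T> seen = new HashSet<T>();
        List<T> out = new ArrayList<T>();
        for (T solution : sortedSolutions) {
            // Set.add returns false when the element was already present.
            if (seen.add(solution)) {
                out.add(solution);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<String>(
                Arrays.asList("pear", "apple", "fig", "apple", "pear"));
        Collections.sort(labels);                             // the ORDER BY step
        System.out.println(distinctPreservingOrder(labels));  // the DISTINCT step
    }
}
```

The ordering guarantee here comes from the single sequential pass. If the same filter processed chunks of solutions in parallel, or emitted the contents of the hash set directly, the output order would no longer be guaranteed, which is the kind of behavior an order-preserving DISTINCT has to avoid.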
From: Jim B. <ba...@ne...> - 2014-11-07 20:19:06
|
Yes, I am getting the exact same ordering with and without DISTINCT.

Thanks!
Jim

> On Nov 7, 2014, at 3:09 PM, Bryan Thompson <br...@sy...> wrote:
>
> Jim,
>
> Did that file fix the issue for you?
>
> Thanks,
> Bryan
|
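The comparison Bryan describes in this thread (placing both result files side by side and marking the rows that differ) can also be scripted, and the regression test he asks for would presumably check something similar. A minimal sketch follows, assuming each result has already been read into a list with one string per row. The class and method names are hypothetical and nothing here is taken from the Bigdata test suite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch; not part of the Bigdata test suite. */
class ResultOrderDiffSketch {

    /**
     * Returns the zero-based indexes of the rows at which the two result
     * lists differ. Rows past the end of the shorter list count as differing.
     */
    static List<Integer> mismatchedRows(List<String> withDistinct, List<String> withoutDistinct) {
        List<Integer> mismatches = new ArrayList<Integer>();
        int rows = Math.max(withDistinct.size(), withoutDistinct.size());
        for (int i = 0; i < rows; i++) {
            String a = i < withDistinct.size() ? withDistinct.get(i) : null;
            String b = i < withoutDistinct.size() ? withoutDistinct.get(i) : null;
            if (!Objects.equals(a, b)) {
                mismatches.add(i);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> distinctRows = Arrays.asList("a", "c", "b", "d");
        List<String> plainRows = Arrays.asList("a", "b", "c", "d");
        // Rows 1 and 2 are swapped between the two orderings.
        System.out.println(mismatchedRows(distinctRows, plainRows));
    }
}
```

An empty list means the two queries returned the same ordering, which matches the outcome Jim reports after applying the fix. This is only a meaningful check for data where DISTINCT removes no rows, as in the query from this thread.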
From: Bryan T. <br...@sy...> - 2014-11-07 20:09:59
|
Jim, Did that file fix the issue for you? Thanks, Bryan ---- Bryan Thompson Chief Scientist & Founder SYSTAP, LLC 4501 Tower Road Greensboro, NC 27410 br...@sy... http://bigdata.com http://mapgraph.io CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments. On Fri, Nov 7, 2014 at 10:00 AM, Jim Balhoff <ba...@ne...> wrote: > I can give this a try—it will be good experience since I am not really > familiar with the Bigdata source code. So it may take me a little while. > > Thanks, > Jim > > > > On Nov 7, 2014, at 8:44 AM, Bryan Thompson <br...@sy...> wrote: > > > > Jim, > > > > Can you put together a unit test for this so we can avoid regressions? > It would need to have a sufficiently large data set to allow the problem to > be demonstrated. You would need to run both queries and compare the > resulting ordering. The data would have to be something that could be > committed into SVN, so with appropriate data rights and not too large. But > still large enough. > > > > Thanks, > > Bryan > > > > ---- > > Bryan Thompson > > Chief Scientist & Founder > > SYSTAP, LLC > > 4501 Tower Road > > Greensboro, NC 27410 > > br...@sy... > > http://bigdata.com > > http://mapgraph.io > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. 
If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > > > > > > On Fri, Nov 7, 2014 at 8:37 AM, Bryan Thompson <br...@sy...> wrote: > > Jim. > > > > Ok. I was able to pull together the output of both queries into a > single worksheet and then compare the rows and mark the rows that were not > EQUALS and as such had a different ordering. > > > > I have created a ticket for this. See > http://trac.bigdata.com/ticket/1044. > > > > I would appreciate it if you could have gone a little further with this > and reduced the problem to something that clearly highlighted the problem. > I had to spend quite a bit of time trying to figure out why you were seeing > a problem in the output data. I could not spot any problem myself until I > put the data sets side-by-side in Excel and even then I had to automate the > comparison and then FILTER (in Excel) to find the rows where the output > differed. > > > > I think that I know the root cause. I will update the ticket shortly > and attach a file that you can test on your end for a fix. > > > > Thanks, > > Bryan > > > > > > ---- > > Bryan Thompson > > Chief Scientist & Founder > > SYSTAP, LLC > > 4501 Tower Road > > Greensboro, NC 27410 > > br...@sy... > > http://bigdata.com > > http://mapgraph.io > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > > > > > > On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote: > > I just realized my message may have been misleading. 
By "results are the > same", I mean that the problem is still apparent. When using SELECT > DISTINCT, ORDER BY does not work correctly and produces a different > ordering compared to SELECT. > > > > > > > > > > On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote: > > > > > > I updated the query to use the simple variable in ORDER BY, and the > results are the same. > > > > > > Here is the exact query (with or without DISTINCT) for the linked > results: > > > > > > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> > > > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> > > > PREFIX owl: <http://www.w3.org/2002/07/owl#> > > > > > > SELECT DISTINCT ?term ?string_label > > > WHERE > > > { > > > ?term rdf:type owl:Class . > > > ?term rdfs:label ?term_label . > > > BIND (STR(?term_label) AS ?string_label) > > > } > > > ORDER BY ?string_label > > > > > > > > > Results (same number of rows either way): > > > SELECT DISTINCT: > > > explain: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html > > > result: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv > > > > > > SELECT: > > > explain: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html > > > result: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv > > > > > > Thanks, > > > Jim > > > > > > > > > > > >> On Nov 6, 2014, at 12:01 PM, Bryan Thompson <br...@sy...> wrote: > > >> > > >> What happens if you replace that last line with: > > >> > > >> ORDER BY ?string_label > > >> > > >> rather than > > >> > > >> ORDER BY STR(?string_label) > > >> > > >> Remember, it is assuming that the ORDER BY is using simple variables. 
> > >> > > >> Bryan > > >> > > >> On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> > wrote: > > >> Here is the exact query (with or without DISTINCT) for the linked > results: > > >> > > >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> > > >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> > > >> PREFIX owl: <http://www.w3.org/2002/07/owl#> > > >> > > >> SELECT DISTINCT ?term ?string_label > > >> WHERE > > >> { > > >> ?term rdf:type owl:Class . > > >> ?term rdfs:label ?term_label . > > >> BIND (STR(?term_label) AS ?string_label) > > >> } > > >> ORDER BY STR(?string_label) > > >> > > >> > > >> Results (same number of rows either way): > > >> SELECT DISTINCT: > > >> explain: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html > > >> result: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv > > >> > > >> SELECT: > > >> explain: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html > > >> result: > https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv > > >> > > >> You can diff the two results files to see the out-of-order blocks. > > >> > > >> I suppose it does look like the DISTINCT query plan has ORDER BY > applied before DISTINCT, if I am reading it right. > > >> > > >> Thanks, > > >> Jim > > >> > > >> > > >> > > >> > > >>> On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> > wrote: > > >>> > > >>> Jim, > > >>> > > >>> 502 is about support for expressions (other than simple variables in > ORDER_BY). > > >>> > > >>> If there is an issue with DISTINCT + ORDER_BY then this would be a > new ticket. > > >>> > > >>> Just post the EXPLAIN (attach to the email) for the moment. I want > to see how this is being generated. We should then check the specification > and make sure that the correct behavior is DISTINCT followed by ORDER BY > with any limit applied after the ORDER BY. 
I can then check the code for how we are handling this.
> > >>>
> > >>> The relevant logic is in AST2BOpUtility at line 451. You can see that it is already attempting to handle this and that there was a historical ticket for this issue (#563).
> > >>>
> > >>>     /*
> > >>>      * Note: The DISTINCT operators also enforce the projection.
> > >>>      *
> > >>>      * Note: REDUCED allows, but does not require, either complete or
> > >>>      * partial filtering of duplicates. It is part of what openrdf does
> > >>>      * for a DESCRIBE query.
> > >>>      *
> > >>>      * Note: We do not currently have special operator for REDUCED. One
> > >>>      * could be created using chunk wise DISTINCT. Note that REDUCED may
> > >>>      * not change the order in which the solutions appear (but we are
> > >>>      * evaluating it before ORDER BY so that is Ok.)
> > >>>      *
> > >>>      * TODO If there is an ORDER BY and a DISTINCT then the sort can be
> > >>>      * used to impose the distinct without the overhead of a hash index
> > >>>      * by filtering out the duplicate solutions after the sort.
> > >>>      */
> > >>>
> > >>>     // When true, DISTINCT must preserve ORDER BY ordering.
> > >>>     final boolean preserveOrder;
> > >>>
> > >>>     if (orderBy != null && !orderBy.isEmpty()) {
> > >>>
> > >>>         /*
> > >>>          * Note: ORDER BY before DISTINCT, so DISTINCT must preserve
> > >>>          * order.
> > >>>          *
> > >>>          * @see https://sourceforge.net/apps/trac/bigdata/ticket/563
> > >>>          *      (ORDER BY + DISTINCT)
> > >>>          */
> > >>>         preserveOrder = true;
> > >>>
> > >>>         left = addOrderBy(left, queryBase, orderBy, ctx);
> > >>>
> > >>>     } else {
> > >>>
> > >>>         preserveOrder = false;
> > >>>
> > >>>     }
> > >>>
> > >>>     if (projection.isDistinct() || projection.isReduced()) {
> > >>>
> > >>>         left = addDistinct(left, queryBase, preserveOrder, ctx);
> > >>>
> > >>>     }
> > >>>
> > >>> } else {
> > >>>
> > >>>     /*
> > >>>      * TODO Under what circumstances can the projection be [null]?
> > >>>      */
> > >>>     if (orderBy != null && !orderBy.isEmpty()) {
> > >>>
> > >>>         left = addOrderBy(left, queryBase, orderBy, ctx);
> > >>>
> > >>>     }
> > >>>
> > >>> }
> > >>>
> > >>> Bryan
> > >>>
> > >>> ----
> > >>> Bryan Thompson
> > >>> Chief Scientist & Founder
> > >>> SYSTAP, LLC
> > >>> 4501 Tower Road
> > >>> Greensboro, NC 27410
> > >>> br...@sy...
> > >>> http://bigdata.com
> > >>> http://mapgraph.io
> > >>> > > >>> > > >>> > > >>> On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> > wrote: > > >>> Hi Bryan, > > >>> > > >>> Just to clarify, would you like me to attach the info to ticket 502, > or continue posting to the developer list? > > >>> > > >>> Thanks, > > >>> Jim > > >>> > > >>> > > >>>> On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> > wrote: > > >>>> > > >>>> The ticket for allowing aggregates in ORDER BY is: > > >>>> > > >>>> - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY > clause) > > >>>> > > >>>> Can you attach the EXPLAIN of the query with and without DISTINCT. > The issue may be that the DISTINCT is being applied after the ORDER BY. I > seem to remember some issue historically with operations being performed > before/after the ORDER BY, but I do not have any distinct recollection of a > problematic interaction between DISTINCT and ORDER BY. > > >>>> > > >>>> Bryan > > >>>> > > >>>> ---- > > >>>> Bryan Thompson > > >>>> Chief Scientist & Founder > > >>>> SYSTAP, LLC > > >>>> 4501 Tower Road > > >>>> Greensboro, NC 27410 > > >>>> br...@sy... > > >>>> http://bigdata.com > > >>>> http://mapgraph.io > > >>>> CONFIDENTIALITY NOTICE: This email and its contents and > attachments are for the sole use of the intended recipient(s) and are > confidential or proprietary to SYSTAP. Any unauthorized review, use, > disclosure, dissemination or copying of this email or its contents or > attachments is prohibited. If you have received this communication in > error, please notify the sender by reply email and permanently delete all > copies of the email and its contents and attachments. 
> > >>>> > > >>>> > > >>>> > > >>>> On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> > wrote: > > >>>>> On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> > wrote: > > >>>>> > > >>>>> > > >>>>>> On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> > wrote: > > >>>>>> > > >>>>>> There could be an issue with ORDER BY operating on an anonymous > and non-projected variable. Try declaring and binding a variable for > STR(?label) inside of the query and then using that variable in the ORDER > BY clause. > > >>>>> > > >>>>> > > >>>>> Yes I tend to find the results of ORDER BY are more what I expect > if I do not include an expression in the ORDER BY but simply variables. I > BIND any expression before the ORDER BY. > > >>>>> > > >>>>> I believe there is a trac item for this, but since the workaround > is easy, I have never seen it as high priority > > >>>>> > > >>>> > > >>>> As suggested I tried binding a variable as `BIND (STR(?term_label) > AS ?string_label)` and using that to sort. Still incorrect ordering. But, I > tried removing DISTINCT, and then the ordering is correct. Even going back > to the anonymous `ORDER BY STR(?term_label)`, ordering is still correct if > I remove DISTINCT. For this specific query DISTINCT is not needed, but I do > need it for my application. Is there a reason to not expect DISTINCT to > work correctly with ORDER BY? > > >>>> > > >>>> Thanks both of you for all of your help, > > >>>> Jim > > >>>> > > >>>> > > >>> > > >>> > > >> > > >> > > > > > > > > > > > |
From: Bryan T. <br...@sy...> - 2014-11-07 20:05:04
|
There is an ExportKB utility. It is described on the following wiki page:

- DataMigration <http://wiki.bigdata.com/wiki/index.php/DataMigration> (A page dedicated to data migration.)

See http://wiki.bigdata.com/wiki/index.php/DataMigration#Export

The HAJournal supports online backup. This provides a compressed copy of the journal that is consistent with the state of the journal as of the last commit point when the backup was requested. This can be used in the HA1 mode (without replication) to have online backups and incremental transaction logs. The incremental transaction logs are the write set for each commit point together with the opening and closing root block. The HARestore utility may be used to decompress a given journal file and apply the incremental logs in order to roll forward to a given commit point.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

On Fri, Nov 7, 2014 at 12:44 PM, Jeremy J Carroll <jj...@sy...> wrote:
> If we want a backup copy of the store, I can use various sparql construct calls to extract each named graph, but is there a simpler dump-to-trig option somehow?
>
> Jeremy
|
From: Jeremy J C. <jj...@sy...> - 2014-11-07 17:52:55
|
If we want a backup copy of the store, I can use various SPARQL CONSTRUCT calls to extract each named graph, but is there a simpler dump-to-TriG option?

Jeremy
|
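The per-graph CONSTRUCT approach mentioned in this message could be sketched roughly as follows. This is only an illustration of building the queries: the graph IRIs and the helper name `construct_query_for_graph` are hypothetical, and nothing here calls Bigdata itself. Note that a CONSTRUCT result is a set of triples, so the graph name is not carried in the output; each result would have to be saved separately and tagged with its graph by the caller.

```python
# Query that lists the named graphs in a store (standard SPARQL 1.1).
LIST_GRAPHS_QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }"


def construct_query_for_graph(graph_iri: str) -> str:
    """Build a CONSTRUCT query that copies every triple of one named graph.

    Hypothetical helper for illustration; graph_iri is assumed to be a
    plain absolute IRI with no characters that need escaping.
    """
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    )


# Hypothetical graph names; in practice these would come from running
# LIST_GRAPHS_QUERY against the SPARQL endpoint.
graphs = ["http://example.org/graph/a", "http://example.org/graph/b"]
queries = [construct_query_for_graph(g) for g in graphs]

for q in queries:
    print(q)
    print()
```

One query per graph keeps each result small, at the cost of one round trip per graph, which is presumably why a single dump utility is the more convenient option.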
From: Bryan T. <br...@sy...> - 2014-11-07 15:08:03
|
I just re-attached the file to the ticket. I had posted the wrong version. Also see my email in response to Jeremy which has the (correct) file attached.

Yes, it was a bit tricky in the end.

Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

On Fri, Nov 7, 2014 at 9:58 AM, Jim Balhoff <ba...@ne...> wrote:
> Hi Bryan,
>
> > On Nov 7, 2014, at 8:37 AM, Bryan Thompson <br...@sy...> wrote:
> >
> > Jim.
> >
> > Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
> >
> > I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.
>
> Great, thank you.
>
> > I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.
>
> I was a little confused at first about what in the query was causing the problem. In the end perhaps I should have provided the diff output along with the two results files.
>
> Best regards,
> Jim
>
> > I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
> >
> > Thanks,
> > Bryan
> >
> > ----
> > Bryan Thompson
> > Chief Scientist & Founder
> > SYSTAP, LLC
> > 4501 Tower Road
> > Greensboro, NC 27410
> > br...@sy...
> > http://bigdata.com
> > http://mapgraph.io
|
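The side-by-side comparison described in this exchange (done by hand in Excel) can also be scripted. The sketch below is one possible way to do it, not what either correspondent actually ran: the function name `rows_out_of_order` and the sample CSV data are made up for illustration, and it assumes both result files have the same number of rows, as the thread reports.

```python
import csv
import io


def rows_out_of_order(csv_a: str, csv_b: str):
    """Return (row_number, row_a, row_b) for each position where the rows differ.

    Row numbers are 1-based and count the header line as row 1.
    Assumes both inputs have the same number of rows.
    """
    rows_a = list(csv.reader(io.StringIO(csv_a)))
    rows_b = list(csv.reader(io.StringIO(csv_b)))
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(rows_a, rows_b), start=1)
        if a != b
    ]


# Hypothetical stand-ins for with_distinct_result.csv and no_distinct_result.csv.
with_distinct = "term,string_label\nex:a,alpha\nex:c,gamma\nex:b,beta\n"
no_distinct = "term,string_label\nex:a,alpha\nex:b,beta\nex:c,gamma\n"

mismatches = rows_out_of_order(with_distinct, no_distinct)
for row_number, row_a, row_b in mismatches:
    print(row_number, row_a, row_b)
```

Reporting only the differing positions gives a compact reproduction that can be attached to a ticket alongside the two full result files.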
From: Bryan T. <br...@sy...> - 2014-11-07 15:07:15
|
I think we can fix this easily enough. I had botched the file upload. It is now (finally) correctly attached to the ticket. It is also attached to this email.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

On Fri, Nov 7, 2014 at 9:46 AM, Jeremy J Carroll <jj...@sy...> wrote:
> That’s very subtle: one way of approaching fixing this is as an ‘optimizer’ which rewrites the query to put the distinct in a subselect and the order by at the outer level
> In general the standard does not require the order by’s to be preserved from subselects, so this particular case looks like a special
>
> Jeremy
>
> On Nov 7, 2014, at 5:44 AM, Bryan Thompson <br...@sy...> wrote:
>
> Jim,
>
> Can you put together a unit test for this so we can avoid regressions? It would need to have a sufficiently large data set to allow the problem to be demonstrated. You would need to run both queries and compare the resulting ordering. The data would have to be something that could be committed into SVN, so with appropriate data rights and not too large. But still large enough.
>
> Thanks,
> Bryan
|
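Jeremy's suggestion in this message, putting the DISTINCT in a subselect and the ORDER BY at the outer level, can be illustrated as follows. This is a sketch under stated assumptions: the rewritten query is standard SPARQL 1.1 based on Jim's query from the thread, but whether it actually avoids the problem in Bigdata was not confirmed in the thread. The sample rows and the two helper functions are hypothetical and only model the two evaluation orders in plain Python.

```python
# Jim's query, rewritten so DISTINCT runs inside a subselect and ORDER BY
# is applied to the already de-duplicated solutions at the outer level.
REWRITTEN_QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?term ?string_label
WHERE {
  {
    SELECT DISTINCT ?term ?string_label
    WHERE {
      ?term rdf:type owl:Class .
      ?term rdfs:label ?term_label .
      BIND (STR(?term_label) AS ?string_label)
    }
  }
}
ORDER BY ?string_label
"""

# Hypothetical solutions (term, string_label), including one duplicate.
rows = [
    ("ex:c", "cell"),
    ("ex:a", "anatomical structure"),
    ("ex:b", "bone"),
    ("ex:a", "anatomical structure"),
]


def distinct_then_order(solutions, key):
    """Model of the rewrite: hash-based DISTINCT first, then the outer sort."""
    unique = set(solutions)          # order is not preserved here, and need not be
    return sorted(unique, key=key)   # the outer ORDER BY imposes the final order


def order_then_distinct(solutions, key):
    """Model of ORDER BY followed by a DISTINCT that must preserve order."""
    ordered = sorted(solutions, key=key)
    return list(dict.fromkeys(ordered))  # dict keeps first-seen order


def by_label(row):
    return row[1]


result_a = distinct_then_order(rows, by_label)
result_b = order_then_distinct(rows, by_label)
print(result_a == result_b)
```

The point of the rewrite is that the de-duplication step no longer has to preserve any ordering, because the sort happens last.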
From: Jim B. <ba...@ne...> - 2014-11-07 15:00:36
|
I can give this a try; it will be good experience since I am not really familiar with the Bigdata source code. So it may take me a little while.

Thanks,
Jim

> On Nov 7, 2014, at 8:44 AM, Bryan Thompson <br...@sy...> wrote:
>
> Jim,
>
> Can you put together a unit test for this so we can avoid regressions? It would need to have a sufficiently large data set to allow the problem to be demonstrated. You would need to run both queries and compare the resulting ordering. The data would have to be something that could be committed into SVN, so with appropriate data rights and not too large. But still large enough.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://bigdata.com
> http://mapgraph.io
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
>
> On Fri, Nov 7, 2014 at 8:37 AM, Bryan Thompson <br...@sy...> wrote:
> Jim.
>
> Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
>
> I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.
>
> I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.
>
> I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
>
> Thanks,
> Bryan
>
> On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote:
> I just realized my message may have been misleading. By "results are the same", I mean that the problem is still apparent. When using SELECT DISTINCT, ORDER BY does not work correctly and produces a different ordering compared to SELECT.
>
> > On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote:
> >
> > I updated the query to use the simple variable in ORDER BY, and the results are the same.
> >
> > Here is the exact query (with or without DISTINCT) for the linked results:
> >
> > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > PREFIX owl: <http://www.w3.org/2002/07/owl#>
> >
> > SELECT DISTINCT ?term ?string_label
> > WHERE
> > {
> >   ?term rdf:type owl:Class .
> >   ?term rdfs:label ?term_label .
> >   BIND (STR(?term_label) AS ?string_label)
> > }
> > ORDER BY ?string_label
> >
> > Results (same number of rows either way):
> > SELECT DISTINCT:
> > explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> > result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
> >
> > SELECT:
> > explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> > result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
> >
> > Thanks,
> > Jim
> >
> >> On Nov 6, 2014, at 12:01 PM, Bryan Thompson <br...@sy...> wrote:
> >>
> >> What happens if you replace that last line with:
> >>
> >> ORDER BY ?string_label
> >>
> >> rather than
> >>
> >> ORDER BY STR(?string_label)
> >>
> >> Remember, it is assuming that the ORDER BY is using simple variables.
> >>
> >> Bryan
> >>
> >> On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> wrote:
> >> Here is the exact query (with or without DISTINCT) for the linked results:
> >>
> >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> >> PREFIX owl: <http://www.w3.org/2002/07/owl#>
> >>
> >> SELECT DISTINCT ?term ?string_label
> >> WHERE
> >> {
> >>   ?term rdf:type owl:Class .
> >>   ?term rdfs:label ?term_label .
> >>   BIND (STR(?term_label) AS ?string_label)
> >> }
> >> ORDER BY STR(?string_label)
> >>
> >> Results (same number of rows either way):
> >> SELECT DISTINCT:
> >> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> >> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
> >>
> >> SELECT:
> >> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> >> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
> >>
> >> You can diff the two results files to see the out-of-order blocks.
> >>
> >> I suppose it does look like the DISTINCT query plan has ORDER BY applied before DISTINCT, if I am reading it right.
> >>
> >> Thanks,
> >> Jim
> >>
> >>> On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> wrote:
> >>>
> >>> Jim,
> >>>
> >>> 502 is about support for expressions (other than simple variables in ORDER_BY).
> >>>
> >>> If there is an issue with DISTINCT + ORDER_BY then this would be a new ticket.
> >>>
> >>> Just post the EXPLAIN (attach to the email) for the moment. I want to see how this is being generated. We should then check the specification and make sure that the correct behavior is DISTINCT followed by ORDER BY with any limit applied after the ORDER BY. I can then check the code for how we are handling this.
> >>>
> >>> The relevant logic is in AST2BOpUtility at line 451. You can see that it is already attempting to handle this and that there was a historical ticket for this issue (#563).
> >>>
> >>>         /*
> >>>          * Note: The DISTINCT operators also enforce the projection.
> >>>          *
> >>>          * Note: REDUCED allows, but does not require, either complete or
> >>>          * partial filtering of duplicates. It is part of what openrdf does
> >>>          * for a DESCRIBE query.
> >>>          *
> >>>          * Note: We do not currently have special operator for REDUCED. One
> >>>          * could be created using chunk wise DISTINCT. Note that REDUCED may
> >>>          * not change the order in which the solutions appear (but we are
> >>>          * evaluating it before ORDER BY so that is Ok.)
> >>>          *
> >>>          * TODO If there is an ORDER BY and a DISTINCT then the sort can be
> >>>          * used to impose the distinct without the overhead of a hash index
> >>>          * by filtering out the duplicate solutions after the sort.
> >>>          */
> >>>
> >>>         // When true, DISTINCT must preserve ORDER BY ordering.
> >>>         final boolean preserveOrder;
> >>>
> >>>         if (orderBy != null && !orderBy.isEmpty()) {
> >>>
> >>>             /*
> >>>              * Note: ORDER BY before DISTINCT, so DISTINCT must preserve
> >>>              * order.
> >>>              *
> >>>              * @see https://sourceforge.net/apps/trac/bigdata/ticket/563
> >>>              * (ORDER BY + DISTINCT)
> >>>              */
> >>>
> >>>             preserveOrder = true;
> >>>
> >>>             left = addOrderBy(left, queryBase, orderBy, ctx);
> >>>
> >>>         } else {
> >>>
> >>>             preserveOrder = false;
> >>>
> >>>         }
> >>>
> >>>         if (projection.isDistinct() || projection.isReduced()) {
> >>>
> >>>             left = addDistinct(left, queryBase, preserveOrder, ctx);
> >>>
> >>>         }
> >>>
> >>>     } else {
> >>>
> >>>         /*
> >>>          * TODO Under what circumstances can the projection be [null]?
> >>>          */
> >>>
> >>>         if (orderBy != null && !orderBy.isEmpty()) {
> >>>
> >>>             left = addOrderBy(left, queryBase, orderBy, ctx);
> >>>
> >>>         }
> >>>
> >>>     }
> >>>
> >>> Bryan
> >>>
> >>> On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> wrote:
> >>> Hi Bryan,
> >>>
> >>> Just to clarify, would you like me to attach the info to ticket 502, or continue posting to the developer list?
> >>>
> >>> Thanks,
> >>> Jim
> >>>
> >>>> On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> wrote:
> >>>>
> >>>> The ticket for allowing aggregates in ORDER BY is:
> >>>>
> >>>> - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY clause)
> >>>>
> >>>> Can you attach the EXPLAIN of the query with and without DISTINCT. The issue may be that the DISTINCT is being applied after the ORDER BY. I seem to remember some issue historically with operations being performed before/after the ORDER BY, but I do not have any distinct recollection of a problematic interaction between DISTINCT and ORDER BY.
> >>>>
> >>>> Bryan
> >>>>
> >>>> On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> wrote:
> >>>>> On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> wrote:
> >>>>>
> >>>>>> On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> wrote:
> >>>>>>
> >>>>>> There could be an issue with ORDER BY operating on an anonymous and non-projected variable. Try declaring and binding a variable for STR(?label) inside of the query and then using that variable in the ORDER BY clause.
> >>>>>
> >>>>> Yes I tend to find the results of ORDER BY are more what I expect if I do not include an expression in the ORDER BY but simply variables. I BIND any expression before the ORDER BY.
> >>>>>
> >>>>> I believe there is a trac item for this, but since the workaround is easy, I have never seen it as high priority
> >>>>
> >>>> As suggested I tried binding a variable as `BIND (STR(?term_label) AS ?string_label)` and using that to sort. Still incorrect ordering. But, I tried removing DISTINCT, and then the ordering is correct. Even going back to the anonymous `ORDER BY STR(?term_label)`, ordering is still correct if I remove DISTINCT. For this specific query DISTINCT is not needed, but I do need it for my application. Is there a reason to not expect DISTINCT to work correctly with ORDER BY?
> >>>>
> >>>> Thanks both of you for all of your help,
> >>>> Jim
|
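The `preserveOrder` flag in the AST2BOpUtility excerpt quoted in this thread exists because a DISTINCT that runs after the sort must not disturb the sorted order. As a rough sketch of that requirement in plain Java (this is not Bigdata's DISTINCT operator; the class name, method name, and sample rows are hypothetical), duplicates can be dropped while each first occurrence stays in place:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration only; not part of the Bigdata code base. */
final class OrderPreservingDistinctSketch {

    /**
     * Removes duplicate rows while keeping the first occurrence of each row
     * at its original relative position, so an earlier sort is not disturbed.
     */
    static List<String> distinctPreservingOrder(List<String> sortedRows) {
        // LinkedHashSet iterates in insertion order and ignores re-insertions.
        return new ArrayList<>(new LinkedHashSet<>(sortedRows));
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("abdomen", "abdomen", "brain", "heart", "heart");
        // Prints [abdomen, brain, heart]
        System.out.println(distinctPreservingOrder(sorted));
    }
}
```

A plain `HashSet` would also remove the duplicates, but it makes no promise about iteration order, which is the kind of behavior that could scramble an already sorted result.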
From: Jim B. <ba...@ne...> - 2014-11-07 14:58:33
|
Hi Bryan,

> On Nov 7, 2014, at 8:37 AM, Bryan Thompson <br...@sy...> wrote:
>
> Jim.
>
> Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
>
> I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.

Great, thank you.

> I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.

I was a little confused at first about what in the query was causing the problem. In the end perhaps I should have provided the diff output along with the two results files.

Best regards,
Jim

> I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://bigdata.com
> http://mapgraph.io
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
|
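The side-by-side comparison described above (done in Excel, or with a diff of the two result files) can be sketched as a small Java helper. This is only an illustration under assumptions: the class and method names are hypothetical, each row is treated as one opaque string such as a CSV line, and nothing here reads the actual result files from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; compares two result listings row by row. */
final class ResultOrderComparisonSketch {

    /** Returns the zero-based row indexes at which the two listings differ. */
    static List<Integer> mismatchedRows(List<String> withDistinct, List<String> withoutDistinct) {
        List<Integer> mismatches = new ArrayList<>();
        int common = Math.min(withDistinct.size(), withoutDistinct.size());
        for (int i = 0; i < common; i++) {
            if (!withDistinct.get(i).equals(withoutDistinct.get(i))) {
                mismatches.add(i);
            }
        }
        // Rows that exist in only one listing also count as mismatches.
        int longest = Math.max(withDistinct.size(), withoutDistinct.size());
        for (int i = common; i < longest; i++) {
            mismatches.add(i);
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("t1,abdomen", "t3,heart", "t2,brain");
        List<String> b = Arrays.asList("t1,abdomen", "t2,brain", "t3,heart");
        // Prints [1, 2]
        System.out.println(mismatchedRows(a, b));
    }
}
```

One caveat: rows that share the same sort key may legitimately appear in either order, so a regression test built on this idea would probably compare only the sort-key column, or check that each listing is sorted on its own.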
From: Jeremy J C. <jj...@sy...> - 2014-11-07 14:47:01
|
That's very subtle: one way of approaching fixing this is as an 'optimizer' which rewrites the query to put the DISTINCT in a subselect and the ORDER BY at the outer level.

In general the standard does not require the ORDER BYs to be preserved from subselects, so this particular case looks like a special case.

Jeremy

> On Nov 7, 2014, at 5:44 AM, Bryan Thompson <br...@sy...> wrote:
>
> Jim,
>
> Can you put together a unit test for this so we can avoid regressions? It would need to have a sufficiently large data set to allow the problem to be demonstrated. You would need to run both queries and compare the resulting ordering. The data would have to be something that could be committed into SVN, so with appropriate data rights and not too large. But still large enough.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> SYSTAP, LLC
> 4501 Tower Road
> Greensboro, NC 27410
> br...@sy...
> http://bigdata.com
> http://mapgraph.io
> CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
>
> On Fri, Nov 7, 2014 at 8:37 AM, Bryan Thompson <br...@sy...> wrote:
> Jim.
>
> Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
>
> I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.
>
> I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.
>
> I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
>
> Thanks,
> Bryan
>
> On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote:
> I just realized my message may have been misleading. By "results are the same", I mean that the problem is still apparent. When using SELECT DISTINCT, ORDER BY does not work correctly and produces a different ordering compared to SELECT.
>
> > On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote:
> >
> > I updated the query to use the simple variable in ORDER BY, and the results are the same.
|
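Jeremy's suggestion concerns an internal optimizer rewrite, but the same shape can be written by hand at the query level: DISTINCT inside a subselect, ORDER BY on the outer SELECT. The sketch below just assembles such a query string. The class and method names are hypothetical, the query pieces are the ones posted in this thread, and whether the rewritten form avoids the misordering on an affected Bigdata build has not been verified here.

```java
/** Hypothetical helper; builds the "DISTINCT in a subselect, ORDER BY outside" form. */
final class DistinctSubselectRewriteSketch {

    /** Wraps a SELECT DISTINCT in a subselect and applies ORDER BY at the outer level. */
    static String rewrite(String prefixes, String projection, String whereBody, String orderBy) {
        return String.join("\n",
            prefixes,
            "SELECT " + projection,
            "WHERE {",
            "  {",
            "    SELECT DISTINCT " + projection,
            "    WHERE {",
            whereBody,
            "    }",
            "  }",
            "}",
            "ORDER BY " + orderBy);
    }

    public static void main(String[] args) {
        String prefixes = String.join("\n",
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "PREFIX owl: <http://www.w3.org/2002/07/owl#>");
        String whereBody = String.join("\n",
            "      ?term rdf:type owl:Class .",
            "      ?term rdfs:label ?term_label .",
            "      BIND (STR(?term_label) AS ?string_label)");
        // Prints the rewritten query text; it is not sent to any endpoint.
        System.out.println(rewrite(prefixes, "?term ?string_label", whereBody, "?string_label"));
    }
}
```

The outer SELECT carries no DISTINCT of its own, so the sort becomes the last operation applied to the already de-duplicated solutions.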
From: Bryan T. <br...@sy...> - 2014-11-07 13:44:52
|
Jim,

Can you put together a unit test for this so we can avoid regressions? It would need to have a sufficiently large data set to allow the problem to be demonstrated. You would need to run both queries and compare the resulting ordering. The data would have to be something that could be committed into SVN, so with appropriate data rights and not too large. But still large enough.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io
CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Fri, Nov 7, 2014 at 8:37 AM, Bryan Thompson <br...@sy...> wrote:

> Jim.
>
> Ok. I was able to pull together the output of both queries into a single worksheet and then compare the rows and mark the rows that were not EQUALS and as such had a different ordering.
>
> I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.
>
> I would appreciate it if you could have gone a little further with this and reduced the problem to something that clearly highlighted the problem. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side-by-side in Excel and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.
>
> I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.
> > Thanks, > Bryan > > > ---- > Bryan Thompson > Chief Scientist & Founder > SYSTAP, LLC > 4501 Tower Road > Greensboro, NC 27410 > br...@sy... > http://bigdata.com > http://mapgraph.io > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP. Any unauthorized review, use, disclosure, > dissemination or copying of this email or its contents or attachments is > prohibited. If you have received this communication in error, please notify > the sender by reply email and permanently delete all copies of the email > and its contents and attachments. > > On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote: > >> I just realized my message may have been misleading. By "results are the >> same", I mean that the problem is still apparent. When using SELECT >> DISTINCT, ORDER BY does not work correctly and produces a different >> ordering compared to SELECT. >> >> >> > >> > On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote: >> > >> > I updated the query to use the simple variable in ORDER BY, and the >> results are the same. >> > >> > Here is the exact query (with or without DISTINCT) for the linked >> results: >> > >> > PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> >> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> >> > PREFIX owl: <http://www.w3.org/2002/07/owl#> >> > >> > SELECT DISTINCT ?term ?string_label >> > WHERE >> > { >> > ?term rdf:type owl:Class . >> > ?term rdfs:label ?term_label . 
>> > BIND (STR(?term_label) AS ?string_label) >> > } >> > ORDER BY ?string_label >> > >> > >> > Results (same number of rows either way): >> > SELECT DISTINCT: >> > explain: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html >> > result: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv >> > >> > SELECT: >> > explain: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html >> > result: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv >> > >> > Thanks, >> > Jim >> > >> > >> > >> >> On Nov 6, 2014, at 12:01 PM, Bryan Thompson <br...@sy...> wrote: >> >> >> >> What happens if you replace that last line with: >> >> >> >> ORDER BY ?string_label >> >> >> >> rather than >> >> >> >> ORDER BY STR(?string_label) >> >> >> >> Remember, it is assuming that the ORDER BY is using simple variables. >> >> >> >> Bryan >> >> >> >> On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> >> wrote: >> >> Here is the exact query (with or without DISTINCT) for the linked >> results: >> >> >> >> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> >> >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> >> >> PREFIX owl: <http://www.w3.org/2002/07/owl#> >> >> >> >> SELECT DISTINCT ?term ?string_label >> >> WHERE >> >> { >> >> ?term rdf:type owl:Class . >> >> ?term rdfs:label ?term_label . 
>> >> BIND (STR(?term_label) AS ?string_label) >> >> } >> >> ORDER BY STR(?string_label) >> >> >> >> >> >> Results (same number of rows either way): >> >> SELECT DISTINCT: >> >> explain: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html >> >> result: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv >> >> >> >> SELECT: >> >> explain: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html >> >> result: >> https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv >> >> >> >> You can diff the two results files to see the out-of-order blocks. >> >> >> >> I suppose it does look like the DISTINCT query plan has ORDER BY >> applied before DISTINCT, if I am reading it right. >> >> >> >> Thanks, >> >> Jim >> >> >> >> >> >> >> >> >> >>> On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> wrote: >> >>> >> >>> Jim, >> >>> >> >>> 502 is about support for expressions (other than simple variables in >> ORDER_BY). >> >>> >> >>> If there is an issue with DISTINCT + ORDER_BY then this would be a >> new ticket. >> >>> >> >>> Just post the EXPLAIN (attach to the email) for the moment. I want >> to see how this is being generated. We should then check the specification >> and make sure that the correct behavior is DISTINCT followed by ORDER BY >> with any limit applied after the ORDER BY. I can then check the code for >> how we are handling this. >> >>> >> >>> The relevant logic is in AST2BOpUtility at line 451. You can see >> that it is already attempting to handle this and that there was a >> historical ticket for this issue (#563). >> >>> >> >>> >> >>> >> >>> /* >> >>> >> >>> * Note: The DISTINCT operators also enforce the >> projection. >> >>> >> >>> * >> >>> >> >>> * Note: REDUCED allows, but does not require, either >> complete or >> >>> >> >>> * partial filtering of duplicates. 
It is part of what >> openrdf does >> >>> >> >>> * for a DESCRIBE query. >> >>> >> >>> * >> >>> >> >>> * Note: We do not currently have special operator for >> REDUCED. One >> >>> >> >>> * could be created using chunk wise DISTINCT. Note that >> REDUCED may >> >>> >> >>> * not change the order in which the solutions appear (but >> we are >> >>> >> >>> * evaluating it before ORDER BY so that is Ok.) >> >>> >> >>> * >> >>> >> >>> * TODO If there is an ORDER BY and a DISTINCT then the >> sort can be >> >>> >> >>> * used to impose the distinct without the overhead of a >> hash index >> >>> >> >>> * by filtering out the duplicate solutions after the sort. >> >>> >> >>> */ >> >>> >> >>> >> >>> >> >>> // When true, DISTINCT must preserve ORDER BY ordering. >> >>> >> >>> final boolean preserveOrder; >> >>> >> >>> >> >>> >> >>> if (orderBy != null && !orderBy.isEmpty()) { >> >>> >> >>> >> >>> >> >>> /* >> >>> >> >>> * Note: ORDER BY before DISTINCT, so DISTINCT must >> preserve >> >>> >> >>> * order. >> >>> >> >>> * >> >>> >> >>> * @see >> https://sourceforge.net/apps/trac/bigdata/ticket/563 >> >>> >> >>> * (ORDER BY + DISTINCT) >> >>> >> >>> */ >> >>> >> >>> >> >>> preserveOrder = true; >> >>> >> >>> >> >>> >> >>> left = addOrderBy(left, queryBase, orderBy, ctx); >> >>> >> >>> >> >>> >> >>> } else { >> >>> >> >>> >> >>> preserveOrder = false; >> >>> >> >>> >> >>> } >> >>> >> >>> >> >>> >> >>> if (projection.isDistinct() || projection.isReduced()) { >> >>> >> >>> >> >>> >> >>> left = addDistinct(left, queryBase, preserveOrder, >> ctx); >> >>> >> >>> >> >>> >> >>> } >> >>> >> >>> >> >>> >> >>> } else { >> >>> >> >>> >> >>> >> >>> /* >> >>> >> >>> * TODO Under what circumstances can the projection be >> [null]? 
>> >>> >> >>> */ >> >>> >> >>> >> >>> if (orderBy != null && !orderBy.isEmpty()) { >> >>> >> >>> >> >>> >> >>> left = addOrderBy(left, queryBase, orderBy, ctx); >> >>> >> >>> >> >>> >> >>> } >> >>> >> >>> >> >>> >> >>> } >> >>> >> >>> >> >>> >> >>> Bryan >> >>> >> >>> >> >>> ---- >> >>> Bryan Thompson >> >>> Chief Scientist & Founder >> >>> SYSTAP, LLC >> >>> 4501 Tower Road >> >>> Greensboro, NC 27410 >> >>> br...@sy... >> >>> http://bigdata.com >> >>> http://mapgraph.io >> >>> CONFIDENTIALITY NOTICE: This email and its contents and attachments >> are for the sole use of the intended recipient(s) and are confidential or >> proprietary to SYSTAP. Any unauthorized review, use, disclosure, >> dissemination or copying of this email or its contents or attachments is >> prohibited. If you have received this communication in error, please notify >> the sender by reply email and permanently delete all copies of the email >> and its contents and attachments. >> >>> >> >>> >> >>> >> >>> On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> >> wrote: >> >>> Hi Bryan, >> >>> >> >>> Just to clarify, would you like me to attach the info to ticket 502, >> or continue posting to the developer list? >> >>> >> >>> Thanks, >> >>> Jim >> >>> >> >>> >> >>>> On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> wrote: >> >>>> >> >>>> The ticket for allowing aggregates in ORDER BY is: >> >>>> >> >>>> - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY >> clause) >> >>>> >> >>>> Can you attach the EXPLAIN of the query with and without DISTINCT. >> The issue may be that the DISTINCT is being applied after the ORDER BY. I >> seem to remember some issue historically with operations being performed >> before/after the ORDER BY, but I do not have any distinct recollection of a >> problematic interaction between DISTINCT and ORDER BY. 
>> >>>> >> >>>> Bryan >> >>>> >> >>>> ---- >> >>>> Bryan Thompson >> >>>> Chief Scientist & Founder >> >>>> SYSTAP, LLC >> >>>> 4501 Tower Road >> >>>> Greensboro, NC 27410 >> >>>> br...@sy... >> >>>> http://bigdata.com >> >>>> http://mapgraph.io >> >>>> CONFIDENTIALITY NOTICE: This email and its contents and attachments >> are for the sole use of the intended recipient(s) and are confidential or >> proprietary to SYSTAP. Any unauthorized review, use, disclosure, >> dissemination or copying of this email or its contents or attachments is >> prohibited. If you have received this communication in error, please notify >> the sender by reply email and permanently delete all copies of the email >> and its contents and attachments. >> >>>> >> >>>> >> >>>> >> >>>> On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> >> wrote: >> >>>>> On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> >> wrote: >> >>>>> >> >>>>> >> >>>>>> On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> >> wrote: >> >>>>>> >> >>>>>> There could be an issue with ORDER BY operating on an anonymous >> and non-projected variable. Try declaring and binding a variable for >> STR(?label) inside of the query and then using that variable in the ORDER >> BY clause. >> >>>>> >> >>>>> >> >>>>> Yes I tend to find the results of ORDER BY are more what I expect >> if I do not include an expression in the ORDER BY but simply variables. I >> BIND any expression before the ORDER BY. >> >>>>> >> >>>>> I believe there is a trac item for this, but since the workaround >> is easy, I have never seen it as high priority >> >>>>> >> >>>> >> >>>> As suggested I tried binding a variable as `BIND (STR(?term_label) >> AS ?string_label)` and using that to sort. Still incorrect ordering. But, I >> tried removing DISTINCT, and then the ordering is correct. Even going back >> to the anonymous `ORDER BY STR(?term_label)`, ordering is still correct if >> I remove DISTINCT. 
For this specific query DISTINCT is not needed, but I do >> need it for my application. Is there a reason to not expect DISTINCT to >> work correctly with ORDER BY? >> >>>> >> >>>> Thanks both of you for all of your help, >> >>>> Jim >> >>>> >> >>>> >> >>> >> >>> >> >> >> >> >> > >> >> > |
From: Bryan T. <br...@sy...> - 2014-11-07 13:37:57
|
Jim,

OK. I was able to pull the output of both queries into a single worksheet, compare them row by row, and mark the rows that were not equal and therefore appeared in a different order. I have created a ticket for this. See http://trac.bigdata.com/ticket/1044.

I would have appreciated it if you had gone a little further with this and reduced the problem to a small example that clearly highlighted it. I had to spend quite a bit of time trying to figure out why you were seeing a problem in the output data. I could not spot any problem myself until I put the data sets side by side in Excel, and even then I had to automate the comparison and then FILTER (in Excel) to find the rows where the output differed.

I think that I know the root cause. I will update the ticket shortly and attach a file that you can test on your end for a fix.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Thu, Nov 6, 2014 at 9:14 PM, Jim Balhoff <ba...@ne...> wrote:
> I just realized my message may have been misleading. By "results are the same", I mean that the problem is still apparent. When using SELECT DISTINCT, ORDER BY does not work correctly and produces a different ordering compared to SELECT.
|
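The row-by-row comparison described in this message does not strictly need a spreadsheet. Below is a minimal sketch in Java, assuming each result file has already been read into a list of lines (one row per line). The class name `ResultOrderDiff` and the method `differingRows` are hypothetical names chosen for illustration and are not part of bigdata.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of bigdata): finds rows that differ by position. */
final class ResultOrderDiff {

    private ResultOrderDiff() {
    }

    /**
     * Returns the zero-based indexes at which the two listings hold different
     * rows. If one listing is longer, its extra rows are reported as well.
     * Rows are assumed to be non-null strings.
     */
    static List<Integer> differingRows(final List<String> a, final List<String> b) {
        final List<Integer> out = new ArrayList<>();
        final int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            final String x = i < a.size() ? a.get(i) : null; // null: row missing in a
            final String y = i < b.size() ? b.get(i) : null; // null: row missing in b
            if (x == null || !x.equals(y)) {
                out.add(i);
            }
        }
        return out;
    }
}
```

The two listings could be obtained with `java.nio.file.Files.readAllLines` on the CSV files for the query with and without DISTINCT. Contiguous runs of reported indexes would then correspond to the out-of-order blocks mentioned in the thread.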
From: Jim B. <ba...@ne...> - 2014-11-07 02:14:27
|
I just realized my message may have been misleading. By "results are the same", I mean that the problem is still apparent. When using SELECT DISTINCT, ORDER BY does not work correctly and produces a different ordering compared to SELECT.

> On Nov 6, 2014, at 12:22 PM, Jim Balhoff <ba...@ne...> wrote:
>
> I updated the query to use the simple variable in ORDER BY, and the results are the same.
|
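The requirement at the heart of this thread is that a DISTINCT applied after ORDER BY must not disturb the sorted order. SPARQL 1.1 applies solution modifiers in the sequence order, projection, distinct, reduced, offset, limit, so the sort comes first. The sketch below illustrates that requirement with plain Java collections. It is not bigdata's operator code: the class `OrderedDistinct` and its methods are hypothetical, and rows are simplified to strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical illustration (not bigdata code) of order-preserving DISTINCT. */
final class OrderedDistinct {

    private OrderedDistinct() {
    }

    /**
     * Sorts the rows, then removes duplicates while keeping the first
     * occurrence of each row in its sorted position. LinkedHashSet iterates
     * in insertion order, so the sort order survives duplicate removal.
     */
    static List<String> sortThenDistinct(final List<String> rows) {
        final List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.naturalOrder());
        return new ArrayList<>(new LinkedHashSet<>(sorted));
    }

    /**
     * Contrast case: HashSet makes no guarantee about iteration order, so
     * the sorted order may or may not survive duplicate removal.
     */
    static List<String> sortThenUnorderedDistinct(final List<String> rows) {
        final List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.naturalOrder());
        return new ArrayList<>(new HashSet<>(sorted));
    }
}
```

The first method plays the role of a DISTINCT that preserves order. The second shows how a duplicate filter that ignores input order could produce the kind of symptom reported here: the same rows, but in a different sequence.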
From: Bryan T. <br...@sy...> - 2014-11-06 22:22:10
|
*We will be introducing a new scale-out architecture offering speedups of 100x to 10,000x.* *This will be the fastest graph database processing platform anywhere.*

Today, there are four main deployment models for bigdata. The last of these (the "bigdata federation") will be replaced by our new scale-out platform.

1. embedded
2. single server
3. highly available replication cluster
4. horizontally scaled database (aka "bigdata federation") <== this will be replaced.

*We will continue to actively support and develop the following versions of the bigdata platform:*

1. embedded
2. single server
3. highly available replication cluster

*Upcoming features for these platforms include:*

- Support for openrdf 2.7 (this month)
- Improved query optimization
- Column-wise on the page
- Faster data load times
- Smaller data footprint on disk

*Why MapGraph?*

Our experience with MapGraph (http://mapgraph.io) has shown us how to create a new horizontally scaled platform that is 100x faster on CPUs and 10,000x faster on GPUs than the existing scale-out architecture. Therefore, we will be rolling out a new horizontally scaled database platform next year, based on MapGraph and supporting both CPUs and GPUs for outrageous performance. If you want a preview, check out our recent paper at IEEE Big Data.

*What was wrong with the existing scale-out architecture?*

The existing horizontally scaled architecture (aka the "bigdata federation") is based on dynamic sharding and was inspired by the Google Bigtable architecture. The bigdata federation has several key innovations that go beyond the Google Bigtable architecture and that provide significantly better performance than existing attempts to layer RDF/SPARQL over a key-value store. For example, bigdata makes it possible to map the query over the data. This results in significantly less data being read than with other approaches such as Rya or CumulusRDF.

However, some aspects of the bigdata federation architecture have limited how quickly we can evolve the bigdata platform. In particular, Bigtable popularized the notion of a key-value store where the key is an unsigned byte[] and the value is a byte[]. However, modern high-performance database design uses column-wise (or structure-of-arrays) layouts in order to minimize the memory bandwidth and CPU decode overhead associated with index operations. Dropping the bigdata federation architecture will allow us to quickly introduce column-wise storage and new query optimization techniques, and will greatly simplify the maintenance of the query engine.

*Look for more news soon.*

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
br...@sy...
http://bigdata.com
http://mapgraph.io
|
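To make the column-wise (structure-of-arrays) idea from this announcement concrete, here is a small sketch contrasting it with a row-wise layout. It is purely illustrative and says nothing about how bigdata or MapGraph actually store data: the class `ColumnLayoutSketch`, the `Triple` and `TripleColumns` types, and the use of `long` identifiers for subject, predicate and object are all assumptions made for the example.

```java
/** Hypothetical illustration (not bigdata's storage format) of row-wise vs column-wise layouts. */
final class ColumnLayoutSketch {

    private ColumnLayoutSketch() {
    }

    /** Row-wise ("array of structures"): one object per triple. */
    static final class Triple {
        final long s;
        final long p;
        final long o;

        Triple(final long s, final long p, final long o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Column-wise ("structure of arrays"): one primitive array per component. */
    static final class TripleColumns {
        final long[] s;
        final long[] p;
        final long[] o;

        TripleColumns(final long[] s, final long[] p, final long[] o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        /** Copies each component of the rows into its own array. */
        static TripleColumns fromRows(final Triple[] rows) {
            final long[] s = new long[rows.length];
            final long[] p = new long[rows.length];
            final long[] o = new long[rows.length];
            for (int i = 0; i < rows.length; i++) {
                s[i] = rows[i].s;
                p[i] = rows[i].p;
                o[i] = rows[i].o;
            }
            return new TripleColumns(s, p, o);
        }
    }

    /** Row-wise scan: every Triple object is visited to read one field. */
    static int countPredicateRowWise(final Triple[] rows, final long predicate) {
        int n = 0;
        for (final Triple t : rows) {
            if (t.p == predicate) {
                n++;
            }
        }
        return n;
    }

    /** Column-wise scan: only the contiguous predicate array is read. */
    static int countPredicateColumnWise(final TripleColumns cols, final long predicate) {
        int n = 0;
        for (final long p : cols.p) {
            if (p == predicate) {
                n++;
            }
        }
        return n;
    }
}
```

Both scans return the same count. The difference is in what memory is touched: the column-wise scan reads a single contiguous `long[]`, which is the kind of reduction in memory bandwidth and decode work that the announcement attributes to column-wise layouts.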
From: Jim B. <ba...@ne...> - 2014-11-06 17:22:35
|
I updated the query to use the simple variable in ORDER BY, and the results are the same.

Here is the exact query (with or without DISTINCT) for the linked results:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?term ?string_label
WHERE
{
  ?term rdf:type owl:Class .
  ?term rdfs:label ?term_label .
  BIND (STR(?term_label) AS ?string_label)
}
ORDER BY ?string_label

Results (same number of rows either way):
SELECT DISTINCT:
explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv

SELECT:
explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv

Thanks,
Jim

> On Nov 6, 2014, at 12:01 PM, Bryan Thompson <br...@sy...> wrote:
>
> What happens if you replace that last line with:
>
> ORDER BY ?string_label
>
> rather than
>
> ORDER BY STR(?string_label)
>
> Remember, it is assuming that the ORDER BY is using simple variables.
>
> Bryan
>
> On Thu, Nov 6, 2014 at 11:58 AM, Jim Balhoff <ba...@ne...> wrote:
> Here is the exact query (with or without DISTINCT) for the linked results:
>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX owl: <http://www.w3.org/2002/07/owl#>
>
> SELECT DISTINCT ?term ?string_label
> WHERE
> {
>   ?term rdf:type owl:Class .
>   ?term rdfs:label ?term_label .
>   BIND (STR(?term_label) AS ?string_label)
> }
> ORDER BY STR(?string_label)
>
> Results (same number of rows either way):
> SELECT DISTINCT:
> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_explain.html
> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/with_distinct_result.csv
>
> SELECT:
> explain: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_explain.html
> result: https://dl.dropboxusercontent.com/u/6704325/bigdata/2014-11-6/no_distinct_result.csv
>
> You can diff the two results files to see the out-of-order blocks.
>
> I suppose it does look like the DISTINCT query plan has ORDER BY applied before DISTINCT, if I am reading it right.
>
> Thanks,
> Jim
>
> > On Nov 6, 2014, at 10:10 AM, Bryan Thompson <br...@sy...> wrote:
> >
> > Jim,
> >
> > 502 is about support for expressions (other than simple variables in ORDER_BY).
> >
> > If there is an issue with DISTINCT + ORDER_BY then this would be a new ticket.
> >
> > Just post the EXPLAIN (attach to the email) for the moment. I want to see how this is being generated. We should then check the specification and make sure that the correct behavior is DISTINCT followed by ORDER BY with any limit applied after the ORDER BY. I can then check the code for how we are handling this.
> >
> > The relevant logic is in AST2BOpUtility at line 451. You can see that it is already attempting to handle this and that there was a historical ticket for this issue (#563).
> >
> >     /*
> >      * Note: The DISTINCT operators also enforce the projection.
> >      *
> >      * Note: REDUCED allows, but does not require, either complete or
> >      * partial filtering of duplicates. It is part of what openrdf does
> >      * for a DESCRIBE query.
> >      *
> >      * Note: We do not currently have special operator for REDUCED. One
> >      * could be created using chunk wise DISTINCT. Note that REDUCED may
> >      * not change the order in which the solutions appear (but we are
> >      * evaluating it before ORDER BY so that is Ok.)
> >      *
> >      * TODO If there is an ORDER BY and a DISTINCT then the sort can be
> >      * used to impose the distinct without the overhead of a hash index
> >      * by filtering out the duplicate solutions after the sort.
> >      */
> >
> >     // When true, DISTINCT must preserve ORDER BY ordering.
> >     final boolean preserveOrder;
> >
> >     if (orderBy != null && !orderBy.isEmpty()) {
> >
> >         /*
> >          * Note: ORDER BY before DISTINCT, so DISTINCT must preserve
> >          * order.
> >          *
> >          * @see https://sourceforge.net/apps/trac/bigdata/ticket/563
> >          * (ORDER BY + DISTINCT)
> >          */
> >
> >         preserveOrder = true;
> >
> >         left = addOrderBy(left, queryBase, orderBy, ctx);
> >
> >     } else {
> >
> >         preserveOrder = false;
> >
> >     }
> >
> >     if (projection.isDistinct() || projection.isReduced()) {
> >
> >         left = addDistinct(left, queryBase, preserveOrder, ctx);
> >
> >     }
> >
> > } else {
> >
> >     /*
> >      * TODO Under what circumstances can the projection be [null]?
> >      */
> >
> >     if (orderBy != null && !orderBy.isEmpty()) {
> >
> >         left = addOrderBy(left, queryBase, orderBy, ctx);
> >
> >     }
> >
> > }
> >
> > Bryan
> >
> > On Thu, Nov 6, 2014 at 10:03 AM, Jim Balhoff <ba...@ne...> wrote:
> > Hi Bryan,
> >
> > Just to clarify, would you like me to attach the info to ticket 502, or continue posting to the developer list?
> >
> > Thanks,
> > Jim
> >
> > > On Nov 6, 2014, at 8:28 AM, Bryan Thompson <br...@sy...> wrote:
> > >
> > > The ticket for allowing aggregates in ORDER BY is:
> > >
> > > - http://trac.bigdata.com/ticket/502 (Allow aggregates in ORDER BY clause)
> > >
> > > Can you attach the EXPLAIN of the query with and without DISTINCT. The issue may be that the DISTINCT is being applied after the ORDER BY. I seem to remember some issue historically with operations being performed before/after the ORDER BY, but I do not have any distinct recollection of a problematic interaction between DISTINCT and ORDER BY.
> > >
> > > Bryan
> > >
> > > On Wed, Nov 5, 2014 at 6:14 PM, Jim Balhoff <ba...@ne...> wrote:
> > > > On Nov 5, 2014, at 5:46 PM, Jeremy J Carroll <jj...@sy...> wrote:
> > > >
> > > >> On Nov 5, 2014, at 1:02 PM, Bryan Thompson <br...@sy...> wrote:
> > > >>
> > > >> There could be an issue with ORDER BY operating on an anonymous and non-projected variable. Try declaring and binding a variable for STR(?label) inside of the query and then using that variable in the ORDER BY clause.
> > > >
> > > > Yes I tend to find the results of ORDER BY are more what I expect if I do not include an expression in the ORDER BY but simply variables. I BIND any expression before the ORDER BY.
> > > >
> > > > I believe there is a trac item for this, but since the workaround is easy, I have never seen it as high priority
> > >
> > > As suggested I tried binding a variable as `BIND (STR(?term_label) AS ?string_label)` and using that to sort. Still incorrect ordering. But, I tried removing DISTINCT, and then the ordering is correct. Even going back to the anonymous `ORDER BY STR(?term_label)`, ordering is still correct if I remove DISTINCT. For this specific query DISTINCT is not needed, but I do need it for my application. Is there a reason to not expect DISTINCT to work correctly with ORDER BY?
> > >
> > > Thanks both of you for all of your help,
> > > Jim
|
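For readers following the thread, the behavior Jim expects (DISTINCT leaving the ORDER BY sequence intact) can be illustrated with a small standalone sketch. This is not bigdata's addOrderBy or addDistinct; the class and method names below are hypothetical, and a solution is modeled simply as a two-element list holding ?term and ?string_label. As far as I read the SPARQL 1.1 algebra, Distinct is required to preserve any ordering given by OrderBy, which is what the preserveOrder flag in the quoted excerpt is meant to guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not part of the bigdata code base.
class OrderedDistinctSketch {

    // Models ORDER BY ?string_label: sort on the second column only.
    // List.sort is stable, so ties keep their incoming order.
    static List<List<String>> orderBy(List<List<String>> solutions) {
        List<List<String>> out = new ArrayList<>(solutions);
        out.sort(Comparator.comparing((List<String> s) -> s.get(1)));
        return out;
    }

    // Models an order-preserving DISTINCT: LinkedHashSet drops duplicates
    // and iterates in first-seen order, so the sort order survives.
    static List<List<String>> distinctPreservingOrder(List<List<String>> sorted) {
        return new ArrayList<>(new LinkedHashSet<>(sorted));
    }

    public static void main(String[] args) {
        // Made-up (?term, ?string_label) rows with duplicates.
        List<List<String>> solutions = Arrays.asList(
                Arrays.asList("ex:C", "gamma"),
                Arrays.asList("ex:A", "alpha"),
                Arrays.asList("ex:C", "gamma"),
                Arrays.asList("ex:B", "beta"),
                Arrays.asList("ex:A", "alpha"));

        // Prints [[ex:A, alpha], [ex:B, beta], [ex:C, gamma]]
        System.out.println(distinctPreservingOrder(orderBy(solutions)));
    }
}
```

Note that this sketch still uses a hash structure for DISTINCT. The TODO in the quoted comment describes a different idea, dropping duplicates by comparing neighbors after the sort, which would only be safe when the sort keys cover every projected variable so that equal solutions end up adjacent.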