This list is closed, nobody may subscribe to it.
From: Bryan T. <br...@bl...> - 2016-04-18 19:32:15
|
Use sparql SERVICE.

On Apr 18, 2016 3:26 PM, "Joakim Soderberg" <joa...@bl...> wrote:
> Having two namespaces in triple mode, is it possible to query both from
> within one Sparql query?
> Say I am using
>
> RepositoryConnection con = repo.getConnection();
> TupleQuery tupleQuery =
>     con.prepareTupleQuery(QueryLanguage.SPARQL, query);
>
> Has anyone tried using
> SELECT *
> FROM <url1>
> FROM <url2 >
>
> ------------------------------------------------------------------------------
> Find and fix application performance issues faster with Applications Manager
> Applications Manager provides deep performance insights into multiple tiers of
> your business applications. It resolves application problems quickly and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
|
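A minimal sketch of what the SERVICE suggestion could look like, assuming each namespace is exposed as its own SPARQL endpoint. The endpoint URLs, the namespace names `ns1`/`ns2`, and the class and method names are all illustrative assumptions, not taken from the thread; only the SPARQL 1.1 `SERVICE` and `UNION` keywords are standard.

```java
public class FederatedQuerySketch {

    // Hypothetical endpoint URLs: replace with the SPARQL endpoints of your two namespaces.
    static final String NS1 = "http://localhost:9999/blazegraph/namespace/ns1/sparql";
    static final String NS2 = "http://localhost:9999/blazegraph/namespace/ns2/sparql";

    // Builds one query that pulls triples from both endpoints via SERVICE and merges them with UNION.
    static String buildQuery(String endpoint1, String endpoint2) {
        return String.join("\n",
            "SELECT * WHERE {",
            "  { SERVICE <" + endpoint1 + "> { ?s ?p ?o } }",
            "  UNION",
            "  { SERVICE <" + endpoint2 + "> { ?s ?p ?o } }",
            "}");
    }

    public static void main(String[] args) {
        // The resulting string is what would be handed to
        // con.prepareTupleQuery(QueryLanguage.SPARQL, query) in the question above.
        System.out.println(buildQuery(NS1, NS2));
    }
}
```

The query can be sent through a connection to either namespace; each SERVICE clause is then evaluated against the endpoint it names.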
From: Joakim S. <joa...@bl...> - 2016-04-18 19:26:29
|
Having two namespaces in triple mode, is it possible to query both from within one Sparql query?
Say I am using

RepositoryConnection con = repo.getConnection();
TupleQuery tupleQuery = con.prepareTupleQuery(QueryLanguage.SPARQL, query);

Has anyone tried using
SELECT *
FROM <url1>
FROM <url2>
|
From: Michael S. <ms...@me...> - 2016-04-15 12:44:59
|
Glad I could help. You’re right; I’ve created https://jira.blazegraph.com/browse/BLZG-1886 as a follow-up to improve the usage message with all available options.

Best,
Michael

> On 15 Apr 2016, at 14:39, Daniel Henández <da...@de...> wrote:
>
> On 15/04/16 at 09:27, Michael Schmidt wrote:
>> Hi Daniel,
>>
>> please try
>>
>> > java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -defaultGraph http://example.org server.properties data.nq
>>
>
> It works! The usage message did not include the option -defaultGraph. I could not have solved this without your help.
>
> Thanks,
> Daniel
|
From: Daniel H. <da...@de...> - 2016-04-15 12:39:19
|
On 15/04/16 at 09:27, Michael Schmidt wrote:
> Hi Daniel,
>
> please try
>
> > java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -defaultGraph http://example.org server.properties data.nq
>

It works! The usage message did not include the option -defaultGraph. I could not have solved this without your help.

Thanks,
Daniel
|
From: Michael S. <ms...@me...> - 2016-04-15 12:27:30
|
Hi Daniel,

please try

> java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -defaultGraph http://example.org server.properties data.nq

Best,
Michael

> On 15 Apr 2016, at 13:41, Daniel Henández <da...@de...> wrote:
>
>> Dear Daniel,
>>
>> afaik you need to pass the defaultGraph parameter as a command line argument to the DataLoader call (rather than providing it inside the properties file). Could you please try that?
> I am not sure where I have to put this option, so I tried with all
> possible ways:
>
> $ java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -DdefaultGraph=http://example.org server.properties data.nq
> Unknown argument: -DdefaultGraph=http://example.org
> usage: [-closure][-verbose][-durableQueues][-namespace namespace] propertyFile (fileOrDir)+
>
> $ java -cp blazegraph.jar -DdefaultGraph=http://example.org com.bigdata.rdf.store.DataLoader server.properties data.nq
> => context error
>
> $ java -DdefaultGraph=http://example.org -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -DdefaultGraph=http://example.org server.properties data.nq
> => context error
>
> Thanks,
> Daniel
|
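The argument order is what matters in the command above: `-defaultGraph` is a DataLoader option, so it goes after the class name and before the properties file, not among the JVM `-D` flags. A small sketch that assembles the same command line from Java; the class and method names here are hypothetical, and only the arguments shown in the thread are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataLoaderCommandSketch {

    // Assembles the DataLoader invocation with -defaultGraph placed after the
    // main class and before the properties file, as in the working command.
    static List<String> buildCommand(String jar, String defaultGraph,
                                     String propertyFile, String... files) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
            "java", "-cp", jar,
            "com.bigdata.rdf.store.DataLoader",
            "-defaultGraph", defaultGraph,
            propertyFile));
        cmd.addAll(Arrays.asList(files));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
            "blazegraph.jar", "http://example.org", "server.properties", "data.nq");
        // Print the command; it could also be launched with
        // new ProcessBuilder(cmd).inheritIO().start().
        System.out.println(String.join(" ", cmd));
    }
}
```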
From: Daniel H. <da...@de...> - 2016-04-15 11:41:22
|
> Dear Daniel,
>
> afaik you need to pass the defaultGraph parameter as a command line argument to the DataLoader call (rather than providing it inside the properties file). Could you please try that?

I am not sure where I have to put this option, so I tried with all possible ways:

$ java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -DdefaultGraph=http://example.org server.properties data.nq
Unknown argument: -DdefaultGraph=http://example.org
usage: [-closure][-verbose][-durableQueues][-namespace namespace] propertyFile (fileOrDir)+

$ java -cp blazegraph.jar -DdefaultGraph=http://example.org com.bigdata.rdf.store.DataLoader server.properties data.nq
=> context error

$ java -DdefaultGraph=http://example.org -cp blazegraph.jar com.bigdata.rdf.store.DataLoader -DdefaultGraph=http://example.org server.properties data.nq
=> context error

Thanks,
Daniel
|
From: Michael S. <ms...@me...> - 2016-04-15 07:21:19
|
Dear Daniel,

afaik you need to pass the defaultGraph parameter as a command line argument to the DataLoader call (rather than providing it inside the properties file). Could you please try that?

Best,
Michael

> On 15 Apr 2016, at 03:37, Daniel Henández <da...@de...> wrote:
>
>> Daniel,
>>
>> Can you post your configuration file for the bulk loader?
>
> Brad, I put the steps and files to reproduce my problem in:
>
> https://daniel.degu.cl/en/issues/issue-0001
>
> Thanks,
> Daniel
|
From: Daniel H. <da...@de...> - 2016-04-15 01:38:05
|
> Daniel,
>
> Can you post your configuration file for the bulk loader?

Brad, I put the steps and files to reproduce my problem in:

https://daniel.degu.cl/en/issues/issue-0001

Thanks,
Daniel
|
From: Brad B. <be...@bl...> - 2016-04-15 01:00:30
|
Daniel,

Can you post your configuration file for the bulk loader?

Thanks,
--Brad

On Thu, Apr 14, 2016 at 6:03 PM, Daniel Henández <da...@de...> wrote:
>
> On 14/04/16 at 17:32, Brad Bebee wrote:
> >
> > Daniel,
> >
> > I believe you need < > around the URI: <http://example.org>.
> >
> > Let us know how it goes.
> >
> Brad, I tried with the line
>
> defaultGraph=<http://example.org>
>
> in the properties file, but I got the same error: "context not bound".
>
> Thanks,
> Daniel

--
_______________
Brad Bebee
CEO
Blazegraph
e: be...@bl...
m: 202.642.7961
w: www.blazegraph.com

Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>.

Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Apache TinkerPop™ APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> and Blazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/> are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions.

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.
|
From: Daniel H. <da...@de...> - 2016-04-14 22:03:58
|
On 14/04/16 at 17:32, Brad Bebee wrote:
>
> Daniel,
>
> I believe you need < > around the URI: <http://example.org>.
>
> Let us know how it goes.
>

Brad, I tried with the line

defaultGraph=<http://example.org>

in the properties file, but I got the same error: "context not bound".

Thanks,
Daniel
|
From: Brad B. <be...@bl...> - 2016-04-14 20:32:33
|
Daniel,

I believe you need < > around the URI: <http://example.org>.

Let us know how it goes.

Thanks,
Brad

_______________
Brad Bebee
CEO, Managing Partner
SYSTAP, LLC
e: be...@sy...
m: 202.642.7961
f: 571.367.5000
w: www.systap.com

Blazegraph™ is our ultra high-performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs.

MapGraph™ is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics.

On Apr 14, 2016 4:24 PM, "Daniel Henández" <da...@de...> wrote:
> Hi,
>
> I am trying to load quads into Blazegraph 2.1.0 using the bulk loader.
>
> java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader config.properties data.nq
>
> The documentation [1] said that I have to set the defaultGraph property when
> loading quads (because some statements are triples). Thus, I added the line
>
> defaultGraph=http://example.org
>
> to the properties file. However, it continues producing a context error. I think
> that I'm not doing this properly. So, how can I specify the defaultGraph?
>
> (I first asked in Stackoverflow [2]. I guess that this mailing list has more
> readers.)
>
> [1]: https://wiki.blazegraph.com/wiki/index.php/REST_API#Context_Not_Bound_Error_.28Quads_mode_without_defaultGraph.29
> [2]: http://stackoverflow.com/questions/36629199/how-to-load-quads-using-the-bulk-loader
>
> Cheers,
> Daniel
|
From: Daniel H. <da...@de...> - 2016-04-14 20:24:29
|
Hi,

I am trying to load quads into Blazegraph 2.1.0 using the bulk loader.

java -cp blazegraph.jar com.bigdata.rdf.store.DataLoader config.properties data.nq

The documentation [1] said that I have to set the defaultGraph property when loading quads (because some statements are triples). Thus, I added the line

defaultGraph=http://example.org

to the properties file. However, it continues producing a context error. I think that I'm not doing this properly. So, how can I specify the defaultGraph?

(I first asked in Stackoverflow [2]. I guess that this mailing list has more readers.)

[1]: https://wiki.blazegraph.com/wiki/index.php/REST_API#Context_Not_Bound_Error_.28Quads_mode_without_defaultGraph.29
[2]: http://stackoverflow.com/questions/36629199/how-to-load-quads-using-the-bulk-loader

Cheers,
Daniel
|
From: Brad B. <be...@bl...> - 2016-04-14 16:53:22
|
Andreas,

In addition to the DumpJournal technique, which can give you a sense of the actual inlining performance for a load with instance data, you can also validate your vocabulary in a unit test. If you look at TestPubChemVocabInlineUris.java [1], you will see an example of a unit test that creates a namespace using a custom vocabulary and inline URI handler, then validates that the intended URIs are being inlined. Our recommendation would be to first build your vocabulary and use the unit test, then load the data and use DumpJournal to see if there may be additional inlining opportunities.

There is also a "latent" new feature in the com.bigdata.rdf.util.VocabBuilder [2], which Michael updated as part of 2.1.0. You can run this over your instance data as "java -cp blazegraph.jar com.bigdata.rdf.util.VocabBuilder /path/to/fileordir /path/to/fileordir ...". It will then generate a Java file containing a custom vocabulary starting point. We need to add parallelized reads to this for shorter processing on large data sets, but it will generate a Vocabulary with inlined URIs for the highest frequency URIs in your data. You can then augment this with a custom inline URI handler as you have done. We have a backlogged Blog post / Wiki Update on this feature.

Thanks,
--Brad

[1] https://github.com/blazegraph/database/blob/master/vocabularies/src/test/java/com/blazegraph/vocab/pubchem/TestPubchemVocabInlineUris.java
[2] https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata-rdf/src/java/com/bigdata/rdf/util/VocabBuilder.java

On Thu, Apr 14, 2016 at 9:46 AM, Andreas Kahl <ka...@bs...> wrote:
> Bryan,
>
> Thanks for the info. DumpJournal (with -pages because I want to tune
> branching factors) is already running, but it will take some time as the
> journal is 82GB.
> As soon as I have the Html-output I will have a look at the numbers you
> mentioned.
>
> Best Regards
> Andreas
>
> >>> Bryan Thompson <br...@sy...> 14.04.2016 15:28 >>>
> Use DumpJournal (w/o -pages). Look at the number of entries in the TERM2ID
> and BLOBS indices. This will tell you how many RDF Values were NOT inlined.
>
> If you want to figure out how many were inlined, look at the number of
> statements in one of the statement indices. Multiply by 3 (or 4 for quads)
> and then subtract the number of entries in (TERM2ID + BLOBS). That is the
> number of inline IVs.
>
> You are probably after the distinct number of non-inlined IVs. This is not
> so easy to find. However, just the size of the TERM2ID and BLOBS indices is
> a very good indication of whether or not things are being inlined.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> Blazegraph
> e: br...@bl...
> w: http://blazegraph.com
>
> On Thu, Apr 14, 2016 at 9:25 AM, Andreas Kahl <ka...@bs...> wrote:
>
>> Hello everyone,
>>
>> how can I determine which portion of URIs in my journal were successfully
>> inlined?
>>
>> From your example PubChem I derived my own InlineUriFactory (attached).
>> This is the config used:
>> <entry key="com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass">de.bsb_muenchen.bigdata.vocab.B3KatVocabulary</entry>
>>
>> <entry key="com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory">de.bsb_muenchen.bigdata.vocab.B3KatInlineUriFactory</entry>
>>
>> All mentioned classes are attached.
>>
>> Thanks & Best Regards
>> Andreas

--
_______________
Brad Bebee
CEO
Blazegraph
e: be...@bl...
m: 202.642.7961
w: www.blazegraph.com
|
From: Bryan T. <br...@sy...> - 2016-04-14 13:28:29
|
Use DumpJournal (w/o -pages). Look at the number of entries in the TERM2ID and BLOBS indices. This will tell you how many RDF Values were NOT inlined.

If you want to figure out how many were inlined, look at the number of statements in one of the statement indices. Multiply by 3 (or 4 for quads) and then subtract the number of entries in (TERM2ID + BLOBS). That is the number of inline IVs.

You are probably after the distinct number of non-inlined IVs. This is not so easy to find. However, just the size of the TERM2ID and BLOBS indices is a very good indication of whether or not things are being inlined.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com

On Thu, Apr 14, 2016 at 9:25 AM, Andreas Kahl <ka...@bs...> wrote:
> Hello everyone,
>
> how can I determine which portion of URIs in my journal were successfully
> inlined?
>
> From your example PubChem I derived my own InlineUriFactory (attached).
> This is the config used:
> <entry key="com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass">de.bsb_muenchen.bigdata.vocab.B3KatVocabulary</entry>
>
> <entry key="com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory">de.bsb_muenchen.bigdata.vocab.B3KatInlineUriFactory</entry>
>
> All mentioned classes are attached.
>
> Thanks & Best Regards
> Andreas
|
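The arithmetic described in this reply can be written down as a small helper. This is only a sketch: the class and method names are made up for illustration, and the counts in `main` are invented numbers, not output from a real DumpJournal run.

```java
public class InlineEstimateSketch {

    // Estimated number of inline IVs:
    //   statements * arity - (TERM2ID entries + BLOBS entries)
    // where arity is 3 for triples and 4 for quads.
    static long inlineIVs(long statements, int arity, long term2idEntries, long blobsEntries) {
        return statements * arity - (term2idEntries + blobsEntries);
    }

    public static void main(String[] args) {
        // Hypothetical counts, standing in for values read from DumpJournal output.
        long statements = 1_000_000L;
        long term2id = 400_000L;
        long blobs = 1_000L;

        // Triples mode: 1,000,000 * 3 - (400,000 + 1,000) = 2,599,000
        System.out.println(inlineIVs(statements, 3, term2id, blobs));
    }
}
```

Note that this counts value occurrences across statements, not distinct values, which is why the reply treats the sizes of TERM2ID and BLOBS as the more direct indicator.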
From: Andreas K. <ka...@bs...> - 2016-04-14 13:25:18
|
package de.bsb_muenchen.bigdata.vocab;

import com.bigdata.rdf.vocab.VocabularyDecl;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import org.openrdf.model.URI;
import org.openrdf.model.impl.URIImpl;

/**
 * This does not contain the complete RDVocab-Elements Ontology. It exclusively declares the
 * URIs most commonly used in the http://lod.b3kat.de-dataset
 *
 * @author Andreas Kahl<ka...@bs...>
 */
public class B3KatVocabularyDecl implements VocabularyDecl {

    static private final URI[] URIS = new URI[]{
        new URIImpl("http://lod.b3kat.de/title/"),
        //Secondary namespaces for URI Inlining
        new URIImpl("http://lod.b3kat.de/bib/"),
        new URIImpl("http://lod.b3kat.de/isbn/"),
        new URIImpl("http://d-nb.info/gnd/"),
        //RISM-URIs
        new URIImpl("http://data.rism.info/id/rismid/"),
        new URIImpl("http://data.rism.info/id/rismauthorities/")
    };

    @Override
    public Iterator<URI> values() {
        return Collections.unmodifiableList(Arrays.asList(URIS)).iterator();
    }
}
|
From: Bryan T. <br...@sy...> - 2016-04-11 18:10:58
|
I think that this is just an integration question. It was released initially as a separate artifact. It probably needs to be bundled to make this work, or "hooked" as a lazy integration component. I will discuss some possible approaches with Brad and Olaf.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com

On Mon, Apr 11, 2016 at 2:04 PM, Stas Malyshev <sma...@wi...> wrote:
> Hi!
>
> > This is because the journal can not be opened from two separate
> > processes at the same time. I can discuss this with Olaf.
>
> It'd be nice if the LDF server could access the store from running
> server, since it'd allow to serve the same data set in both modes. If I
> understand correctly, LDF server is read-only, so it should not have
> requirements over what parallel access within Blazegraph instance would
> require? Or is it more complicated?
>
> Thanks,
> --
> Stas Malyshev
> sma...@wi...
|
From: Stas M. <sma...@wi...> - 2016-04-11 18:04:31
|
Hi!

> This is because the journal can not be opened from two separate
> processes at the same time. I can discuss this with Olaf.

It'd be nice if the LDF server could access the store from running server, since it'd allow to serve the same data set in both modes. If I understand correctly, LDF server is read-only, so it should not have requirements over what parallel access within Blazegraph instance would require? Or is it more complicated?

Thanks,
--
Stas Malyshev
sma...@wi...
|
From: Bryan T. <br...@sy...> - 2016-04-11 10:36:53
|
This is because the journal can not be opened from two separate processes at the same time. I can discuss this with Olaf.

Thanks,
Bryan

On Monday, April 11, 2016, Blaise de Carné <bde...@gm...> wrote:
> Hi there,
>
> It works when the Blazegraph server is not running. We can't get it to work
> when the NanoSparqlServer is running; we get this error:
>
> org.eclipse.jetty.servlet.ServletHolder$1: org.linkeddatafragments.exceptions.DataSourceCreationException: java.lang.RuntimeException: file=blazegraph.jnl
>
> Best,
> Blaise
>
> 2016-04-10 17:07 GMT+02:00 Bryan Thompson <br...@sy...>:
>
>> Blaise,
>>
>> Please confirm that you can simply reconfigure to access an existing
>> Journal file. This should work.
>>
>> Thanks,
>> Bryan
>>
>> ----
>> Bryan Thompson
>> Chief Scientist & Founder
>> Blazegraph
>> e: br...@bl...
>> w: http://blazegraph.com
>>
>> On Sat, Apr 9, 2016 at 1:02 AM, Olaf Hartig <oh...@uw...> wrote:
>>
>>> Hi Blaise,
>>>
>>> I think you can do it. Although I have not tested this use case, I do
>>> not see why it would not be possible. Just point the config.json to the
>>> journal file.
>>>
>>> Best,
>>> Olaf
>>>
>>>
>>> On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" <bde...@gm...> wrote:
>>>>
>>>> Hi Olaf,
>>>>
>>>> Yes, we already took a look at your implementation. It looks good, but
>>>> we can't use it on a journal that is already used for the SPARQL Endpoint,
>>>> am I wrong?
>>>>
>>>> Blaise
>>>>
>>>> On Fri, Apr 8, 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:
>>>>
>>>>> Dear Blaise,
>>>>>
>>>>> As Michael mentioned, I implemented a TPF interface directly on top of
>>>>> Blazegraph. This implementation uses directly the Blazegraph internals and,
>>>>> thus, avoids the overhead of forwarding every TPF request to the SPARQL
>>>>> endpoint interface (as would be done by using the standard TPF server
>>>>> implementation).
>>>>>
>>>>> Find the original source code here:
>>>>>
>>>>> https://github.com/hartig/BlazegraphBasedTPFServer
>>>>>
>>>>> ...and note that this TPF interface is included in the official 2.0 release of
>>>>> Blazegraph:
>>>>>
>>>>> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>>>>>
>>>>> Cheers,
>>>>> Olaf
>>>>>
>>>>>
>>>>> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
>>>>> > In response to the request from the bigdata-commit (see below), please let’s
>>>>> > resume the discussion on this place:
>>>>> >
>>>>> > Determinism is not guaranteed unless parallelism is explicitly disabled;
>>>>> > this even holds for select queries. There are several potential sources for
>>>>> > non-determinism: in the general case, Blazegraph may choose to run multiple
>>>>> > parallel threads for a given operator (processing different chunks of data
>>>>> > in parallel), and in some cases operators also use multiple threads
>>>>> > internally.
>>>>> >
>>>>> > For the given query at hand, the single triple pattern access path will
>>>>> > yield results in order, but this order actually might be destroyed by other
>>>>> > operators on top. The projection operator, for instance, does not guarantee
>>>>> > order in the general case, as it might process data in different threads.
>>>>> > The way to achieve determinism would be to explicitly disable this
>>>>> > parallelism. In fact, this is what Blazegraph is doing when projecting for
>>>>> > queries that have an ORDER BY clause. Code-wise, a good starting point is
>>>>> > in AST2BOpUtility, starting at line 579:
>>>>> >
>>>>> > <snip>
>>>>> >     if (projection != null) {
>>>>> >
>>>>> >         /**
>>>>> >          * The projection after the ORDER BY needs to preserve the ordering.
>>>>> >          * So does the chunked materialization operator. The code above
>>>>> >          * handles this for ORDER_BY + DISTINCT, but does not go far enough
>>>>> >          * to impose order preserving evaluation on the PROJECTION and
>>>>> >          * chunked materialization, both of which are downstream from the
>>>>> >          * ORDER_BY operator.
>>>>> >          *
>>>>> >          * @see #1044 (PROJECTION after ORDER BY does not preserve order)
>>>>> >          */
>>>>> >         final boolean preserveOrder = orderBy != null;
>>>>> >
>>>>> >         /*
>>>>> >          * Append operator to drop variables which are not projected by the
>>>>> >          * subquery.
>>>>> >          *
>>>>> >          * Note: We need to retain all variables which were visible in the
>>>>> >          * parent group plus anything which was projected out of the
>>>>> >          * subquery. Since there can be exogenous variables, the easiest way
>>>>> >          * to do this correctly is to drop variables from the subquery plan
>>>>> >          * which are not projected by the subquery. (This is not done at the
>>>>> >          * top-level query plan because it would cause exogenous variables
>>>>> >          * to be dropped.)
>>>>> >          */
>>>>> >
>>>>> >         {
>>>>> >             // The variables projected by the subquery.
>>>>> > final IVariable<?>[] projectedVars = >>>>> projection >>>>> > .getProjectionVars(); >>>>> > >>>>> > final List<NV> anns = new >>>>> LinkedList<NV>(); >>>>> > anns.add(new >>>>> NV(BOp.Annotations.BOP_ID, ctx.nextId())); >>>>> > anns.add(new >>>>> NV(BOp.Annotations.EVALUATION_CONTEXT, >>>>> > BOpEvaluationContext.CONTROLLER)); anns.add(new >>>>> > NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats >>>>> anns.add(new >>>>> > NV(ProjectionOp.Annotations.SELECT, projectedVars)); if >>>>> (preserveOrder) { >>>>> > /** >>>>> > * @see #563 (ORDER BY + >>>>> DISTINCT) >>>>> > * @see #1044 (PROJECTION >>>>> after ORDER BY does not preserve >>>>> > * order) >>>>> > */ >>>>> > anns.add(new >>>>> NV(PipelineOp.Annotations.MAX_PARALLEL, 1)); >>>>> > anns.add(new >>>>> NV(SliceOp.Annotations.REORDER_SOLUTIONS, false)); >>>>> > } >>>>> > left = applyQueryHints(new >>>>> ProjectionOp(leftOrEmpty(left),// >>>>> > anns.toArray(new >>>>> NV[anns.size()])// >>>>> > ), queryBase, ctx); >>>>> > } >>>>> > </snip> >>>>> > >>>>> > If the preserve order flag is true, parallelism for the operator is >>>>> > explicitly disabled. Disabling parallelism for the projection node >>>>> would >>>>> > help for simple queries such as single triple pattern, but in the >>>>> general >>>>> > case (for more complex queries) there will be other operators that >>>>> might >>>>> > cause non-deterministic behaviour. >>>>> > >>>>> > @Olaf Hartig (CC) implemented a Linked Data Fragment interface on >>>>> top of >>>>> > Blazegraph, adding him in CC. >>>>> > >>>>> > >>>>> > Best, >>>>> > Michael >>>>> > >>>>> > > From: Blaise de Carné <bde...@gm... >>>>> <javascript:_e(%7B%7D,'cvml','bde...@gm...');>> >>>>> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY >>>>> > > Date: 8 April 2016 at 10:58:02 GMT+2 >>>>> > > To: "big...@li... >>>>> <javascript:_e(%7B%7D,'cvml','big...@li...');> >>>>> " >>>>> > > <big...@li... 
>>>>> <javascript:_e(%7B%7D,'cvml','big...@li...');> >>>>> > >>>>> > > >>>>> > > Hi there, >>>>> > > >>>>> > > I would like to expose a considiration that I find very annoying. >>>>> I need >>>>> > > to do more tests but i would like to know your fellings about it. >>>>> > > >>>>> > > Look for this exemple : >>>>> > > >>>>> > > construct where { >>>>> > > >>>>> > > ?s <http://geovocab.org/geometry#geometry >>>>> > > <http://geovocab.org/geometry#geometry>> ?event> >>>>> > > } limit 5 >>>>> > > >>>>> > > It take avout 100ms to execute on my 3B dataset. >>>>> > > >>>>> > > In 90% of time, this give me 5 results in the same order : >>>>> > > >>>>> > > <http://linkedgeodata.org/triplify/node1003406722> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1003406722>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1003 >>>>> > > 406722> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1003406722>> >>>>> > > <http://linkedgeodata.org/triplify/node1003749425> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1003749425>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1003 >>>>> > > 749425> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1003749425>> >>>>> > > <http://linkedgeodata.org/triplify/node1011261499> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> 
http://li >>>>> > > nkedgeodata.org/triplify/node1011261499>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1011 >>>>> > > 261499> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1011261499>> >>>>> > > <http://linkedgeodata.org/triplify/node1011261514> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1011261514>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1011 >>>>> > > 261514> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1011261514>> >>>>> > > <http://linkedgeodata.org/triplify/node1011286717> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1011286717>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1011 >>>>> > > 286717> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1011286717>> But sometime, i get >>>>> differents >>>>> > > results : >>>>> > > >>>>> > > <http://linkedgeodata.org/triplify/node1204787784> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1204787784>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > 
ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1204 >>>>> > > 787784> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1204787784>> >>>>> > > <http://linkedgeodata.org/triplify/node1206798938> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1206798938>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1206 >>>>> > > 798938> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1206798938>> >>>>> > > <http://linkedgeodata.org/triplify/node12081506> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node12081506>> >>>>> <http://geovocab.org/geometry#geom >>>>> > > etry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1208 >>>>> > > 1506> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node12081506>> >>>>> > > <http://linkedgeodata.org/triplify/node1209197022> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1209197022>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1209 
>>>>> > > 197022> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1209197022>> >>>>> > > <http://linkedgeodata.org/triplify/node1212230478> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/triplify/node1212230478>> < >>>>> http://geovocab.org/geometry#ge >>>>> > > ometry> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://ge >>>>> > > ovocab.org/geometry#geometry>> >>>>> <http://linkedgeodata.org/geometry/node1212 >>>>> > > 230478> >>>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>>> http://li >>>>> > > nkedgeodata.org/geometry/node1212230478>> >>>>> > > >>>>> > > Conclusion : order is not garantee without ORDER BY. If i use an >>>>> ORDER BY, >>>>> > > performance drop alarmingly. >>>>> > > >>>>> > > Now take this fabulous project : Linked Data Fragments >>>>> > > (http://linkeddatafragments.org/ <http://linkeddatafragments.org/ >>>>> >), >>>>> > > which provide a SparqlDatasource to handle data from a SPARQL >>>>> Endpoint. >>>>> > > They use CONSTRUCT queries with LIMIT and OFFSET to paginate the >>>>> results, >>>>> > > as they says in the comments : >>>>> > > >>>>> > > // Even though the SPARQL spec indicates that >>>>> > > // LIMIT and OFFSET might be meaningless without ORDER BY, >>>>> > > // this doesn't seem a problem in practice. >>>>> > > // Furthermore, sorting can be slow. Therefore, don't sort. >>>>> > > >>>>> > > But it's a problem in practice with Blazegraph, and i >>>>> exeperimented it : a >>>>> > > Linked Data Fragments server configured over a Blazegraph SPARQL >>>>> Endpoint >>>>> > > serve different pages in 5-10% of time. >>>>> > > >>>>> > > In our project we really need to get consistent pagination, >>>>> without ORDER >>>>> > > BY. Do you think that is possible with Blazegraph ? 
>>>>> > > >>>>> > > Bests, >>>>> > > Blaise >>>>> > > >>>>> > > PS : i don't see this behaviour with SELECT, but cache could be >>>>> > > responsible... >>>>> >>>> >>> -- >>> Sent from my Android device with K-9 Mail. Please excuse my brevity. >>> >>> >>> ------------------------------------------------------------------------------ >>> Find and fix application performance issues faster with Applications >>> Manager >>> Applications Manager provides deep performance insights into multiple >>> tiers of >>> your business applications. It resolves application problems quickly and >>> reduces your MTTR. Get your free trial! http://pubads.g.doubleclick.net/ >>> gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532 >>> <http://pubads.g.doubleclick.net/gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532> >>> _______________________________________________ >>> Bigdata-developers mailing list >>> Big...@li... >>> <javascript:_e(%7B%7D,'cvml','Big...@li...');> >>> https://lists.sourceforge.net/lists/listinfo/bigdata-developers >>> >> -- ---- Bryan Thompson Chief Scientist & Founder Blazegraph e: br...@bl... w: http://blazegraph.com Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>. Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> andBlazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/>L are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions. 
CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments. |
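Editorial sketch: one plausible reason a second open fails, assuming the journal file is guarded by an exclusive file lock. The thread only states that two processes cannot open the journal at once; the locking mechanism, the class name ExclusiveJournalLockSketch and the temporary example file are illustrative assumptions, and no Blazegraph API is used. Within a single JVM the second attempt is rejected with OverlappingFileLockException; from a separate process, tryLock() would return null instead.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: not Blazegraph code.
class ExclusiveJournalLockSketch {

    /**
     * Holds an exclusive lock on the file, then tries to lock it a second
     * time. Returns true if the second attempt is refused.
     */
    static boolean secondOpenIsRefused(Path file) throws IOException {
        try (FileChannel first = FileChannel.open(file,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock held = first.lock()) { // first "owner" of the file
            try (FileChannel second = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                FileLock other = second.tryLock();
                if (other != null) {
                    other.release();
                    return false; // the lock was not exclusive after all
                }
                return true; // null: another process holds the lock
            } catch (OverlappingFileLockException e) {
                return true; // this JVM already holds an overlapping lock
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".jnl");
        try {
            System.out.println("second open refused: " + secondOpenIsRefused(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If this is indeed the mechanism, pointing a separate TPF server process at a journal that a running NanoSparqlServer already holds would fail in the same way, which matches the behaviour Blaise reports.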
From: Blaise de C. <bde...@gm...> - 2016-04-11 09:21:59
|
Hi there,

It works when the Blazegraph server is not running. We can't get it to work when the NanoSparqlServer is running; we get this error:

org.eclipse.jetty.servlet.ServletHolder$1: org.linkeddatafragments.exceptions.DataSourceCreationException: java.lang.RuntimeException: file=blazegraph.jnl

Best,
Blaise

2016-04-10 17:07 GMT+02:00 Bryan Thompson <br...@sy...>:
> Blaise,
>
> Please confirm that you can simply reconfigure to access an existing
> Journal file. This should work.
>
> Thanks,
> Bryan
>
> ----
> Bryan Thompson
> Chief Scientist & Founder
> Blazegraph
> e: br...@bl...
> w: http://blazegraph.com
>
> On Sat, Apr 9, 2016 at 1:02 AM, Olaf Hartig <oh...@uw...> wrote:
>
>> Hi Blaise,
>>
>> I think you can do it. Although I have not tested this use case, I do not
>> see why it would not be possible. Just point the config.json to the journal
>> file.
>>
>> Best,
>> Olaf
>>
>> On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" <bde...@gm...> wrote:
>>>
>>> Hi Olaf,
>>>
>>> Yes, we already took a look at your implementation. It looks good, but
>>> we can't use it on a journal that is already used for the SPARQL endpoint,
>>> am I wrong?
>>>
>>> Blaise
>>>
>>> On Fri, Apr 8, 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:
>>>
>>>> Dear Blaise,
>>>>
>>>> As Michael mentioned, I implemented a TPF interface directly on top of
>>>> Blazegraph. This implementation uses the Blazegraph internals directly
>>>> and thus avoids the overhead of forwarding every TPF request to the SPARQL
>>>> endpoint interface (as would be done by the standard TPF server
>>>> implementation).
>>>>
>>>> Find the original source code here:
>>>>
>>>> https://github.com/hartig/BlazegraphBasedTPFServer
>>>>
>>>> ...and note that this TPF interface is included in the official 2.0
>>>> release of Blazegraph:
>>>>
>>>> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>>>>
>>>> Cheers,
>>>> Olaf
>>>>
>>>> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
>>>> > In response to the request on bigdata-commit (see below), let’s
>>>> > please continue the discussion here:
>>>> >
>>>> > Determinism is not guaranteed unless parallelism is explicitly
>>>> > disabled; this holds even for SELECT queries. There are several
>>>> > potential sources of non-determinism: in the general case, Blazegraph
>>>> > may choose to run multiple parallel threads for a given operator
>>>> > (processing different chunks of data in parallel), and in some cases
>>>> > operators also use multiple threads internally.
>>>> >
>>>> > For the query at hand, the single triple pattern access path will
>>>> > yield results in order, but this order might be destroyed by other
>>>> > operators on top. The projection operator, for instance, does not
>>>> > guarantee order in the general case, as it might process data in
>>>> > different threads. The way to achieve determinism would be to
>>>> > explicitly disable this parallelism. In fact, this is what Blazegraph
>>>> > does when projecting for queries that have an ORDER BY clause.
>>>> > Code-wise, a good starting point is in AST2BOpUtility, starting at
>>>> > line 579:
>>>> >
>>>> > <snip>
>>>> > if (projection != null) {
>>>> >
>>>> >     /**
>>>> >      * The projection after the ORDER BY needs to preserve the ordering.
>>>> >      * So does the chunked materialization operator. The code above
>>>> >      * handles this for ORDER_BY + DISTINCT, but does not go far enough
>>>> >      * to impose order preserving evaluation on the PROJECTION and
>>>> >      * chunked materialization, both of which are downstream from the
>>>> >      * ORDER_BY operator.
>>>> >      *
>>>> >      * @see #1044 (PROJECTION after ORDER BY does not preserve order)
>>>> >      */
>>>> >     final boolean preserveOrder = orderBy != null;
>>>> >
>>>> >     /*
>>>> >      * Append operator to drop variables which are not projected by
>>>> >      * the subquery.
>>>> >      *
>>>> >      * Note: We need to retain all variables which were visible in
>>>> >      * the parent group plus anything which was projected out of the
>>>> >      * subquery. Since there can be exogenous variables, the easiest way
>>>> >      * to do this correctly is to drop variables from the subquery plan
>>>> >      * which are not projected by the subquery. (This is not done at the
>>>> >      * top-level query plan because it would cause exogenous variables
>>>> >      * to be dropped.)
>>>> >      */
>>>> >
>>>> >     {
>>>> >         // The variables projected by the subquery.
>>>> >         final IVariable<?>[] projectedVars = projection
>>>> >                 .getProjectionVars();
>>>> >
>>>> >         final List<NV> anns = new LinkedList<NV>();
>>>> >         anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
>>>> >         anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
>>>> >                 BOpEvaluationContext.CONTROLLER));
>>>> >         anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
>>>> >         anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
>>>> >         if (preserveOrder) {
>>>> >             /**
>>>> >              * @see #563 (ORDER BY + DISTINCT)
>>>> >              * @see #1044 (PROJECTION after ORDER BY does not preserve
>>>> >              *      order)
>>>> >              */
>>>> >             anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
>>>> >             anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
>>>> >         }
>>>> >         left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
>>>> >                 anns.toArray(new NV[anns.size()])//
>>>> >                 ), queryBase, ctx);
>>>> >     }
>>>> > </snip>
>>>> >
>>>> > If the preserve order flag is true, parallelism for the operator is
>>>> > explicitly disabled. Disabling parallelism for the projection node
>>>> > would help for simple queries such as a single triple pattern, but in
>>>> > the general case (for more complex queries) there will be other
>>>> > operators that might cause non-deterministic behaviour.
>>>> >
>>>> > Olaf Hartig implemented a Linked Data Fragments interface on top of
>>>> > Blazegraph; I am adding him in CC.
>>>> >
>>>> > Best,
>>>> > Michael
>>>> >
>>>> > > From: Blaise de Carné <bde...@gm...>
>>>> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
>>>> > > Date: 8 April 2016 at 10:58:02 GMT+2
>>>> > > To: "big...@li..." <big...@li...>
>>>> > >
>>>> > > Hi there,
>>>> > >
>>>> > > I would like to raise an issue that I find very annoying. I need
>>>> > > to do more tests, but I would like to know your feelings about it.
>>>> > >
>>>> > > Look at this example:
>>>> > >
>>>> > > construct where {
>>>> > >   ?s <http://geovocab.org/geometry#geometry> ?event
>>>> > > } limit 5
>>>> > >
>>>> > > It takes about 100 ms to execute on my 3B dataset.
>>>> > >
>>>> > > 90% of the time, this gives me 5 results in the same order:
>>>> > >
>>>> > > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
>>>> > > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
>>>> > > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
>>>> > > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
>>>> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
>>>> > >
>>>> > > But sometimes I get different results:
>>>> > >
>>>> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
>>>> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
>>>> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
>>>> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
>>>> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
>>>> > >
>>>> > > Conclusion: order is not guaranteed without ORDER BY. If I use an
>>>> > > ORDER BY, performance drops alarmingly.
>>>> > >
>>>> > > Now take this fabulous project: Linked Data Fragments
>>>> > > (http://linkeddatafragments.org/), which provides a SparqlDatasource
>>>> > > to handle data from a SPARQL endpoint. They use CONSTRUCT queries
>>>> > > with LIMIT and OFFSET to paginate the results, as they say in the
>>>> > > comments:
>>>> > >
>>>> > > // Even though the SPARQL spec indicates that
>>>> > > // LIMIT and OFFSET might be meaningless without ORDER BY,
>>>> > > // this doesn't seem a problem in practice.
>>>> > > // Furthermore, sorting can be slow. Therefore, don't sort.
>>>> > >
>>>> > > But it is a problem in practice with Blazegraph, and I have
>>>> > > experienced it: a Linked Data Fragments server configured over a
>>>> > > Blazegraph SPARQL endpoint serves different pages 5-10% of the time.
>>>> > >
>>>> > > In our project we really need consistent pagination, without ORDER
>>>> > > BY. Do you think that is possible with Blazegraph?
>>>> > >
>>>> > > Bests,
>>>> > > Blaise
>>>> > >
>>>> > > PS: I don't see this behaviour with SELECT, but a cache could be
>>>> > > responsible...
|
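Editorial sketch: Michael's explanation in the quoted thread can be illustrated with a small, self-contained example. It does not use Blazegraph; ChunkOrderSketch, evaluate and firstPage are hypothetical names chosen for illustration. Chunks of solutions are handed to a thread pool and concatenated in the order in which they finish. With one worker thread (the analogue of setting MAX_PARALLEL to 1) the output keeps the access-path order; with several workers, the first five solutions, which is what a LIMIT 5 page would return, may differ from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only: not Blazegraph code.
class ChunkOrderSketch {

    /**
     * Passes each chunk through a trivial "operator" on up to maxParallel
     * threads and concatenates the chunks in completion order.
     */
    static List<Integer> evaluate(List<List<Integer>> chunks, int maxParallel)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            CompletionService<List<Integer>> done =
                    new ExecutorCompletionService<>(pool);
            for (List<Integer> chunk : chunks) {
                // Stand-in for real per-chunk work such as a projection.
                done.submit(() -> new ArrayList<Integer>(chunk));
            }
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                // take() yields whichever chunk finished next,
                // not necessarily the one submitted next.
                out.addAll(done.take().get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    /** The first 'limit' solutions, as a LIMIT slice would see them. */
    static List<Integer> firstPage(List<Integer> solutions, int limit) {
        return new ArrayList<>(
                solutions.subList(0, Math.min(limit, solutions.size())));
    }

    public static void main(String[] args) throws Exception {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int c = 0; c < 8; c++) {
            List<Integer> chunk = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                chunk.add(c * 5 + i);
            }
            chunks.add(chunk);
        }
        // One worker: always the same first page.
        System.out.println("maxParallel=1: " + firstPage(evaluate(chunks, 1), 5));
        // Several workers: the first page may vary between runs.
        System.out.println("maxParallel=4: " + firstPage(evaluate(chunks, 4), 5));
    }
}
```

The sketch only covers a single operator. As Michael notes, a more complex query has other operators that can reorder solutions, so restricting one operator to a single thread is not a general guarantee of stable LIMIT/OFFSET pages.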
From: Bryan T. <br...@sy...> - 2016-04-10 15:07:30
|
Blaise, Please confirm that you can simply reconfigure to access an existing Journal file. This should work. Thanks, Bryan ---- Bryan Thompson Chief Scientist & Founder Blazegraph e: br...@bl... w: http://blazegraph.com Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>. Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> andBlazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/>L are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions. CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments. On Sat, Apr 9, 2016 at 1:02 AM, Olaf Hartig <oh...@uw...> wrote: > Hi Braise, > > I think you can do it. Although I have not tested this use case, I do not > see why it would not be possible. Just point the config.json to the journal > file. > > Best, > Olaf > > > On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" < > bde...@gm...> wrote: >> >> Hi Olaf, >> >> Yes, we already took a look on your implementation. 
>> It looks good, but we can't use it on a journal that is already used for the SPARQL endpoint, am I wrong?
>>
>> Blaise
>>
>> On Fri, Apr 8, 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:
>>
>>> Dear Blaise,
>>>
>>> As Michael mentioned, I implemented a TPF interface directly on top of Blazegraph. This implementation uses the Blazegraph internals directly and thus avoids the overhead of forwarding every TPF request to the SPARQL endpoint interface (as would be done by using the standard TPF server implementation).
>>>
>>> Find the original source code here:
>>>
>>> https://github.com/hartig/BlazegraphBasedTPFServer
>>>
>>> ...and note that this TPF interface is included in the official 2.0 release of Blazegraph:
>>>
>>> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>>>
>>> Cheers,
>>> Olaf
>>>
>>> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
>>> > In response to the request on bigdata-commit (see below), let's continue the discussion here:
>>> >
>>> > Determinism is not guaranteed unless parallelism is explicitly disabled; this holds even for SELECT queries. There are several potential sources of non-determinism: in the general case, Blazegraph may choose to run multiple parallel threads for a given operator (processing different chunks of data in parallel), and in some cases operators also use multiple threads internally.
>>> >
>>> > For the query at hand, the single triple pattern access path will yield results in order, but this order might be destroyed by other operators on top. The projection operator, for instance, does not guarantee order in the general case, as it might process data in different threads. The way to achieve determinism would be to explicitly disable this parallelism. In fact, this is what Blazegraph does when projecting for queries that have an ORDER BY clause. Code-wise, a good starting point is in AST2BOpUtility, starting at line 579:
>>> >
>>> > <snip>
>>> > if (projection != null) {
>>> >
>>> >     /**
>>> >      * The projection after the ORDER BY needs to preserve the ordering.
>>> >      * So does the chunked materialization operator. The code above
>>> >      * handles this for ORDER_BY + DISTINCT, but does not go far enough
>>> >      * to impose order preserving evaluation on the PROJECTION and
>>> >      * chunked materialization, both of which are downstream from the
>>> >      * ORDER_BY operator.
>>> >      *
>>> >      * @see #1044 (PROJECTION after ORDER BY does not preserve order)
>>> >      */
>>> >     final boolean preserveOrder = orderBy != null;
>>> >
>>> >     /*
>>> >      * Append operator to drop variables which are not projected by
>>> >      * the subquery.
>>> >      *
>>> >      * Note: We need to retain all variables which were visible in
>>> >      * the parent group plus anything which was projected out of the
>>> >      * subquery. Since there can be exogenous variables, the easiest way
>>> >      * to do this correctly is to drop variables from the subquery plan
>>> >      * which are not projected by the subquery. (This is not done at the
>>> >      * top-level query plan because it would cause exogenous variables
>>> >      * to be dropped.)
>>> >      */
>>> >
>>> >     {
>>> >         // The variables projected by the subquery.
>>> >         final IVariable<?>[] projectedVars = projection
>>> >                 .getProjectionVars();
>>> >
>>> >         final List<NV> anns = new LinkedList<NV>();
>>> >         anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
>>> >         anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
>>> >                 BOpEvaluationContext.CONTROLLER));
>>> >         anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
>>> >         anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
>>> >         if (preserveOrder) {
>>> >             /**
>>> >              * @see #563 (ORDER BY + DISTINCT)
>>> >              * @see #1044 (PROJECTION after ORDER BY does not preserve
>>> >              *      order)
>>> >              */
>>> >             anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
>>> >             anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
>>> >         }
>>> >         left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
>>> >                 anns.toArray(new NV[anns.size()])//
>>> >                 ), queryBase, ctx);
>>> >     }
>>> > </snip>
>>> >
>>> > If the preserve order flag is true, parallelism for the operator is explicitly disabled. Disabling parallelism for the projection node would help for simple queries such as a single triple pattern, but in the general case (for more complex queries) there will be other operators that might cause non-deterministic behaviour.
>>> >
>>> > @Olaf Hartig (CC) implemented a Linked Data Fragment interface on top of Blazegraph, adding him in CC.
>>> >
>>> > Best,
>>> > Michael
>>> >
>>> > > From: Blaise de Carné <bde...@gm...>
>>> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
>>> > > Date: 8 April 2016 at 10:58:02 GMT+2
>>> > > To: "big...@li..." <big...@li...>
>>> > >
>>> > > Hi there,
>>> > >
>>> > > I would like to raise an issue that I find very annoying. I need to do more tests, but I would like to know your thoughts about it.
>>> > >
>>> > > Consider this example:
>>> > >
>>> > > construct where {
>>> > >   ?s <http://geovocab.org/geometry#geometry> ?event
>>> > > } limit 5
>>> > >
>>> > > It takes about 100 ms to execute on my 3B dataset.
>>> > >
>>> > > About 90% of the time, this gives me 5 results in the same order:
>>> > >
>>> > > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
>>> > > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
>>> > > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
>>> > > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
>>> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
>>> > >
>>> > > But sometimes I get different results:
>>> > >
>>> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
>>> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
>>> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
>>> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
>>> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
>>> > >
>>> > > Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER BY, performance drops alarmingly.
>>> > >
>>> > > Now take this fabulous project: Linked Data Fragments (http://linkeddatafragments.org/), which provides a SparqlDatasource to handle data from a SPARQL endpoint. They use CONSTRUCT queries with LIMIT and OFFSET to paginate the results, as they say in the comments:
>>> > >
>>> > > // Even though the SPARQL spec indicates that
>>> > > // LIMIT and OFFSET might be meaningless without ORDER BY,
>>> > > // this doesn't seem a problem in practice.
>>> > > // Furthermore, sorting can be slow. Therefore, don't sort.
>>> > >
>>> > > But it is a problem in practice with Blazegraph, and I have experienced it: a Linked Data Fragments server configured over a Blazegraph SPARQL endpoint serves different pages 5-10% of the time.
>>> > >
>>> > > In our project we really need to get consistent pagination without ORDER BY. Do you think that is possible with Blazegraph?
>>> > >
>>> > > Best,
>>> > > Blaise
>>> > >
>>> > > PS: I don't see this behaviour with SELECT, but caching could be responsible...
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
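The effect Michael describes in the quoted message can be reproduced outside Blazegraph. The following is a minimal sketch in plain Java that uses no Blazegraph classes; `ChunkOrderDemo` and `run` are hypothetical names chosen for illustration. Chunks of solutions are handed to a thread pool and appended to the output as each task runs. With more than one thread the output order can differ between runs, while a pool limited to one thread (similar in spirit to the `MAX_PARALLEL` = 1 annotation in the quoted snippet, though not the same mechanism) keeps the submission order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkOrderDemo {

    /**
     * Hands every chunk to a pool of maxParallel threads. Each task appends
     * its whole chunk to the shared output list when it runs, so the output
     * order depends on thread scheduling unless maxParallel is 1.
     */
    static List<Integer> run(List<List<Integer>> chunks, int maxParallel) {
        final List<Integer> out =
                Collections.synchronizedList(new ArrayList<Integer>());
        final ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (final List<Integer> chunk : chunks) {
            pool.execute(() -> out.addAll(chunk));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("chunks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        // One worker thread: chunks are appended in submission order.
        System.out.println("maxParallel=1: " + run(chunks, 1));
        // Several worker threads: same elements, order not guaranteed.
        System.out.println("maxParallel=4: " + run(chunks, 4));
    }
}
```

A LIMIT applied to the second output would cut off a different prefix whenever the chunk order changes, which matches the occasional different page reported in this thread.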
From: Olaf H. <oh...@uw...> - 2016-04-09 05:02:26
|
Hi Blaise,

I think you can do it. Although I have not tested this use case, I do not see why it would not be possible. Just point the config.json to the journal file.

Best,
Olaf

On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" <bde...@gm...> wrote:
>Hi Olaf,
>
>Yes, we already took a look at your implementation. It looks good, but we can't use it on a journal that is already used for the SPARQL endpoint, am I wrong?
>
>Blaise

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
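For the LIMIT/OFFSET pagination discussed in this thread, the SPARQL specification only makes pages well defined when the query also carries an ORDER BY. The sketch below builds such a paged CONSTRUCT query as a string. It is only the spec-conforming baseline: the thread itself reports that sorting is slow on a large dataset, so read it as an illustration and not as a recommendation. `PagedConstruct` and `pageQuery` are hypothetical names, and no Blazegraph or Linked Data Fragments API is involved.

```java
public class PagedConstruct {

    /**
     * Builds a CONSTRUCT query for one page of the triples that use the
     * given predicate. The ORDER BY on subject and object is what ties a
     * page index to a fixed slice of the solutions.
     */
    static String pageQuery(String predicateIri, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        // Reject characters that would break out of the <...> IRI syntax.
        if (predicateIri.indexOf('<') >= 0 || predicateIri.indexOf('>') >= 0
                || predicateIri.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("not a plain IRI: " + predicateIri);
        }
        // Use long arithmetic so large page indexes cannot overflow.
        final long offset = (long) pageSize * pageIndex;
        return "CONSTRUCT { ?s <" + predicateIri + "> ?o } "
                + "WHERE { ?s <" + predicateIri + "> ?o } "
                + "ORDER BY ?s ?o "
                + "LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Third page (index 2) of five triples each.
        System.out.println(pageQuery("http://geovocab.org/geometry#geometry", 5, 2));
    }
}
```

Dropping the ORDER BY line gives the kind of query the Linked Data Fragments comment describes, which is faster but leaves the content of each page up to the engine.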
From: Blaise de C. <bde...@gm...> - 2016-04-08 22:42:25
|
Hi Olaf,

Yes, we already took a look at your implementation. It looks good, but we can't use it on a journal that is already used for the SPARQL endpoint, am I wrong?

Blaise

On Fri, Apr 8, 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:

> Dear Blaise,
>
> As Michael mentioned, I implemented a TPF interface directly on top of Blazegraph. This implementation uses the Blazegraph internals directly and thus avoids the overhead of forwarding every TPF request to the SPARQL endpoint interface (as would be done by using the standard TPF server implementation).
>
> Find the original source code here:
>
> https://github.com/hartig/BlazegraphBasedTPFServer
>
> ...and note that this TPF interface is included in the official 2.0 release of Blazegraph:
>
> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>
> Cheers,
> Olaf
>
> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
> > In response to the request on bigdata-commit (see below), let's continue the discussion here:
> >
> > Determinism is not guaranteed unless parallelism is explicitly disabled; this holds even for SELECT queries. There are several potential sources of non-determinism: in the general case, Blazegraph may choose to run multiple parallel threads for a given operator (processing different chunks of data in parallel), and in some cases operators also use multiple threads internally.
> >
> > For the query at hand, the single triple pattern access path will yield results in order, but this order might be destroyed by other operators on top. The projection operator, for instance, does not guarantee order in the general case, as it might process data in different threads. The way to achieve determinism would be to explicitly disable this parallelism. In fact, this is what Blazegraph does when projecting for queries that have an ORDER BY clause. Code-wise, a good starting point is in AST2BOpUtility, starting at line 579:
> >
> > <snip>
> > if (projection != null) {
> >
> >     /**
> >      * The projection after the ORDER BY needs to preserve the ordering.
> >      * So does the chunked materialization operator. The code above
> >      * handles this for ORDER_BY + DISTINCT, but does not go far enough
> >      * to impose order preserving evaluation on the PROJECTION and
> >      * chunked materialization, both of which are downstream from the
> >      * ORDER_BY operator.
> >      *
> >      * @see #1044 (PROJECTION after ORDER BY does not preserve order)
> >      */
> >     final boolean preserveOrder = orderBy != null;
> >
> >     /*
> >      * Append operator to drop variables which are not projected by
> >      * the subquery.
> >      *
> >      * Note: We need to retain all variables which were visible in
> >      * the parent group plus anything which was projected out of the
> >      * subquery. Since there can be exogenous variables, the easiest way
> >      * to do this correctly is to drop variables from the subquery plan
> >      * which are not projected by the subquery. (This is not done at the
> >      * top-level query plan because it would cause exogenous variables
> >      * to be dropped.)
> >      */
> >
> >     {
> >         // The variables projected by the subquery.
> >         final IVariable<?>[] projectedVars = projection
> >                 .getProjectionVars();
> >
> >         final List<NV> anns = new LinkedList<NV>();
> >         anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
> >         anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
> >                 BOpEvaluationContext.CONTROLLER));
> >         anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
> >         anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
> >         if (preserveOrder) {
> >             /**
> >              * @see #563 (ORDER BY + DISTINCT)
> >              * @see #1044 (PROJECTION after ORDER BY does not preserve
> >              *      order)
> >              */
> >             anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
> >             anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
> >         }
> >         left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
> >                 anns.toArray(new NV[anns.size()])//
> >                 ), queryBase, ctx);
> >     }
> > </snip>
> >
> > If the preserve order flag is true, parallelism for the operator is explicitly disabled. Disabling parallelism for the projection node would help for simple queries such as a single triple pattern, but in the general case (for more complex queries) there will be other operators that might cause non-deterministic behaviour.
> >
> > @Olaf Hartig (CC) implemented a Linked Data Fragment interface on top of Blazegraph, adding him in CC.
> >
> > Best,
> > Michael
> >
> > > From: Blaise de Carné <bde...@gm...>
> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
> > > Date: 8 April 2016 at 10:58:02 GMT+2
> > > To: "big...@li..." <big...@li...>
> > >
> > > Hi there,
> > >
> > > I would like to raise an issue that I find very annoying. I need to do more tests, but I would like to know your thoughts about it.
> > >
> > > Consider this example:
> > >
> > > construct where {
> > >   ?s <http://geovocab.org/geometry#geometry> ?event
> > > } limit 5
> > >
> > > It takes about 100 ms to execute on my 3B dataset.
> > >
> > > About 90% of the time, this gives me 5 results in the same order:
> > >
> > > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
> > > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
> > > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
> > > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
> > >
> > > But sometimes I get different results:
> > >
> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
> > >
> > > Conclusion: order is not guaranteed without
ORDER BY. If i use an ORDER > BY, > > > performance drop alarmingly. > > > > > > Now take this fabulous project : Linked Data Fragments > > > (http://linkeddatafragments.org/ <http://linkeddatafragments.org/>), > > > which provide a SparqlDatasource to handle data from a SPARQL Endpoint. > > > They use CONSTRUCT queries with LIMIT and OFFSET to paginate the > results, > > > as they says in the comments : > > > > > > // Even though the SPARQL spec indicates that > > > // LIMIT and OFFSET might be meaningless without ORDER BY, > > > // this doesn't seem a problem in practice. > > > // Furthermore, sorting can be slow. Therefore, don't sort. > > > > > > But it's a problem in practice with Blazegraph, and i exeperimented it > : a > > > Linked Data Fragments server configured over a Blazegraph SPARQL > Endpoint > > > serve different pages in 5-10% of time. > > > > > > In our project we really need to get consistent pagination, without > ORDER > > > BY. Do you think that is possible with Blazegraph ? > > > > > > Bests, > > > Blaise > > > > > > PS : i don't see this behaviour with SELECT, but cache could be > > > responsible... > |
From: Olaf H. <oh...@uw...> - 2016-04-08 15:25:58
|
Dear Blaise,

As Michael mentioned, I implemented a TPF interface directly on top of Blazegraph. This implementation uses directly the Blazegraph internals and, thus, avoids the overhead of forwarding every TPF request to the SPARQL endpoint interface (as would be done by using the standard TPF server implementation).

Find the original source code here:
https://github.com/hartig/BlazegraphBasedTPFServer

...and note that this TPF interface is included in the official 2.0 release of Blazegraph:
http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22

Cheers,
Olaf
|
From: Michael S. <ms...@me...> - 2016-04-08 13:36:58
|
In response to the request on the bigdata-commit list (see below), let's continue the discussion here:

Determinism is not guaranteed unless parallelism is explicitly disabled; this holds even for SELECT queries. There are several potential sources of non-determinism: in the general case, Blazegraph may choose to run multiple parallel threads for a given operator (processing different chunks of data in parallel), and in some cases operators also use multiple threads internally.

For the query at hand, the single triple pattern access path will yield results in order, but this order might be destroyed by other operators on top. The projection operator, for instance, does not guarantee order in the general case, as it might process data in different threads. The way to achieve determinism would be to explicitly disable this parallelism. In fact, this is what Blazegraph does when projecting for queries that have an ORDER BY clause. Code-wise, a good starting point is in AST2BOpUtility, starting at line 579:

<snip>
        if (projection != null) {

            /**
             * The projection after the ORDER BY needs to preserve the ordering.
             * So does the chunked materialization operator. The code above
             * handles this for ORDER_BY + DISTINCT, but does not go far enough
             * to impose order preserving evaluation on the PROJECTION and
             * chunked materialization, both of which are downstream from the
             * ORDER_BY operator.
             *
             * @see #1044 (PROJECTION after ORDER BY does not preserve order)
             */
            final boolean preserveOrder = orderBy != null;

            /*
             * Append operator to drop variables which are not projected by the
             * subquery.
             *
             * Note: We need to retain all variables which were visible in the
             * parent group plus anything which was projected out of the
             * subquery. Since there can be exogenous variables, the easiest way
             * to do this correctly is to drop variables from the subquery plan
             * which are not projected by the subquery. (This is not done at the
             * top-level query plan because it would cause exogenous variables
             * to be dropped.)
             */

            {
                // The variables projected by the subquery.
                final IVariable<?>[] projectedVars = projection
                        .getProjectionVars();

                final List<NV> anns = new LinkedList<NV>();
                anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
                anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
                        BOpEvaluationContext.CONTROLLER));
                anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
                anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
                if (preserveOrder) {
                    /**
                     * @see #563 (ORDER BY + DISTINCT)
                     * @see #1044 (PROJECTION after ORDER BY does not preserve
                     *      order)
                     */
                    anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
                    anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
                }
                left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
                        anns.toArray(new NV[anns.size()])//
                        ), queryBase, ctx);
            }
</snip>

If the preserve order flag is true, parallelism for the operator is explicitly disabled. Disabling parallelism for the projection node would help for simple queries such as a single triple pattern, but in the general case (for more complex queries) there will be other operators that might cause non-deterministic behaviour.

Olaf Hartig implemented a Linked Data Fragments interface on top of Blazegraph; adding him in CC.

Best,
Michael

> From: Blaise de Carné <bde...@gm...>
> Subject: [Bigdata-commit] Pagination consistency without ORDER BY
> Date: 8 April 2016 at 10:58:02 GMT+2
> To: "big...@li..." <big...@li...>
>
> Hi there,
>
> I would like to raise an issue that I find very annoying. I need to do more tests, but I would like to know your feelings about it.
>
> Look at this example:
>
> construct where {
>   ?s <http://geovocab.org/geometry#geometry> ?event
> } limit 5
>
> It takes about 100 ms to execute on my 3B dataset.
>
> 90% of the time, this gives me 5 results in the same order:
>
> <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
> <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
> <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
> <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
>
> But sometimes I get different results:
>
> <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
>
> Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER BY, performance drops alarmingly.
>
> Now take this fabulous project: Linked Data Fragments (http://linkeddatafragments.org/), which provides a SparqlDatasource to handle data from a SPARQL endpoint. They use CONSTRUCT queries with LIMIT and OFFSET to paginate the results, as they say in the comments:
>
> // Even though the SPARQL spec indicates that
> // LIMIT and OFFSET might be meaningless without ORDER BY,
> // this doesn't seem a problem in practice.
> // Furthermore, sorting can be slow. Therefore, don't sort.
>
> But it is a problem in practice with Blazegraph, and I have experienced it: a Linked Data Fragments server configured over a Blazegraph SPARQL endpoint serves different pages 5-10% of the time.
>
> In our project we really need to get consistent pagination, without ORDER BY. Do you think that is possible with Blazegraph?
>
> Best,
> Blaise
>
> PS: I don't see this behaviour with SELECT, but caching could be responsible...
> --
> Blaise de Carné
> bde...@gm...
|
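The effect Michael describes can be reproduced outside Blazegraph with a toy model. The sketch below is not Blazegraph code: the class `ChunkedOperatorSketch`, its methods, and the chunk sizes are all invented for illustration. It assumes only that an operator receives ordered chunks, hands them to a thread pool, and appends each chunk's output as soon as that chunk completes. With a pool of one thread (the analogue of MAX_PARALLEL = 1) the chunks are processed in submission order, so the input order survives; with several threads the order of chunks in the output depends on thread timing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Toy model of a chunked pipeline operator; hypothetical, not Blazegraph code. */
public class ChunkedOperatorSketch {

    /**
     * Applies fn to every chunk on a pool of maxParallel threads. Each chunk's
     * output is appended as soon as that chunk finishes, so with more than one
     * thread the order of chunks in the result depends on thread timing.
     */
    public static <T> List<T> run(List<List<T>> chunks, UnaryOperator<T> fn, int maxParallel) {
        final List<T> out = Collections.synchronizedList(new ArrayList<T>());
        final ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (final List<T> chunk : chunks) {
            pool.execute(() -> {
                final List<T> mapped = new ArrayList<>(chunk.size());
                for (T item : chunk) {
                    mapped.add(fn.apply(item));
                }
                out.addAll(mapped); // one atomic append per finished chunk
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("chunks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(out);
    }

    /** Builds chunkCount chunks that together hold 0, 1, 2, ... in ascending order. */
    public static List<List<Integer>> sampleChunks(int chunkCount, int chunkSize) {
        final List<List<Integer>> chunks = new ArrayList<>();
        for (int c = 0; c < chunkCount; c++) {
            final List<Integer> chunk = new ArrayList<>(chunkSize);
            for (int i = 0; i < chunkSize; i++) {
                chunk.add(c * chunkSize + i);
            }
            chunks.add(chunk);
        }
        return chunks;
    }

    /** Returns true if the values are in ascending order. */
    public static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final UnaryOperator<Integer> project = x -> x; // stands in for the projection
        System.out.println("maxParallel=1 keeps order: "
                + isSorted(run(sampleChunks(8, 1000), project, 1)));
        // May print true or false from run to run; that is the point.
        System.out.println("maxParallel=4 keeps order: "
                + isSorted(run(sampleChunks(8, 1000), project, 4)));
    }
}
```

A LIMIT applied on top of the multi-threaded variant would see whichever chunks happened to finish first, which is consistent with the occasional "different page" reported above. As Michael notes, fixing a single operator this way is only sufficient when no other operator in the plan reorders solutions.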
From: Bryan T. <br...@sy...> - 2016-03-31 17:10:01
|
Yes, the rdf4j parsers are not graceful about invalid URIs.

Bryan

----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com

Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>.

Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> and Blazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/> are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions.

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments.

On Thu, Mar 31, 2016 at 1:07 PM, Andreas Kahl <ka...@bs...> wrote:

> Bryan,
>
> I think the bug is fixed: at least I loaded successfully 98,500,000 triples - until I hit an unencoded space in an URI.
> Setting
> com.bigdata.rdf.rio.RDFParserOptions.stopAtFirstError=false
> com.bigdata.rdf.rio.RDFParserOptions.verifyData=false
> did not help. I will try to contact the authors of the dataset.
> But as far as your bug is concerned, I am quite sure it's fixed.
> > Best Regards > Andreas > > Here's the snippet with the Error from the Log: > <br>totalElapsed=3061671ms, elapsed=3061543ms, parsed=98480000, tps=32166, > done=false</br > ><br>totalElapsed=3061763ms, elapsed=3061635ms, parsed=98490000, > tps=32169, done=false</br > ><br>totalElapsed=3061855ms, elapsed=3061727ms, parsed=98500000, > tps=32171, done=false</br > ><p>ABORT</p > ><pre>Load > > source=ConstantNode(TermId(0U)[file:///srv/feed-dateien/DNBLOD/GND.ttl.gz]) > </pre > ><pre>java.lang.RuntimeException: Could not load: > url=file:///srv/feed-dateien/DNBLOD/GND.ttl.gz, > cause=org.openrdf.rio.RDFParseException: IRI included an unencoded space: > '32' [line 119517521] > at > com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertLoadGraph(AST2BOpUpdate.java:1403) > at > com.bigdata.rdf.sparql.ast.eval.AST2BOpUpdate.convertUpdateSwitch(AST2BOpUpdate.java:439) > > >>> "Andreas Kahl" <ka...@bs...> 31.03.2016 14:36 >>> > Bryan, > > Thanks for the quick response. blazegraph-jar-2.1.0-20160330.023632-2.jar > from > https://oss.sonatype.org/content/repositories/snapshots/com/blazegraph/blazegraph-jar/2.1.0-SNAPSHOT/ seems > to be working well at first sight - at least it has run past the point > where 2.0.1 failed every time. I will send you a confirmation as soon as > the whole file has run through. > > Best Regards > Andreas > > > >>> Bryan Thompson <br...@sy...> 31.03.2016 14:24 >>> > The 2.1.0-SNAPSHOT release => You can download the WAR from > https://oss.sonatype.org/content/repositories/snapshots/com/blazegraph/bigdata-war/2.1.0-SNAPSHOT/ > . > > Please let us know if this corrects the issue. > > Thanks, > Bryan > > ---- > Bryan Thompson > Chief Scientist & Founder > Blazegraph > e: br...@bl... > w: http://blazegraph.com > > Blazegraph products help to solve the Graph Cache Thrash to achieve large > scale processing for graph and predictive analytics. 
Blazegraph is the > creator of the industry’s first GPU-accelerated high-performance database > for large graphs, has been named as one of the “10 Companies and > Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>. > > > Blazegraph Database <https://www.blazegraph.com/> is our ultra-high > performance graph database that supports both RDF/SPARQL and > Tinkerpop/Blueprints APIs. Blazegraph GPU > <https://www.blazegraph.com/product/gpu-accelerated/> andBlazegraph DAS > <https://www.blazegraph.com/product/gpu-accelerated/>L are disruptive new > technologies that use GPUs to enable extreme scaling that is thousands of > times faster and 40 times more affordable than CPU-based solutions. > > CONFIDENTIALITY NOTICE: This email and its contents and attachments are > for the sole use of the intended recipient(s) and are confidential or > proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, > disclosure, dissemination or copying of this email or its contents or > attachments is prohibited. If you have received this communication in > error, please notify the sender by reply email and permanently delete all > copies of the email and its contents and attachments. > > On Thu, Mar 31, 2016 at 8:23 AM, Bryan Thompson <br...@sy...> wrote: > >> This should be fixed in 2.1.0, which is now in QA. Brad can provide you >> with a snapshot leading up to the release for testing. >> >> Our hypothesis is that the problem arose from failing to account for the >> accumulation of blank nodes in an internal buffer. The fix takes account of >> this and then evicts batches before the buffer would overflow. The overflow >> is the index out of bounds exception. >> >> Thanks, >> Bryan >> >> ---- >> Bryan Thompson >> Chief Scientist & Founder >> Blazegraph >> e: br...@bl... >> w: http://blazegraph.com >> >> Blazegraph products help to solve the Graph Cache Thrash to achieve large >> scale processing for graph and predictive analytics. 
>>
>> On Thu, Mar 31, 2016 at 8:00 AM, Andreas Kahl <ka...@bs...> wrote:
>>
>>> Hello everyone,
>>>
>>> I did some extensive testing on this error, but I cannot load this file
>>> into Blazegraph 2.0.1 (I tried the .jar from SourceForge and compiled the
>>> latest master revision with Oracle JDK 1.8.0_74 myself).
>>> Blazegraph runs into this:
>>> java.lang.RuntimeException: Could not load:
>>> url=file:///srv/feed-dateien/DNBLOD/GND.ttl.gz,
>>> cause=java.lang.ArrayIndexOutOfBoundsException: 40005
>>>
>>> Please find the complete Stacktrace in the Logs attached.
>>>
>>> The SPARQL LOAD Command:
>>> curl -d"update=LOAD <file:///srv/feed-dateien/DNBLOD/GND.ttl.gz>" -d"monitor=true" http://localhost:8080/blazegraph/sparql 2>&1 >>/var/log/bigdata/loadGnd.log
>>>
>>> The data tested can be downloaded from
>>> http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.ttl.gz
>>>
>>> Blazegraph was running with this call:
>>> nohup java -server -Xmx6g -XX:+UseG1GC -Djava.io.tmpdir=/mnt/triplestore/tomcat8/temp/ -cp /mnt/triplestore/bigdata/blazegraph-jar-2.0.1.jar com.bigdata.rdf.sail.webapp.NanoSparqlServer 8080 kb /etc/bigdata/RWStore.properties >/var/log/bigdata/blazegraph.log &
>>>
>>> Java-Version on the server:
>>> java -version
>>> java version "1.8.0_65"
>>> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
>>>
>>> RWStore.properties was practically the original one (I just switched to
>>> Triples Mode and altered the Journal file's location). No custom
>>> Vocabularies or namespaces were used.
>>>
>>> Is there any known Issue? Should I try some specific revision from
>>> GitHub?
>>>
>>> Thanks for any hints.
>>>
>>> Andreas
|
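The RDFParseException in the log snippet above points at line 119517521 of GND.ttl.gz. A minimal sketch for looking at that line without unpacking the whole archive to disk, assuming the parser counts lines from 1 in the decompressed stream. `show_gz_line` is a hypothetical helper name used only for illustration; it is not part of Blazegraph.

```shell
#!/bin/sh
# show_gz_line FILE LINE
# Hypothetical helper (not part of Blazegraph): prints one line of a
# gzip-compressed text file, addressed by its 1-based line number.
show_gz_line() {
    file=$1
    line=$2
    # gzip -dc streams the decompressed data to stdout;
    # sed prints only the requested line and then quits,
    # so the rest of the file is not scanned.
    gzip -dc -- "$file" | sed -n "${line}{p;q;}"
}
```

With the path and line number from the thread, the call would be `show_gz_line /srv/feed-dateien/DNBLOD/GND.ttl.gz 119517521`. If the reported line does not contain an IRI with a space, the parser's line counting may differ from the assumption above, and the neighbouring lines are worth checking too.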