From: Olaf H. <oh...@uw...> - 2016-04-09 05:02:26
Hi Blaise,

I think you can do it. Although I have not tested this use case, I do not
see why it would not be possible. Just point the config.json to the journal
file.

Best,
Olaf

On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" <bde...@gm...> wrote:
>Hi Olaf,
>
>Yes, we already took a look at your implementation. It looks good, but we
>can't use it on a journal that is already used for the SPARQL endpoint,
>am I wrong?
>
>Blaise
>
>On Fri, 8 Apr 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:
>
>> Dear Blaise,
>>
>> As Michael mentioned, I implemented a TPF interface directly on top of
>> Blazegraph. This implementation directly uses the Blazegraph internals
>> and, thus, avoids the overhead of forwarding every TPF request to the
>> SPARQL endpoint interface (as would be done by using the standard TPF
>> server implementation).
>>
>> Find the original source code here:
>>
>> https://github.com/hartig/BlazegraphBasedTPFServer
>>
>> ...and note that this TPF interface is included in the official 2.0
>> release of Blazegraph:
>>
>> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>>
>> Cheers,
>> Olaf
>>
>> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
>> > In response to the request from the bigdata-commit list (see below),
>> > please let’s continue the discussion here:
>> >
>> > Determinism is not guaranteed unless parallelism is explicitly
>> > disabled; this even holds for SELECT queries. There are several
>> > potential sources of non-determinism: in the general case, Blazegraph
>> > may choose to run multiple parallel threads for a given operator
>> > (processing different chunks of data in parallel), and in some cases
>> > operators also use multiple threads internally.
>> >
>> > For the given query at hand, the single triple pattern access path
>> > will yield results in order, but this order might actually be destroyed
>> > by other operators on top. The projection operator, for instance, does
>> > not guarantee order in the general case, as it might process data in
>> > different threads. The way to achieve determinism would be to
>> > explicitly disable this parallelism. In fact, this is what Blazegraph
>> > does when projecting for queries that have an ORDER BY clause.
>> > Code-wise, a good starting point is in AST2BOpUtility, starting at
>> > line 579:
>> >
>> > <snip>
>> > if (projection != null) {
>> >
>> >     /**
>> >      * The projection after the ORDER BY needs to preserve the ordering.
>> >      * So does the chunked materialization operator. The code above
>> >      * handles this for ORDER_BY + DISTINCT, but does not go far enough
>> >      * to impose order preserving evaluation on the PROJECTION and
>> >      * chunked materialization, both of which are downstream from the
>> >      * ORDER_BY operator.
>> >      *
>> >      * @see #1044 (PROJECTION after ORDER BY does not preserve order)
>> >      */
>> >     final boolean preserveOrder = orderBy != null;
>> >
>> >     /*
>> >      * Append operator to drop variables which are not projected by the
>> >      * subquery.
>> >      *
>> >      * Note: We need to retain all variables which were visible in the
>> >      * parent group plus anything which was projected out of the
>> >      * subquery. Since there can be exogenous variables, the easiest way
>> >      * to do this correctly is to drop variables from the subquery plan
>> >      * which are not projected by the subquery. (This is not done at the
>> >      * top-level query plan because it would cause exogenous variables
>> >      * to be dropped.)
>> >      */
>> >
>> >     {
>> >         // The variables projected by the subquery.
>> >         final IVariable<?>[] projectedVars = projection
>> >                 .getProjectionVars();
>> >
>> >         final List<NV> anns = new LinkedList<NV>();
>> >         anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
>> >         anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
>> >                 BOpEvaluationContext.CONTROLLER));
>> >         anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
>> >         anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
>> >         if (preserveOrder) {
>> >             /**
>> >              * @see #563 (ORDER BY + DISTINCT)
>> >              * @see #1044 (PROJECTION after ORDER BY does not preserve
>> >              *      order)
>> >              */
>> >             anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
>> >             anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
>> >         }
>> >         left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
>> >                 anns.toArray(new NV[anns.size()])//
>> >                 ), queryBase, ctx);
>> >     }
>> > </snip>
>> >
>> > If the preserve order flag is true, parallelism for the operator is
>> > explicitly disabled. Disabling parallelism for the projection node
>> > would help for simple queries such as a single triple pattern, but in
>> > the general case (for more complex queries) there will be other
>> > operators that might cause non-deterministic behaviour.
>> >
>> > @Olaf Hartig implemented a Linked Data Fragments interface on top of
>> > Blazegraph, adding him in CC.
>> >
>> > Best,
>> > Michael
>> >
>> > > From: Blaise de Carné <bde...@gm...>
>> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
>> > > Date: 8 April 2016 at 10:58:02 GMT+2
>> > > To: "big...@li..." <big...@li...>
>> > >
>> > > Hi there,
>> > >
>> > > I would like to raise an issue that I find very annoying. I need to
>> > > do more tests, but I would like to know your feelings about it.
>> > >
>> > > Look at this example:
>> > >
>> > > construct where {
>> > >   ?s <http://geovocab.org/geometry#geometry> ?event
>> > > } limit 5
>> > >
>> > > It takes about 100 ms to execute on my 3B dataset.
>> > >
>> > > 90% of the time, this gives me 5 results in the same order:
>> > >
>> > > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
>> > > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
>> > > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
>> > > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
>> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
>> > >
>> > > But sometimes I get different results:
>> > >
>> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
>> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
>> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
>> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
>> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
>> > >
>> > > Conclusion: order is not guaranteed without ORDER BY. If I use an
>> > > ORDER BY, performance drops alarmingly.
>> > >
>> > > Now take this fabulous project: Linked Data Fragments
>> > > (http://linkeddatafragments.org/), which provides a SparqlDatasource
>> > > to handle data from a SPARQL endpoint. They use CONSTRUCT queries
>> > > with LIMIT and OFFSET to paginate the results, as they say in the
>> > > comments:
>> > >
>> > > // Even though the SPARQL spec indicates that
>> > > // LIMIT and OFFSET might be meaningless without ORDER BY,
>> > > // this doesn't seem a problem in practice.
>> > > // Furthermore, sorting can be slow. Therefore, don't sort.
>> > >
>> > > But it is a problem in practice with Blazegraph, and I have observed
>> > > it: a Linked Data Fragments server configured over a Blazegraph
>> > > SPARQL endpoint serves different pages 5-10% of the time.
>> > >
>> > > In our project we really need to get consistent pagination, without
>> > > ORDER BY. Do you think that is possible with Blazegraph?
>> > >
>> > > Best,
>> > > Blaise
>> > >
>> > > PS: I don't see this behaviour with SELECT, but a cache could be
>> > > responsible...
>>

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
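
Michael's point about chunks being processed in parallel can be illustrated with a small, self-contained Java sketch. This is not Blazegraph code. The class ChunkedProjectionDemo, the method run, and the parameters chunkSize and maxParallel are hypothetical names chosen only for this illustration. The toy "operator" hands chunks of solutions to a thread pool. With a pool size of 1 the chunks are emitted in submission order. With a larger pool the emission order may differ between runs, even though the same solutions always come out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkedProjectionDemo {

    /**
     * Pushes the solutions through a toy operator chunk by chunk and returns
     * them in the order in which the chunks were emitted downstream.
     * (Hypothetical helper; it only mimics the idea of a max-parallel setting.)
     */
    public static List<Integer> run(List<Integer> solutions, int chunkSize, int maxParallel) {
        final List<Integer> emitted = Collections.synchronizedList(new ArrayList<Integer>());
        final ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (int start = 0; start < solutions.size(); start += chunkSize) {
            final int end = Math.min(start + chunkSize, solutions.size());
            final List<Integer> chunk = new ArrayList<Integer>(solutions.subList(start, end));
            // Each chunk is appended atomically, but with more than one worker
            // thread the chunks may finish in any order.
            pool.execute(() -> emitted.addAll(chunk));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("chunks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<Integer>(emitted);
    }

    public static void main(String[] args) {
        final List<Integer> input = new ArrayList<Integer>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        // A single worker drains its queue in FIFO order, so order is kept.
        System.out.println("maxParallel=1 keeps order: " + run(input, 10, 1).equals(input));
        // With several workers the result can change from run to run.
        System.out.println("maxParallel=4 keeps order: " + run(input, 10, 4).equals(input));
    }
}
```

The sketch also shows why a LIMIT without ORDER BY can return different pages: taking the first five emitted items is only stable when the emission order itself is stable.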