From: Blaise de C. <bde...@gm...> - 2016-04-08 22:42:25
Hi Olaf,

Yes, we already took a look at your implementation. It looks good, but we can't use it on a journal that is already used for the SPARQL endpoint, am I wrong?

Blaise

On Fri, 8 Apr 2016 at 16:20, Olaf Hartig <oh...@uw...> wrote:
> Dear Blaise,
>
> As Michael mentioned, I implemented a TPF interface directly on top of
> Blazegraph. This implementation directly uses the Blazegraph internals and,
> thus, avoids the overhead of forwarding every TPF request to the SPARQL
> endpoint interface (as would be done by using the standard TPF server
> implementation).
>
> Find the original source code here:
>
> https://github.com/hartig/BlazegraphBasedTPFServer
>
> ...and note that this TPF interface is included in the official 2.0
> release of Blazegraph:
>
> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
>
> Cheers,
> Olaf
>
> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
> > In response to the request on the bigdata-commit list (see below), let's
> > continue the discussion here:
> >
> > Determinism is not guaranteed unless parallelism is explicitly disabled;
> > this even holds for SELECT queries. There are several potential sources of
> > non-determinism: in the general case, Blazegraph may choose to run multiple
> > parallel threads for a given operator (processing different chunks of data
> > in parallel), and in some cases operators also use multiple threads
> > internally.
> >
> > For the query at hand, the single triple pattern access path will
> > yield results in order, but this order might actually be destroyed by other
> > operators on top. The projection operator, for instance, does not guarantee
> > order in the general case, as it might process data in different threads.
> > The way to achieve determinism would be to explicitly disable this
> > parallelism.
> > In fact, this is what Blazegraph is doing when projecting for
> > queries that have an ORDER BY clause. Code-wise, a good starting point is
> > in AST2BOpUtility, starting at line 579:
> >
> > <snip>
> > if (projection != null) {
> >
> >     /**
> >      * The projection after the ORDER BY needs to preserve the ordering.
> >      * So does the chunked materialization operator. The code above
> >      * handles this for ORDER_BY + DISTINCT, but does not go far enough
> >      * to impose order preserving evaluation on the PROJECTION and
> >      * chunked materialization, both of which are downstream from the
> >      * ORDER_BY operator.
> >      *
> >      * @see #1044 (PROJECTION after ORDER BY does not preserve order)
> >      */
> >     final boolean preserveOrder = orderBy != null;
> >
> >     /*
> >      * Append operator to drop variables which are not projected by the
> >      * subquery.
> >      *
> >      * Note: We need to retain all variables which were visible in the
> >      * parent group plus anything which was projected out of the
> >      * subquery. Since there can be exogenous variables, the easiest way
> >      * to do this correctly is to drop variables from the subquery plan
> >      * which are not projected by the subquery. (This is not done at the
> >      * top-level query plan because it would cause exogenous variables
> >      * to be dropped.)
> >      */
> >     {
> >         // The variables projected by the subquery.
> >         final IVariable<?>[] projectedVars = projection
> >                 .getProjectionVars();
> >
> >         final List<NV> anns = new LinkedList<NV>();
> >         anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
> >         anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
> >                 BOpEvaluationContext.CONTROLLER));
> >         anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
> >         anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
> >         if (preserveOrder) {
> >             /**
> >              * @see #563 (ORDER BY + DISTINCT)
> >              * @see #1044 (PROJECTION after ORDER BY does not preserve
> >              *      order)
> >              */
> >             anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
> >             anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
> >         }
> >         left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
> >                 anns.toArray(new NV[anns.size()])//
> >                 ), queryBase, ctx);
> >     }
> > </snip>
> >
> > If the preserveOrder flag is true, parallelism for the operator is
> > explicitly disabled. Disabling parallelism for the projection node would
> > help for simple queries such as a single triple pattern, but in the general
> > case (more complex queries) there will be other operators that might
> > cause non-deterministic behaviour.
> >
> > @Olaf Hartig (CC) implemented a Linked Data Fragments interface on top of
> > Blazegraph, so I am adding him in CC.
> >
> > Best,
> > Michael
> >
> > > From: Blaise de Carné <bde...@gm...>
> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
> > > Date: 8 April 2016 at 10:58:02 GMT+2
> > > To: "big...@li..." <big...@li...>
> > >
> > > Hi there,
> > >
> > > I would like to raise an issue that I find very annoying. I need
> > > to do more tests, but I would like to know your feelings about it.
> > >
> > > Look at this example:
> > >
> > > construct where {
> > >   ?s <http://geovocab.org/geometry#geometry> ?event
> > > } limit 5
> > >
> > > It takes about 100 ms to execute on my 3B dataset.
> > >
> > > About 90% of the time, this gives me 5 results in the same order:
> > >
> > > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
> > > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
> > > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
> > > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
> > >
> > > But sometimes I get different results:
> > >
> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
> > >
> > > Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER
> > > BY, performance drops alarmingly.
> > >
> > > Now take this fabulous project: Linked Data Fragments
> > > (http://linkeddatafragments.org/), which provides a SparqlDatasource to
> > > handle data from a SPARQL endpoint. It uses CONSTRUCT queries with LIMIT
> > > and OFFSET to paginate the results, as the comments in its code say:
> > >
> > >   // Even though the SPARQL spec indicates that
> > >   // LIMIT and OFFSET might be meaningless without ORDER BY,
> > >   // this doesn't seem a problem in practice.
> > >   // Furthermore, sorting can be slow. Therefore, don't sort.
> > >
> > > But it is a problem in practice with Blazegraph, and I have experienced it:
> > > a Linked Data Fragments server configured over a Blazegraph SPARQL
> > > endpoint serves a different page for the same request 5-10% of the time.
> > >
> > > In our project we really need consistent pagination without ORDER
> > > BY. Do you think that is possible with Blazegraph?
> > >
> > > Best,
> > > Blaise
> > >
> > > PS: I don't see this behaviour with SELECT, but caching could be
> > > responsible...
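Michael's explanation in the thread above (an operator evaluated on several threads may emit chunks in a different order than it received them, and restricting it to one thread preserves input order) can be illustrated with a toy model. This is a minimal sketch in plain Java, not Blazegraph code: `ChunkOrderDemo`, `runOperator` and its `maxParallel` parameter are hypothetical names, with `maxParallel` only standing in for the `PipelineOp.Annotations.MAX_PARALLEL` annotation in the quoted snippet. It assumes each chunk is appended to the output as soon as its worker finishes, which is a simplification of a real pipeline operator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model, not Blazegraph code: an "operator" that processes input chunks on
// up to maxParallel worker threads (a hypothetical stand-in for MAX_PARALLEL)
// and appends each chunk to the output as soon as its worker finishes it.
final class ChunkOrderDemo {

    static List<Integer> runOperator(List<List<Integer>> chunks, int maxParallel)
            throws InterruptedException {
        // Synchronized list: addAll of one chunk is atomic, but the order in
        // which chunks are appended depends on which worker finishes first.
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        // With maxParallel == 1 the single worker takes tasks from a FIFO queue,
        // so chunks are processed and emitted strictly in submission order.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (List<Integer> chunk : chunks) {
            pool.submit(() -> {
                try {
                    // Uneven, made-up per-chunk work so chunks can finish out of order.
                    Thread.sleep((chunk.get(0) * 7L) % 5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                out.addAll(chunk);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new ArrayList<>(out);
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<Integer>> chunks =
                List.of(List.of(0, 1), List.of(2, 3), List.of(4, 5), List.of(6, 7));
        // Always [0, 1, 2, 3, 4, 5, 6, 7]: one worker preserves input order.
        System.out.println("maxParallel=1: " + runOperator(chunks, 1));
        // Same eight values, but the chunk order may differ between runs.
        System.out.println("maxParallel=4: " + runOperator(chunks, 4));
    }
}
```

The sketch models a single operator only; as Michael points out, a more complex plan contains other operators that can reorder solutions even when one of them runs on a single thread.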
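The Linked Data Fragments comment quoted in the thread assumes that repeated executions of the same query return solutions in the same order. A minimal sketch shows what happens to OFFSET/LIMIT paging when that assumption fails; `PagingDemo`, `page` and `orderedPage` are hypothetical names, not an LDF or Blazegraph API. If page 1 and page 2 are computed from two executions that return the same solutions in different orders, some solutions are served twice and others never. Sorting before slicing (the role ORDER BY plays) makes the pages consistent; the in-memory sort here is only a stand-in and says nothing about how Blazegraph evaluates ORDER BY.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not LDF or Blazegraph code) of OFFSET/LIMIT paging
// over query executions that may return the same solutions in different orders.
final class PagingDemo {

    // One page of an execution's result: the equivalent of OFFSET offset LIMIT limit.
    static List<String> page(List<String> results, int offset, int limit) {
        int from = Math.min(offset, results.size());
        int to = Math.min(from + limit, results.size());
        return new ArrayList<>(results.subList(from, to));
    }

    // Same slice, but after imposing a total order (the role ORDER BY plays).
    static List<String> orderedPage(List<String> results, int offset, int limit) {
        List<String> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.naturalOrder());
        return page(sorted, offset, limit);
    }

    public static void main(String[] args) {
        // Two executions of the same query: same solutions, different order.
        List<String> run1 = List.of("a", "b", "c", "d");
        List<String> run2 = List.of("c", "d", "a", "b");

        // Page 1 is served from run1, page 2 from run2.
        Set<String> unordered = new HashSet<>(page(run1, 0, 2));
        unordered.addAll(page(run2, 2, 2));
        // prints "without a total order: 2 of 4 distinct items" (a, b twice; c, d never)
        System.out.println("without a total order: " + unordered.size() + " of 4 distinct items");

        Set<String> ordered = new HashSet<>(orderedPage(run1, 0, 2));
        ordered.addAll(orderedPage(run2, 2, 2));
        // prints "with a total order: 4 of 4 distinct items"
        System.out.println("with a total order: " + ordered.size() + " of 4 distinct items");
    }
}
```

This only demonstrates the consistency argument; it does not address the cost of ORDER BY that Blaise reports on his dataset.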