From: Olaf H. <oh...@uw...> - 2016-04-08 15:25:58
Dear Blaise,

As Michael mentioned, I implemented a TPF interface directly on top of
Blazegraph. This implementation uses the Blazegraph internals directly and
thus avoids the overhead of forwarding every TPF request to the SPARQL
endpoint interface (as would be done by using the standard TPF server
implementation).

Find the original source code here:

  https://github.com/hartig/BlazegraphBasedTPFServer

...and note that this TPF interface is included in the official 2.0 release
of Blazegraph:

  http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22

Cheers,
Olaf

On Friday 08 April 2016 15:36:48 Michael Schmidt wrote:
> In response to the request from bigdata-commit (see below), let’s resume
> the discussion here:
>
> Determinism is not guaranteed unless parallelism is explicitly disabled;
> this even holds for SELECT queries. There are several potential sources of
> non-determinism: in the general case, Blazegraph may choose to run multiple
> parallel threads for a given operator (processing different chunks of data
> in parallel), and in some cases operators also use multiple threads
> internally.
>
> For the query at hand, the single triple pattern access path will yield
> results in order, but this order might be destroyed by other operators on
> top. The projection operator, for instance, does not guarantee order in the
> general case, as it might process data in different threads. The way to
> achieve determinism would be to explicitly disable this parallelism. In
> fact, this is what Blazegraph does when projecting for queries that have
> an ORDER BY clause. Code-wise, a good starting point is in AST2BOpUtility,
> starting at line 579:
>
> <snip>
>         if (projection != null) {
>
>             /**
>              * The projection after the ORDER BY needs to preserve the ordering.
>              * So does the chunked materialization operator. The code above
>              * handles this for ORDER_BY + DISTINCT, but does not go far enough
>              * to impose order preserving evaluation on the PROJECTION and
>              * chunked materialization, both of which are downstream from the
>              * ORDER_BY operator.
>              *
>              * @see #1044 (PROJECTION after ORDER BY does not preserve order)
>              */
>             final boolean preserveOrder = orderBy != null;
>
>             /*
>              * Append operator to drop variables which are not projected by the
>              * subquery.
>              *
>              * Note: We need to retain all variables which were visible in the
>              * parent group plus anything which was projected out of the
>              * subquery. Since there can be exogenous variables, the easiest way
>              * to do this correctly is to drop variables from the subquery plan
>              * which are not projected by the subquery. (This is not done at the
>              * top-level query plan because it would cause exogenous variables
>              * to be dropped.)
>              */
>
>             {
>                 // The variables projected by the subquery.
>                 final IVariable<?>[] projectedVars = projection
>                         .getProjectionVars();
>
>                 final List<NV> anns = new LinkedList<NV>();
>                 anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
>                 anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT,
>                         BOpEvaluationContext.CONTROLLER));
>                 anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
>                 anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
>                 if (preserveOrder) {
>                     /**
>                      * @see #563 (ORDER BY + DISTINCT)
>                      * @see #1044 (PROJECTION after ORDER BY does not preserve
>                      *      order)
>                      */
>                     anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
>                     anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
>                 }
>                 left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
>                         anns.toArray(new NV[anns.size()])//
>                         ), queryBase, ctx);
>             }
> </snip>
>
> If the preserveOrder flag is true, parallelism for the operator is
> explicitly disabled. Disabling parallelism for the projection node would
> help for simple queries such as a single triple pattern, but in the general
> case (for more complex queries) there will be other operators that might
> cause non-deterministic behaviour.
>
> Olaf Hartig implemented a Linked Data Fragments interface on top of
> Blazegraph; adding him in CC.
>
> Best,
> Michael
>
> > From: Blaise de Carné <bde...@gm...>
> > Subject: [Bigdata-commit] Pagination consistency without ORDER BY
> > Date: 8 April 2016 at 10:58:02 GMT+2
> > To: "big...@li..." <big...@li...>
> >
> > Hi there,
> >
> > I would like to raise an issue that I find very annoying. I need to do
> > more tests, but I would like to know your thoughts about it.
> >
> > Consider this example:
> >
> > construct where {
> >   ?s <http://geovocab.org/geometry#geometry> ?event
> > } limit 5
> >
> > It takes about 100 ms to execute on my 3B dataset.
> >
> > About 90% of the time, this gives me 5 results in the same order:
> >
> > <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
> > <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
> > <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
> > <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
> >
> > But sometimes I get different results:
> >
> > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
> >
> > Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER BY,
> > performance drops alarmingly.
> >
> > Now take this fabulous project: Linked Data Fragments
> > (http://linkeddatafragments.org/), which provides a SparqlDatasource to
> > handle data from a SPARQL endpoint. They use CONSTRUCT queries with LIMIT
> > and OFFSET to paginate the results, as they say in the comments:
> >
> > // Even though the SPARQL spec indicates that
> > // LIMIT and OFFSET might be meaningless without ORDER BY,
> > // this doesn't seem a problem in practice.
> > // Furthermore, sorting can be slow. Therefore, don't sort.
> >
> > But it is a problem in practice with Blazegraph, and I have experienced
> > it: a Linked Data Fragments server configured over a Blazegraph SPARQL
> > endpoint serves different pages 5-10% of the time.
> >
> > In our project we really need consistent pagination without ORDER BY. Do
> > you think that is possible with Blazegraph?
> >
> > Best,
> > Blaise
> >
> > PS: I don't see this behaviour with SELECT, but caching could be
> > responsible...
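
Note: the effect Michael describes, an operator that processes chunks on several threads and therefore emits them in a varying order, can be reproduced in isolation. What follows is a minimal, self-contained Java sketch and not Blazegraph code. The class ChunkOrderDemo, its methods, and the maxParallel parameter are hypothetical names chosen for illustration; the parameter only mirrors the idea behind PipelineOp.Annotations.MAX_PARALLEL in the excerpt above. Chunks pass through an identity "operator" on a thread pool and are emitted in completion order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names exist in Blazegraph.
final class ChunkOrderDemo {

    /** Splits the input into fixed-size chunks, keeping the input order. */
    static List<List<Integer>> chunk(List<Integer> input, int chunkSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, input.size());
            chunks.add(new ArrayList<>(input.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Runs every chunk through an identity "operator" on maxParallel threads
     * and concatenates the chunks in the order in which they complete.
     */
    static List<Integer> run(List<Integer> input, int chunkSize, int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        CompletionService<List<Integer>> done = new ExecutorCompletionService<>(pool);
        try {
            List<List<Integer>> chunks = chunk(input, chunkSize);
            for (List<Integer> c : chunks) {
                done.submit(() -> c); // the "operator": passes the chunk through
            }
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                out.addAll(done.take().get()); // completion order, not submission order
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        // A single worker takes chunks in FIFO order, so the input order survives.
        System.out.println("maxParallel=1 keeps order: " + run(input, 10, 1).equals(input));
        // With several workers only the set of results is fixed; the order may vary per run.
        System.out.println("maxParallel=4 keeps order: " + run(input, 10, 4).equals(input));
    }
}
```

With maxParallel = 1 the first "page" (say, the first 5 elements of the result) is the same on every run. With a larger value it may differ between runs, which corresponds to the inconsistent LIMIT/OFFSET pages reported above when no ORDER BY is used.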