|
From: Michael S. <ms...@me...> - 2016-04-08 13:36:58
|
In response to the request on bigdata-commit (see below), let's continue the discussion here:
Determinism is not guaranteed unless parallelism is explicitly disabled; this holds even for SELECT queries. There are several potential sources of non-determinism: in the general case, Blazegraph may run multiple parallel threads for a given operator (processing different chunks of data in parallel), and in some cases operators also use multiple threads internally.
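As a toy illustration of the first source of non-determinism, the following Java program hands chunks to a pool of worker threads and records the order in which solutions are emitted. This is not Blazegraph code: the class `ChunkOrderDemo` and its `run` method are invented for this sketch, and the `maxParallel` parameter only mimics the idea of an operator's maximum parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChunkOrderDemo {

    /**
     * Toy "operator": each chunk is processed by a worker thread and its
     * solutions are appended to a shared output list. Returns the solutions
     * in the order in which they were emitted. (Hypothetical helper, not a
     * Blazegraph API.)
     */
    public static List<Integer> run(List<List<Integer>> chunks, int maxParallel) {
        final List<Integer> emitted =
                Collections.synchronizedList(new ArrayList<Integer>());
        final ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        for (final List<Integer> chunk : chunks) {
            // With one worker, chunks are taken from the queue in submission
            // order; with several workers, the interleaving is up to the scheduler.
            pool.execute(() -> emitted.addAll(chunk));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<Integer>(emitted);
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        // Order is stable here because a single worker drains a FIFO queue.
        System.out.println("maxParallel=1: " + run(chunks, 1));
        // Same solutions, but the chunk order may differ from run to run.
        System.out.println("maxParallel=3: " + run(chunks, 3));
    }
}
```

With a single worker the chunks come out in submission order. With three workers the same solutions are produced, but nothing fixes the order in which the chunks are appended, which is the kind of variation a LIMIT without ORDER BY then exposes.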
For the given query at hand, the single triple pattern access path will yield results in order, but this order actually might be destroyed by other operators on top. The projection operator, for instance, does not guarantee order in the general case, as it might process data in different threads. The way to achieve determinism would be to explicitly disable this parallelism. In fact, this is what Blazegraph is doing when projecting for queries that have an ORDER BY clause. Code-wise, a good starting point is in AST2BOpUtility, starting at line 579:
<snip>
if (projection != null) {
    /**
     * The projection after the ORDER BY needs to preserve the ordering.
     * So does the chunked materialization operator. The code above
     * handles this for ORDER_BY + DISTINCT, but does not go far enough
     * to impose order preserving evaluation on the PROJECTION and
     * chunked materialization, both of which are downstream from the
     * ORDER_BY operator.
     *
     * @see #1044 (PROJECTION after ORDER BY does not preserve order)
     */
    final boolean preserveOrder = orderBy != null;
    /*
     * Append operator to drop variables which are not projected by the
     * subquery.
     *
     * Note: We need to retain all variables which were visible in the
     * parent group plus anything which was projected out of the
     * subquery. Since there can be exogenous variables, the easiest way
     * to do this correctly is to drop variables from the subquery plan
     * which are not projected by the subquery. (This is not done at the
     * top-level query plan because it would cause exogenous variables
     * to be dropped.)
     */
    {
        // The variables projected by the subquery.
        final IVariable<?>[] projectedVars = projection
                .getProjectionVars();
        final List<NV> anns = new LinkedList<NV>();
        anns.add(new NV(BOp.Annotations.BOP_ID, ctx.nextId()));
        anns.add(new NV(BOp.Annotations.EVALUATION_CONTEXT, BOpEvaluationContext.CONTROLLER));
        anns.add(new NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats
        anns.add(new NV(ProjectionOp.Annotations.SELECT, projectedVars));
        if (preserveOrder) {
            /**
             * @see #563 (ORDER BY + DISTINCT)
             * @see #1044 (PROJECTION after ORDER BY does not preserve
             *      order)
             */
            anns.add(new NV(PipelineOp.Annotations.MAX_PARALLEL, 1));
            anns.add(new NV(SliceOp.Annotations.REORDER_SOLUTIONS, false));
        }
        left = applyQueryHints(new ProjectionOp(leftOrEmpty(left),//
                anns.toArray(new NV[anns.size()])//
                ), queryBase, ctx);
    }
</snip>
If the preserve order flag is true, parallelism for the operator is explicitly disabled. Disabling parallelism for the projection node would help for simple queries such as single triple pattern, but in the general case (for more complex queries) there will be other operators that might cause non-deterministic behaviour.
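One way to experiment with this from the client side, without patching AST2BOpUtility, might be a Blazegraph query hint. The sketch below only builds the query text. It assumes that the `hint:maxParallel` query hint is available in your Blazegraph version and that it reaches the operators that matter for this plan; neither is confirmed in this thread, so treat it as something to test rather than a known fix. The class and method names are made up for the example.

```java
public class HintedQuery {

    /** Namespace of Blazegraph query hints (assumed to match your version). */
    static final String HINT_NS = "http://www.bigdata.com/queryHints#";

    /**
     * Builds a CONSTRUCT query for a single triple pattern and adds a
     * query-scoped maxParallel hint. The full CONSTRUCT { } WHERE { } form is
     * used so that the hint triple stays out of the construct template.
     * (Hypothetical helper; whether the hint yields a stable order is untested.)
     */
    public static String constructWithMaxParallel(String predicateIri,
                                                  int maxParallel, int limit) {
        return "PREFIX hint: <" + HINT_NS + ">\n"
                + "CONSTRUCT { ?s <" + predicateIri + "> ?o }\n"
                + "WHERE {\n"
                + "  hint:Query hint:maxParallel " + maxParallel + " .\n"
                + "  ?s <" + predicateIri + "> ?o\n"
                + "}\n"
                + "LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(constructWithMaxParallel(
                "http://geovocab.org/geometry#geometry", 1, 5));
    }
}
```

Even if the hint is honored, the caveat above still applies: for more complex queries other operators may reorder solutions, so only ORDER BY gives an ordering that the SPARQL specification actually guarantees.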
Olaf Hartig has implemented a Linked Data Fragments interface on top of Blazegraph; I am adding him in CC.
Best,
Michael
> From: Blaise de Carné <bde...@gm...>
> Subject: [Bigdata-commit] Pagination consistency without ORDER BY
> Date: 8 April 2016 at 10:58:02 GMT+2
> To: "big...@li..." <big...@li...>
>
> Hi there,
>
> I would like to raise an issue that I find very annoying. I need to do more tests, but I would like to know your thoughts about it.
>
> Consider this example:
>
> construct where {
> ?s <http://geovocab.org/geometry#geometry> ?event
> } limit 5
>
> It takes about 100 ms to execute on my 3B dataset.
>
> About 90% of the time, this gives me 5 results in the same order:
>
> <http://linkedgeodata.org/triplify/node1003406722> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003406722>
> <http://linkedgeodata.org/triplify/node1003749425> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1003749425>
> <http://linkedgeodata.org/triplify/node1011261499> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261499>
> <http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
>
> But sometimes I get different results:
>
> <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
>
> Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER BY, performance drops alarmingly.
>
> Now take this fabulous project: Linked Data Fragments (http://linkeddatafragments.org/), which provides a SparqlDatasource to handle data from a SPARQL endpoint. It uses CONSTRUCT queries with LIMIT and OFFSET to paginate the results, as stated in the code comments:
>
> // Even though the SPARQL spec indicates that
> // LIMIT and OFFSET might be meaningless without ORDER BY,
> // this doesn't seem a problem in practice.
> // Furthermore, sorting can be slow. Therefore, don't sort.
>
> But it is a problem in practice with Blazegraph, and I have observed it: a Linked Data Fragments server configured over a Blazegraph SPARQL endpoint serves different pages 5-10% of the time.
>
> In our project we really need consistent pagination without ORDER BY. Do you think that is possible with Blazegraph?
>
> Bests,
> Blaise
>
> PS: I don't see this behaviour with SELECT, but caching could be responsible...
|
|
From: Olaf H. <oh...@uw...> - 2016-04-08 15:25:58
|
Dear Blaise,
As Michael mentioned, I implemented a TPF interface directly on top of Blazegraph. This implementation directly uses the Blazegraph internals and thus avoids the overhead of forwarding every TPF request to the SPARQL endpoint interface (as would be done by using the standard TPF server implementation).
Find the original source code here:
https://github.com/hartig/BlazegraphBasedTPFServer
...and note that this TPF interface is included in the official 2.0 release of Blazegraph:
http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22
Cheers,
Olaf
|
|
From: Blaise de C. <bde...@gm...> - 2016-04-08 22:42:25
|
Hi Olaf, Yes, we already took a look on your implementation. It looks good, but we can't use it on a journal that is already used for the SPARQL Endpoint, am i wrong ? Blaise Le ven. 8 avr. 2016 à 16:20, Olaf Hartig <oh...@uw...> a écrit : > Dear Blaise, > > As Michael mentioned, I implemented a TPF interface directly on top of > Blazegraph. This implementation uses directly the Blazegraph internals and, > thus, avoids the overhead of forwarding every TPF request to the SPARQL > endpoint interface (as would be done by using the standard TPF server > implementation). > > Find the original source code here: > > https://github.com/hartig/BlazegraphBasedTPFServer > > ...and note that this TPF interface is included in the official 2.0 > release of > Blazegraph: > > http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22 > <http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22BlazegraphBasedTPFServer%22> > > Cheers, > Olaf > > > > On Friday 08 April 2016 15:36:48 Michael Schmidt wrote: > > In response to the request from the bigdata-commit (see below), please > let’s > > resume the discussion on this place: > > > > Determinism is not guaranteed unless parallelism is explicitly disabled — > > this even holds for select queries. There are several potential sources > for > > non-determinism: in the general case, Blazegraph may choose to run > multiple > > parallel threads for a given operator (processing different chunks of > data > > in parallel), and in some cases operators also use multiple threads > > internally. > > > > For the given query at hand, the single triple pattern access path will > > yield results in order, but this order actually might be destroyed by > other > > operators on top. The projection operator, for instance, does not > guarantee > > order in the general case, as it might process data in different threads. > > The way to achieve determinism would be to explicitly disable this > > parallelism. 
In fact, this is what Blazegraph is doing when projecting > for > > queries that have an ORDER BY clause. Code-wise, a good starting point is > > in AST2BOpUtility, starting at line 579: > > > > <snip> > > if (projection != null) { > > > > /** > > * The projection after the ORDER BY needs to > preserve the ordering. > > * So does the chunked materialization operator. > The code above > > * handles this for ORDER_BY + DISTINCT, but does > not go far enough > > * to impose order preserving evaluation on the > PROJECTION and > > * chunked materialization, both of which are > downstream from the > > * ORDER_BY operator. > > * > > * @see #1044 (PROJECTION after ORDER BY does not > preserve order) > > */ > > final boolean preserveOrder = orderBy != null; > > > > /* > > * Append operator to drop variables which are not projected > by > > the * subquery. > > * > > * Note: We need to retain all variables which were visible > in > > the * parent group plus anything which was projected out of the * > subquery. > > Since there can be exogenous variables, the easiest way * to do this > > correctly is to drop variables from the subquery plan * which are not > > projected by the subquery. (This is not done at the * top-level query > plan > > because it would cause exogenous variables * to be dropped.) > > */ > > > > { > > // The variables projected by the subquery. 
> > final IVariable<?>[] projectedVars = > projection > > .getProjectionVars(); > > > > final List<NV> anns = new LinkedList<NV>(); > > anns.add(new NV(BOp.Annotations.BOP_ID, > ctx.nextId())); > > anns.add(new > NV(BOp.Annotations.EVALUATION_CONTEXT, > > BOpEvaluationContext.CONTROLLER)); anns.add(new > > NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats anns.add(new > > NV(ProjectionOp.Annotations.SELECT, projectedVars)); if (preserveOrder) { > > /** > > * @see #563 (ORDER BY + DISTINCT) > > * @see #1044 (PROJECTION after > ORDER BY does not preserve > > * order) > > */ > > anns.add(new > NV(PipelineOp.Annotations.MAX_PARALLEL, 1)); > > anns.add(new > NV(SliceOp.Annotations.REORDER_SOLUTIONS, false)); > > } > > left = applyQueryHints(new > ProjectionOp(leftOrEmpty(left),// > > anns.toArray(new > NV[anns.size()])// > > ), queryBase, ctx); > > } > > </snip> > > > > If the preserve order flag is true, parallelism for the operator is > > explicitly disabled. Disabling parallelism for the projection node would > > help for simple queries such as single triple pattern, but in the general > > case (for more complex queries) there will be other operators that might > > cause non-deterministic behaviour. > > > > @Olaf Hartig (CC) implemented a Linked Data Fragment interface on top of > > Blazegraph, adding him in CC. > > > > > > Best, > > Michael > > > > > From: Blaise de Carné <bde...@gm...> > > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY > > > Date: 8 April 2016 at 10:58:02 GMT+2 > > > To: "big...@li..." > > > <big...@li...> > > > > > > Hi there, > > > > > > I would like to expose a considiration that I find very annoying. I > need > > > to do more tests but i would like to know your fellings about it. 
> > > > > > Look for this exemple : > > > > > > construct where { > > > > > > ?s <http://geovocab.org/geometry#geometry > > > <http://geovocab.org/geometry#geometry>> ?event> > > > } limit 5 > > > > > > It take avout 100ms to execute on my 3B dataset. > > > > > > In 90% of time, this give me 5 results in the same order : > > > > > > <http://linkedgeodata.org/triplify/node1003406722> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/triplify/node1003406722>> < > http://geovocab.org/geometry#ge > > > ometry> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://ge > > > ovocab.org/geometry#geometry>> > <http://linkedgeodata.org/geometry/node1003 > > > 406722> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/geometry/node1003406722>> > > > <http://linkedgeodata.org/triplify/node1003749425> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/triplify/node1003749425>> < > http://geovocab.org/geometry#ge > > > ometry> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://ge > > > ovocab.org/geometry#geometry>> > <http://linkedgeodata.org/geometry/node1003 > > > 749425> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/geometry/node1003749425>> > > > <http://linkedgeodata.org/triplify/node1011261499> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/triplify/node1011261499>> < > http://geovocab.org/geometry#ge > > > ometry> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://ge > > > ovocab.org/geometry#geometry>> > <http://linkedgeodata.org/geometry/node1011 > > > 261499> > > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< > http://li > > > nkedgeodata.org/geometry/node1011261499>> > > > 
<http://linkedgeodata.org/triplify/node1011261514> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011261514>
> > > <http://linkedgeodata.org/triplify/node1011286717> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1011286717>
> > >
> > > But sometimes I get different results:
> > >
> > > <http://linkedgeodata.org/triplify/node1204787784> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1204787784>
> > > <http://linkedgeodata.org/triplify/node1206798938> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1206798938>
> > > <http://linkedgeodata.org/triplify/node12081506> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node12081506>
> > > <http://linkedgeodata.org/triplify/node1209197022> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1209197022>
> > > <http://linkedgeodata.org/triplify/node1212230478> <http://geovocab.org/geometry#geometry> <http://linkedgeodata.org/geometry/node1212230478>
> > >
> > > Conclusion: order is not guaranteed without ORDER BY. If I use an ORDER BY, performance drops alarmingly.
> > >
> > > Now take this fabulous project: Linked Data Fragments (http://linkeddatafragments.org/), which provides a SparqlDatasource to handle data from a SPARQL endpoint. They use CONSTRUCT queries with LIMIT and OFFSET to paginate the results, as they say in the comments:
> > >
> > > // Even though the SPARQL spec indicates that
> > > // LIMIT and OFFSET might be meaningless without ORDER BY,
> > > // this doesn't seem a problem in practice.
> > > // Furthermore, sorting can be slow. Therefore, don't sort.
> > >
> > > But it is a problem in practice with Blazegraph, and I have observed it: a Linked Data Fragments server configured over a Blazegraph SPARQL endpoint serves different pages 5-10% of the time.
> > >
> > > In our project we really need consistent pagination without ORDER BY. Do you think that is possible with Blazegraph?
> > >
> > > Best,
> > > Blaise
> > >
> > > PS: I don't see this behaviour with SELECT, but a cache could be responsible...
|
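For comparison with the unsorted LIMIT/OFFSET pagination discussed above, here is a minimal sketch of what a page query looks like when the ordering is made explicit. It is only an illustration: the class `PagedConstruct` and the method `pageQuery` are hypothetical names, not part of Blazegraph or the Linked Data Fragments server, and the thread itself reports that ORDER BY was too slow on the 3B dataset, so this shows the spec-conformant form rather than a recommended fix.

```java
public class PagedConstruct {

    /**
     * Builds the CONSTRUCT query for one page of triples with the given predicate.
     * The explicit ORDER BY fixes the solution order, so that consecutive
     * LIMIT/OFFSET windows are slices of one well-defined sequence.
     */
    static String pageQuery(String predicateIri, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        // Use long arithmetic so that large page indexes do not overflow.
        long offset = (long) pageSize * pageIndex;
        return "CONSTRUCT { ?s <" + predicateIri + "> ?o }\n"
             + "WHERE { ?s <" + predicateIri + "> ?o }\n"
             + "ORDER BY ?s ?o\n"
             + "LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Third page (index 2) of five triples each.
        System.out.println(pageQuery("http://geovocab.org/geometry#geometry", 5, 2));
    }
}
```

Sorting on both `?s` and `?o` gives every triple with this predicate a distinct sort key, which is what makes the page boundaries repeatable.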
|
From: Olaf H. <oh...@uw...> - 2016-04-09 05:02:26
|
Hi Blaise,

I think you can do it. Although I have not tested this use case, I do not see why it would not be possible. Just point the config.json to the journal file.

Best,
Olaf

On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" <bde...@gm...> wrote:
> Hi Olaf,
>
> Yes, we already took a look at your implementation. It looks good, but we can't use it on a journal that is already used for the SPARQL endpoint, or am I wrong?
>
> Blaise
|
|
From: Bryan T. <br...@sy...> - 2016-04-10 15:07:30
|
Blaise,

Please confirm that you can simply reconfigure to access an existing Journal file. This should work.

Thanks,
Bryan

----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com
|
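Whether a second process can open a journal file that a running server already holds is largely a question of file locking. The sketch below is a hypothetical probe, not part of Blazegraph or the TPF server: it assumes that the journal is guarded by an exclusive OS-level file lock (the thread does not confirm this) and simply checks whether such a lock can currently be acquired. `JournalLockProbe` and `canLockExclusively` are illustrative names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class JournalLockProbe {

    /**
     * Returns true if an exclusive lock on the whole file can be acquired right now.
     * The lock is released again before returning, so this is only a probe.
     */
    static boolean canLockExclusively(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process holds an overlapping lock
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException e) {
            return false; // the lock is already held inside this JVM
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the journal path is passed as the first argument.
        Path journal = Paths.get(args.length > 0 ? args[0] : "blazegraph.jnl");
        if (!Files.exists(journal)) {
            System.out.println(journal + " does not exist");
            return;
        }
        System.out.println(journal + " lockable: " + canLockExclusively(journal));
    }
}
```

If the probe reports that the file is not lockable while the SPARQL endpoint is up, that would be consistent with only one process being able to open the journal at a time, but it does not prove that this is the cause of any particular failure.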
|
From: Blaise de C. <bde...@gm...> - 2016-04-11 09:21:59
|
Hi there,

It works when the Blazegraph server is not running. We can't get it to work while the NanoSparqlServer is running; we get this error:

org.eclipse.jetty.servlet.ServletHolder$1: org.linkeddatafragments.exceptions.DataSourceCreationException: java.lang.RuntimeException: file=blazegraph.jnl

Best,
Blaise

2016-04-10 17:07 GMT+02:00 Bryan Thompson <br...@sy...>:
> Blaise,
>
> Please confirm that you can simply reconfigure to access an existing Journal file. This should work.
>
> Thanks,
> Bryan
> > On Sat, Apr 9, 2016 at 1:02 AM, Olaf Hartig <oh...@uw...> wrote: > >> Hi Braise, >> >> I think you can do it. Although I have not tested this use case, I do not >> see why it would not be possible. Just point the config.json to the journal >> file. >> >> Best, >> Olaf >> >> >> On April 9, 2016 12:41:37 AM GMT+02:00, "Blaise de Carné" < >> bde...@gm...> wrote: >>> >>> Hi Olaf, >>> >>> Yes, we already took a look on your implementation. It looks good, but >>> we can't use it on a journal that is already used for the SPARQL Endpoint, >>> am i wrong ? >>> >>> Blaise >>> >>> Le ven. 8 avr. 2016 à 16:20, Olaf Hartig <oh...@uw...> a >>> écrit : >>> >>>> Dear Blaise, >>>> >>>> As Michael mentioned, I implemented a TPF interface directly on top of >>>> Blazegraph. This implementation uses directly the Blazegraph internals >>>> and, >>>> thus, avoids the overhead of forwarding every TPF request to the SPARQL >>>> endpoint interface (as would be done by using the standard TPF server >>>> implementation). >>>> >>>> Find the original source code here: >>>> >>>> https://github.com/hartig/BlazegraphBasedTPFServer >>>> >>>> ...and note that this TPF interface is included in the official 2.0 >>>> release of >>>> Blazegraph: >>>> >>>> http://search.maven.org/#search|ga|1|a%3A%22BlazegraphBasedTPFServer%22 >>>> <http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22BlazegraphBasedTPFServer%22> >>>> >>>> Cheers, >>>> Olaf >>>> >>>> >>>> >>>> On Friday 08 April 2016 15:36:48 Michael Schmidt wrote: >>>> > In response to the request from the bigdata-commit (see below), >>>> please let’s >>>> > resume the discussion on this place: >>>> > >>>> > Determinism is not guaranteed unless parallelism is explicitly >>>> disabled — >>>> > this even holds for select queries. 
There are several potential >>>> sources for >>>> > non-determinism: in the general case, Blazegraph may choose to run >>>> multiple >>>> > parallel threads for a given operator (processing different chunks of >>>> data >>>> > in parallel), and in some cases operators also use multiple threads >>>> > internally. >>>> > >>>> > For the given query at hand, the single triple pattern access path >>>> will >>>> > yield results in order, but this order actually might be destroyed by >>>> other >>>> > operators on top. The projection operator, for instance, does not >>>> guarantee >>>> > order in the general case, as it might process data in different >>>> threads. >>>> > The way to achieve determinism would be to explicitly disable this >>>> > parallelism. In fact, this is what Blazegraph is doing when >>>> projecting for >>>> > queries that have an ORDER BY clause. Code-wise, a good starting >>>> point is >>>> > in AST2BOpUtility, starting at line 579: >>>> > >>>> > <snip> >>>> > if (projection != null) { >>>> > >>>> > /** >>>> > * The projection after the ORDER BY needs to >>>> preserve the ordering. >>>> > * So does the chunked materialization >>>> operator. The code above >>>> > * handles this for ORDER_BY + DISTINCT, but >>>> does not go far enough >>>> > * to impose order preserving evaluation on the >>>> PROJECTION and >>>> > * chunked materialization, both of which are >>>> downstream from the >>>> > * ORDER_BY operator. >>>> > * >>>> > * @see #1044 (PROJECTION after ORDER BY does >>>> not preserve order) >>>> > */ >>>> > final boolean preserveOrder = orderBy != null; >>>> > >>>> > /* >>>> > * Append operator to drop variables which are not >>>> projected by >>>> > the * subquery. >>>> > * >>>> > * Note: We need to retain all variables which were >>>> visible in >>>> > the * parent group plus anything which was projected out of the * >>>> subquery. 
>>>> > Since there can be exogenous variables, the easiest way * to do this >>>> > correctly is to drop variables from the subquery plan * which are not >>>> > projected by the subquery. (This is not done at the * top-level query >>>> plan >>>> > because it would cause exogenous variables * to be dropped.) >>>> > */ >>>> > >>>> > { >>>> > // The variables projected by the >>>> subquery. >>>> > final IVariable<?>[] projectedVars = >>>> projection >>>> > .getProjectionVars(); >>>> > >>>> > final List<NV> anns = new >>>> LinkedList<NV>(); >>>> > anns.add(new NV(BOp.Annotations.BOP_ID, >>>> ctx.nextId())); >>>> > anns.add(new >>>> NV(BOp.Annotations.EVALUATION_CONTEXT, >>>> > BOpEvaluationContext.CONTROLLER)); anns.add(new >>>> > NV(PipelineOp.Annotations.SHARED_STATE, true));// live stats >>>> anns.add(new >>>> > NV(ProjectionOp.Annotations.SELECT, projectedVars)); if >>>> (preserveOrder) { >>>> > /** >>>> > * @see #563 (ORDER BY + >>>> DISTINCT) >>>> > * @see #1044 (PROJECTION after >>>> ORDER BY does not preserve >>>> > * order) >>>> > */ >>>> > anns.add(new >>>> NV(PipelineOp.Annotations.MAX_PARALLEL, 1)); >>>> > anns.add(new >>>> NV(SliceOp.Annotations.REORDER_SOLUTIONS, false)); >>>> > } >>>> > left = applyQueryHints(new >>>> ProjectionOp(leftOrEmpty(left),// >>>> > anns.toArray(new >>>> NV[anns.size()])// >>>> > ), queryBase, ctx); >>>> > } >>>> > </snip> >>>> > >>>> > If the preserve order flag is true, parallelism for the operator is >>>> > explicitly disabled. Disabling parallelism for the projection node >>>> would >>>> > help for simple queries such as single triple pattern, but in the >>>> general >>>> > case (for more complex queries) there will be other operators that >>>> might >>>> > cause non-deterministic behaviour. >>>> > >>>> > @Olaf Hartig (CC) implemented a Linked Data Fragment interface on top >>>> of >>>> > Blazegraph, adding him in CC. 
>>>> > >>>> > >>>> > Best, >>>> > Michael >>>> > >>>> > > From: Blaise de Carné <bde...@gm...> >>>> > > Subject: [Bigdata-commit] Pagination consistency without ORDER BY >>>> > > Date: 8 April 2016 at 10:58:02 GMT+2 >>>> > > To: "big...@li..." >>>> > > <big...@li...> >>>> > > >>>> > > Hi there, >>>> > > >>>> > > I would like to expose a considiration that I find very annoying. I >>>> need >>>> > > to do more tests but i would like to know your fellings about it. >>>> > > >>>> > > Look for this exemple : >>>> > > >>>> > > construct where { >>>> > > >>>> > > ?s <http://geovocab.org/geometry#geometry >>>> > > <http://geovocab.org/geometry#geometry>> ?event> >>>> > > } limit 5 >>>> > > >>>> > > It take avout 100ms to execute on my 3B dataset. >>>> > > >>>> > > In 90% of time, this give me 5 results in the same order : >>>> > > >>>> > > <http://linkedgeodata.org/triplify/node1003406722> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1003406722>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1003 >>>> > > 406722> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1003406722>> >>>> > > <http://linkedgeodata.org/triplify/node1003749425> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1003749425>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1003 >>>> > > 749425> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > 
nkedgeodata.org/geometry/node1003749425>> >>>> > > <http://linkedgeodata.org/triplify/node1011261499> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1011261499>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1011 >>>> > > 261499> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1011261499>> >>>> > > <http://linkedgeodata.org/triplify/node1011261514> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1011261514>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1011 >>>> > > 261514> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1011261514>> >>>> > > <http://linkedgeodata.org/triplify/node1011286717> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1011286717>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1011 >>>> > > 286717> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1011286717>> But sometime, i get >>>> differents >>>> > > results : >>>> > > >>>> > > <http://linkedgeodata.org/triplify/node1204787784> >>>> > > 
<http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1204787784>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1204 >>>> > > 787784> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1204787784>> >>>> > > <http://linkedgeodata.org/triplify/node1206798938> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1206798938>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1206 >>>> > > 798938> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1206798938>> >>>> > > <http://linkedgeodata.org/triplify/node12081506> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node12081506>> >>>> <http://geovocab.org/geometry#geom >>>> > > etry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1208 >>>> > > 1506> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node12081506>> >>>> > > <http://linkedgeodata.org/triplify/node1209197022> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1209197022>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > 
<http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1209 >>>> > > 197022> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1209197022>> >>>> > > <http://linkedgeodata.org/triplify/node1212230478> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/triplify/node1212230478>> < >>>> http://geovocab.org/geometry#ge >>>> > > ometry> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://ge >>>> > > ovocab.org/geometry#geometry>> >>>> <http://linkedgeodata.org/geometry/node1212 >>>> > > 230478> >>>> > > <http://ns3027589.ip-149-202-90.eu:9999/blazegraph/#explore:kb:< >>>> http://li >>>> > > nkedgeodata.org/geometry/node1212230478>> >>>> > > >>>> > > Conclusion : order is not garantee without ORDER BY. If i use an >>>> ORDER BY, >>>> > > performance drop alarmingly. >>>> > > >>>> > > Now take this fabulous project : Linked Data Fragments >>>> > > (http://linkeddatafragments.org/ <http://linkeddatafragments.org/ >>>> >), >>>> > > which provide a SparqlDatasource to handle data from a SPARQL >>>> Endpoint. >>>> > > They use CONSTRUCT queries with LIMIT and OFFSET to paginate the >>>> results, >>>> > > as they says in the comments : >>>> > > >>>> > > // Even though the SPARQL spec indicates that >>>> > > // LIMIT and OFFSET might be meaningless without ORDER BY, >>>> > > // this doesn't seem a problem in practice. >>>> > > // Furthermore, sorting can be slow. Therefore, don't sort. >>>> > > >>>> > > But it's a problem in practice with Blazegraph, and i exeperimented >>>> it : a >>>> > > Linked Data Fragments server configured over a Blazegraph SPARQL >>>> Endpoint >>>> > > serve different pages in 5-10% of time. 
>>>> > > >>>> > > In our project we really need to get consistent pagination, without >>>> ORDER >>>> > > BY. Do you think that is possible with Blazegraph ? >>>> > > >>>> > > Bests, >>>> > > Blaise >>>> > > >>>> > > PS : i don't see this behaviour with SELECT, but cache could be >>>> > > responsible... >>>> >>> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> >> >> ------------------------------------------------------------------------------ >> Find and fix application performance issues faster with Applications >> Manager >> Applications Manager provides deep performance insights into multiple >> tiers of >> your business applications. It resolves application problems quickly and >> reduces your MTTR. Get your free trial! http://pubads.g.doubleclick.net/ >> gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532 >> <http://pubads.g.doubleclick.net/gampad/clk?id=1444514301&iu=/ca-pub-7940484522588532> >> _______________________________________________ >> Bigdata-developers mailing list >> Big...@li... >> https://lists.sourceforge.net/lists/listinfo/bigdata-developers >> > |
|
From: Bryan T. <br...@sy...> - 2016-04-11 10:36:53
|
This is because the journal cannot be opened from two separate processes at the same time. I can discuss this with Olaf.

Thanks,
Bryan

--
----
Bryan Thompson
Chief Scientist & Founder
Blazegraph
e: br...@bl...
w: http://blazegraph.com

Blazegraph products help to solve the Graph Cache Thrash to achieve large scale processing for graph and predictive analytics. Blazegraph is the creator of the industry’s first GPU-accelerated high-performance database for large graphs, and has been named as one of the “10 Companies and Technologies to Watch in 2016” <http://insideanalysis.com/2016/01/20535/>.

Blazegraph Database <https://www.blazegraph.com/> is our ultra-high performance graph database that supports both RDF/SPARQL and Tinkerpop/Blueprints APIs. Blazegraph GPU <https://www.blazegraph.com/product/gpu-accelerated/> and Blazegraph DAS <https://www.blazegraph.com/product/gpu-accelerated/> are disruptive new technologies that use GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions.

CONFIDENTIALITY NOTICE: This email and its contents and attachments are for the sole use of the intended recipient(s) and are confidential or proprietary to SYSTAP, LLC DBA Blazegraph. Any unauthorized review, use, disclosure, dissemination or copying of this email or its contents or attachments is prohibited. If you have received this communication in error, please notify the sender by reply email and permanently delete all copies of the email and its contents and attachments. |
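To illustrate why a second opener of the same journal file can fail, here is a minimal sketch. It assumes the journal is guarded by an exclusive file lock, which this thread does not confirm. `JournalLockSketch`, `tryExclusive` and the temporary `.jnl` file are hypothetical names for illustration; only standard `java.nio` calls are used, and none of this is Blazegraph's own code. Both openers live in one JVM here, so the refusal surfaces as an `OverlappingFileLockException`; across two processes, `tryLock()` would typically return `null` instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Blazegraph.
final class JournalLockSketch {

    /** Try to take an exclusive lock; returns null if someone else already holds it. */
    static FileLock tryExclusive(FileChannel channel) throws IOException {
        try {
            // Returns null when another process holds an overlapping lock.
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown when the lock is already held inside this JVM.
            return null;
        }
    }

    /** Opens the same file twice and reports which opener obtained the lock. */
    static boolean[] demo() throws IOException {
        Path file = Files.createTempFile("sketch", ".jnl");
        try (RandomAccessFile first = new RandomAccessFile(file.toFile(), "rw");
             RandomAccessFile second = new RandomAccessFile(file.toFile(), "rw")) {
            FileLock held = tryExclusive(first.getChannel());
            FileLock refused = tryExclusive(second.getChannel());
            return new boolean[] { held != null, refused != null };
        } finally {
            // The channels (and their locks) are closed before the file is removed.
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = demo();
        System.out.println("first opener locked:  " + result[0]);
        System.out.println("second opener locked: " + result[1]);
    }
}
```

Under this assumption, the server started first keeps the lock for as long as it runs, so a separately started TPF server pointing at the same file is refused until the first one shuts down.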
|
From: Stas M. <sma...@wi...> - 2016-04-11 18:04:31
|
Hi!

> This is because the journal can not be opened from two separate
> processes at the same time. I can discuss this with Olaf.

It would be nice if the LDF server could access the store of a running server, since that would allow serving the same data set in both modes. If I understand correctly, the LDF server is read-only, so it should not have requirements beyond what parallel access within a Blazegraph instance would require? Or is it more complicated?

Thanks,
--
Stas Malyshev
sma...@wi... |
|
From: Bryan T. <br...@sy...> - 2016-04-11 18:10:58
|
I think that this is just an integration question. It was released initially as a separate artifact. It probably needs to be bundled to make this work. Or "hooked" as a lazy integration component. I will discuss some possible approaches with Brad and Olaf.

Thanks,
Bryan |