From: Mike P. <mi...@sy...> - 2013-11-14 13:06:24

Seems interesting. Give me a call if you want to talk about it.

From: Jeremy J Carroll <jj...@sy...>
Date: Wednesday, November 13, 2013 8:48 PM
To: Big...@li...
Subject: Re: [Bigdata-developers] analysis of 770 and 773: cardinality of ?a p* ?b

[Jeremy's proposal of 2013-11-14 01:49, quoted in full; see the next message.]

From: Jeremy J C. <jj...@sy...> - 2013-11-14 01:49:07

Here is a proposal for the values returned by ALPP getEstimatedCardinality() where lowerBound() == 0.

Calculate the result from the single child, as with the code before my commit 7442. Then, if lowerBound() == 0:

- if one end is bound, add 1 to the result;
- if two ends are bound, add 1 to the result if the two ends are equal, otherwise 0;
- if both ends are unbound, add a large number to the result, where the large number should ideally be the number of non-literal nodes in the context, maybe using:

    StatementPatternNode sp = alpp.get(0).get(0);
    final IV<?, ?> c = getIV(sp.c(), exogenousBindings);
    long card = db.getAccessPath(null, null, null, c, null).rangeCount(false);

i.e. attempt to address the issues by improving the estimate of the cardinality in the relevant cases.

I will think about how to make appropriate test cases … feels like using the optimizer test case pattern from com.bigdata.rdf.sparql.ast.optimizers.TestAll.

If this looks acceptable I can have a shot tomorrow ...

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 13, 2013, at 5:30 PM, Jeremy J Carroll <jj...@sy...> wrote:

[Jeremy's message of 2013-11-14 01:30, quoted in full; see the next message.]
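
For concreteness, a sketch of the rule proposed above, written against the accessor names that appear elsewhere in the thread (lowerBound(), left(), right(), subgroup()). The child-estimate call and estimatedNodesInContext() are hypothetical stand-ins (the latter for the rangeCount lookup sketched above); this illustrates the proposal, not committed code:

    public long getEstimatedCardinality(final StaticOptimizer opt) {
        // Start from the single child's estimate, as before commit 7442.
        long card = subgroup().getEstimatedCardinality(opt); // assumed accessor
        if (lowerBound() == 0) {
            final boolean leftIsVar = left() instanceof VarNode;
            final boolean rightIsVar = right() instanceof VarNode;
            if (!leftIsVar && !rightIsVar) {
                // Two bound ends: the zero-length path matches iff they are equal.
                if (left().equals(right())) card += 1;
            } else if (!leftIsVar || !rightIsVar) {
                // Exactly one bound end: the zero-length path adds one solution.
                card += 1;
            } else {
                // Both ends unbound: notionally every non-literal node in the
                // context matches; approximate via a range count on the context.
                card += estimatedNodesInContext(); // hypothetical helper
            }
        }
        return card;
    }
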
From: Jeremy J C. <jj...@sy...> - 2013-11-14 01:30:41

My commit 7442 introduced some problems while solving

https://sourceforge.net/apps/trac/bigdata/ticket/739

My commit concerned zero-length property paths, where the query in trac 739 was misbehaving because a zlpp needs to be run last … the actual estimate could be the number of items in the current graph context, but I put Long.MAX_VALUE (in commit 7442, which should be visible here: https://github.com/jeremycarroll/bigdata/commit/9f93a2b752bbfcee84f0e8c1047d9a17fcf6223f ).

This had an unintended side effect of marking such ALPPs as not reorderable, because:

    public boolean isReorderable() {
        final long estCard = getEstimatedCardinality(null);
        return estCard >= 0 && estCard < Long.MAX_VALUE;
    }

On my machine I seem to be doing better on the examples from 770, 773 and 739 using the (definitely hacky):

    public long getEstimatedCardinality(StaticOptimizer opt) {
        final JoinGroupNode group = subgroup();
        /*
         * If lowerBound() is zero, and both ?s and ?o are variables, then
         * we (notionally) match any subject or object in the triple store,
         * see:
         *
         * http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#defn_evalPP_ZeroOrOnePath
         *
         * Despite this not being implemented, the optimizer does better
         * knowing this correctly.
         */
        if (lowerBound() == 0 && left() instanceof VarNode && right() instanceof VarNode) {
            return Long.MAX_VALUE/2;
        }
        ….

Jeremy J Carroll
Principal Architect
Syapse, Inc.

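To see why the both-ends-unbound case is pathological for the estimate, consider the shape in the thread's subject line, ?a p* ?b: the zero-length component alone notionally matches every node in the graph. A hypothetical example (the ex:p predicate and prefix are assumptions, not from the thread):

    // Hypothetical worst case for the estimate: with both ends unbound,
    // the zero-length path matches every node before ex:p is even consulted.
    final String bothEndsUnbound =
        "PREFIX ex: <http://example.org/>\n" +
        "SELECT ?a ?b WHERE { ?a ex:p* ?b }";
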
From: Jeremy J C. <jj...@sy...> - 2013-11-13 23:00:48

I added the following ticket:

https://sourceforge.net/apps/trac/bigdata/ticket/773

I am now going through the various tickets and trying to build test cases etc. Hoping to spend at least the rest of the week on bigdata, hopefully some of next week too.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 12, 2013, at 8:06 AM, Mike Personick <mi...@sy...> wrote:

[Mike's message of 2013-11-12 16:07 and the earlier thread, quoted in full; see below.]

From: Bryan T. <br...@sy...> - 2013-11-12 17:46:27

The maven repository (http://www.systap.com/maven) is now back up and snapshots are now being published again.

Thanks,
Bryan

From: Jeremy J C. <jj...@sy...> - 2013-11-12 16:37:38

Sorry, I am deep in pulling my system together - hope to surface soon, but probably not today.

I feel that the issue is that the static optimizer is misordering ….
Jeremy J Carroll
Principal Architect
Syapse, Inc.
On Nov 12, 2013, at 8:06 AM, Mike Personick <mi...@sy...> wrote:
[Mike's message of 2013-11-12 16:07 and the earlier thread, quoted in full; see below.]

From: Mike P. <mi...@sy...> - 2013-11-12 16:07:12

I suspect in the 1s run the subClassOf* operator is running only once with ?c1 unbound, then is being hash joined against ?x rdf:type ?c1. In the 8s run bindings for ?c1 are flowing into the rdfs:subClassOf* operator and the fixed point has to be done more than once. Just my hunch.

On 11/8/13 8:14 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
[Jeremy's message of 2013-11-09 01:14 and the earlier thread, quoted in full; see below.]

From: Mike P. <mi...@sy...> - 2013-11-11 12:46:33

Can you send me the logging output from ArbitraryLengthPathOp for the 8s vs 1s cases with rdfs:subClassOf?

    log4j.logger.com.bigdata.bop.paths.ArbitraryLengthPathOp=ALL

On 11/8/13 8:14 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
[Jeremy's message of 2013-11-09 01:14 and the earlier thread, quoted in full; see below.]

From: Jeremy J C. <jj...@sy...> - 2013-11-09 01:14:23

I kept drilling …

The actual comparison was as follows:

    ?x rdf:type / rdfs:subClassOf* ?c

8 s

    ?x rdf:type ?c1 .
    ?c1 rdfs:subClassOf* ?c .

8 s

    ?x rdf:type ?c1 .
    { ?c1 rdfs:subClassOf* ?c . }

1 s

Yes, there was a lot else going on in a large query, but that was what it came down to. Replacing the { ?c1 rdfs:subClassOf* ?c . } with syapse:optimizedSubClassOf shaved a further 0.1 s off and didn't need the { }.

My attempt to use named solution sets failed miserably: I think new trac item 771 is the root cause of that failure.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 8, 2013, at 4:12 PM, Jeremy J Carroll <jj...@sy...> wrote:
[Jeremy's message of 2013-11-09 00:12 and the earlier thread, quoted in full; see below.]
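
Mike's reply of 2013-11-12 16:07 (above) suggests why the braced variant wins: the sub-group isolates the path, so the subClassOf* operator runs once with ?c1 unbound and the result is hash-joined. For reference, a sketch of the fast variant written out as a complete query string; the prefixes and the SELECT projection are assumptions, since only the fragments appear in the thread:

    // Illustrative only: the ~1 s formulation, with the property path
    // isolated in its own join group.
    final String fastVariant =
        "PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n" +
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
        "SELECT ?x ?c WHERE {\n" +
        "  ?x rdf:type ?c1 .\n" +
        "  { ?c1 rdfs:subClassOf* ?c . }\n" + // the isolated join group
        "}";
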
From: Jeremy J C. <jj...@sy...> - 2013-11-09 00:12:48

My data was incorrect.

The performance gain for memoization is much less than I had said.

I think maybe I actually hit an issue to do with

    ?x rdf:type / rdfs:subClassOf* ?c

as opposed to

    ?x rdf:type ?c1 .
    ?c1 rdfs:subClassOf* ?c .

This would have to implicate the static optimization phase … I will try and get a clear test case.

On my experiments today there is some gain with memoization, but it seems to be in the 10% area, and very hard to achieve using solution sets, where renaming variables so that sub-components can be combined for more complex effects is just difficult.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 8, 2013, at 12:50 PM, Bryan Thompson <br...@sy...> wrote:

[Bryan's message of 2013-11-08 20:51, quoted in full; see the next message.]

From: Bryan T. <br...@sy...> - 2013-11-08 20:51:29

There are complexities (related to the MVCC semantics) for memoization with invalidation. For example, invalidation should not be applied to concurrent queries (nor to concurrent writers if using read/write tx) when an update would change the memoized result, e.g., for subClassOf. Truth maintenance does handle this correctly since the maintenance is done within the same connection as the update - in fact, truth maintenance can be thought of as a pre-commit protocol.

Truth maintenance does have some limits. First, there needs to be a single writer for truth maintenance, so it does not work with read/write tx updates (but read-only tx queries are fine). Second, truth maintenance does not work with quads. For quads, the issue is simply the pattern to be applied when combining existing assertions in the different named graphs, and where to put those assertions. This has been discussed a bit in the past. One simple pattern is to draw from all graphs and put the assertions into some designated graph (or perhaps the "null" graph). Another pattern is to identify some relationship among the named graphs (e.g., graphs X, Y and Z are ontologies that are used to compute inferences in the other graphs, which are "data"), or even to perform inference solely within a given named graph.

I think that a narrow application of truth maintenance for specific closures combined with ALPPs might work quite well.

Thanks,
Bryan

On 11/8/13 11:50 AM, "Jeremy J Carroll" <jj...@sy...> wrote:

[Jeremy's original message of 2013-11-08; it is quoted in full in the last message of this thread.]

From: Bryan T. <br...@sy...> - 2013-11-08 18:55:14

Yes, that is a not-yet-implemented concept (the ISPO[] parameters to allow you to control the life cycle and backing data structure for the named solution sets).

Bryan

From: Jeremy Carroll <jj...@sy...>
Date: Friday, November 8, 2013 1:49 PM
To: Big...@li...
Subject: [Bigdata-developers] solution sets - transient ??

[Jeremy's question of 2013-11-08 18:49, quoted in full; see the next message.]

From: Jeremy J C. <jj...@sy...> - 2013-11-08 18:49:34
|
Hi

I am looking at both the code and the documentation for solution sets (concerning my subClassOf* issue), and there seems to be a discrepancy. Specifically:

https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=SPARQL_Update#CREATE_SOLUTIONS

documents the existence of query hints giving "control over the how the named solution set is provisioned, including whether it is transient or persistent, the life cycle of the named solution set, etc.", whereas com.bigdata.rdf.sparql.ast.ssets.SolutionSetManager._create(String, ISPO[]) says:

/**
 * Create iff it does not exist.
 *
 * @param solutionSet
 *            The name.
 * @param params
 *            The configuration parameters.
 *
 * @return A solution set with NOTHING written on it.
 *
 * TODO ISPO[] params is ignored (you can not configure for a BTree
 * or HTree index for the solutions with a specified set of join
 * variables for the index).
 */
private SolutionSetStream _create(final String fqn, final ISPO[] params)

I take it that the TODO comment and the code are correct? (Bryan, if you grant me write access on the Wiki, I am happy to correct the documentation to indicate this as a possible future extension.)

Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
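For context, the wiki's description suggests usage along the following lines. The %-prefixed set name and the INSERT INTO / INCLUDE spellings are reconstructed from memory of the bigdata SPARQL extensions and should be treated as assumptions rather than verified syntax.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Materialize the solutions of a query under a named solution set ...
INSERT INTO %subClassClosure
SELECT ?sub ?super
WHERE { ?sub rdfs:subClassOf* ?super }

# ... and join against it later from another query:
SELECT ?object ?super
WHERE {
  ?object a ?sub .
  INCLUDE %subClassClosure .
}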
|
From: Bryan T. <br...@sy...> - 2013-11-08 17:11:10
|
You would want to use a custom inference model that has just the exact rules you need. Look at FullClosure or FastClosure, and at InferenceEngine.Options.

Bryan

On 11/8/13 12:03 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
> Hmmm - actually I should try enabling truth maintenance at some minimal
> level and see what happens
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
> On Nov 8, 2013, at 8:50 AM, Jeremy J Carroll <jj...@sy...> wrote:
>
>> [...]
|
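A sketch of what wiring in such a minimal model might look like. This is not tested code: the option constants (BigdataSail.Options.TRUTH_MAINTENANCE, AbstractTripleStore.Options.CLOSURE_CLASS) are recalled from the 1.2.x codebase and may differ, and com.example.SubClassOnlyClosure stands for a hypothetical BaseClosure subclass that registers only the subClassOf rules.

import java.util.Properties;

import com.bigdata.rdf.sail.BigdataSail;
import com.bigdata.rdf.store.AbstractTripleStore;

public class MinimalTruthMaintenanceSetup {

    public static void main(final String[] args) throws Exception {

        final Properties props = new Properties();

        // Where the journal lives (path is arbitrary).
        props.setProperty(BigdataSail.Options.FILE, "/tmp/minimal-tm.jnl");

        // Enable truth maintenance in the SAIL.
        props.setProperty(BigdataSail.Options.TRUTH_MAINTENANCE, "true");

        // Swap in a hypothetical closure program that registers only the
        // subClassOf rules (rdfs9/rdfs11) rather than the full
        // FastClosure/FullClosure rule sets.
        props.setProperty(AbstractTripleStore.Options.CLOSURE_CLASS,
                "com.example.SubClassOnlyClosure");

        final BigdataSail sail = new BigdataSail(props);
        sail.initialize();
        try {
            // ... load data; inferences are maintained incrementally ...
        } finally {
            sail.shutDown();
        }
    }
}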
|
From: Bryan T. <br...@sy...> - 2013-11-08 17:07:52
|
Running just that specific rule could help out a lot. This is an interesting direction, and not one we have experimented with. The only downside is that there needs to be an awareness on the developer's side of what is materialized and what is not. It would be nicer to rewrite the property path out of the query when it is known to be materialized.

Bryan

On 11/8/13 12:03 PM, "Jeremy J Carroll" <jj...@sy...> wrote:
>
> Hmmm - actually I should try enabling truth maintenance at some minimal
> level and see what happens
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
> On Nov 8, 2013, at 8:50 AM, Jeremy J Carroll <jj...@sy...> wrote:
>
>> [...]
|
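Concretely, such a rewrite would turn Jeremy's query [A] into his query [B] without the application having to know about the materialization. Shown as complete queries for clarity; syapse: is Jeremy's application namespace, and the prefix binding here is assumed.

PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX syapse: <http://example.org/syapse#>   # placeholder URI

# [A] as written: the ALPP is evaluated at query time.
SELECT ?object ?class
WHERE { ?object rdf:type/rdfs:subClassOf* ?class }

# [B] what the hypothetical rewrite would emit when the closure is
# known to be materialized.
SELECT ?object ?class
WHERE { ?object rdf:type/syapse:optimizedSubClassOf ?class }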
|
From: Jeremy J C. <jj...@sy...> - 2013-11-08 17:04:06
|
Hmmm - actually I should try enabling truth maintenance at some minimal level and see what happens.

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 8, 2013, at 8:50 AM, Jeremy J Carroll <jj...@sy...> wrote:
> [...]
|
|
From: Jeremy J C. <jj...@sy...> - 2013-11-08 17:03:11
|
Starting mid next week, I am aiming to spend 3 to 5 days on bigdata work [after I have wrapped up the 'normal user' part of my Syapse port, before moving on to the 'admin user' part. I need to complete the application-level materialization (syapse:optimizedSubClassOf) discussed in my previous e-mail, and then do some moderately extensive tidying up].

Of the trac items currently assigned to me, my current priority is:

- 767 MINUS
- 769 update and alpp (needed for any sensible code for actually generating syapse:optimizedSubClassOf); I suspect 768 quads and alpp can be done at the same time
- 758 bds:search (maybe interacts with #581 Full text search AST optimizer does not work with nested subqueries)
- 759 multiple filters interfere

Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
|
From: Jeremy J C. <jj...@sy...> - 2013-11-08 16:51:05
|
This message is highlighting a high-level issue to do with ALPPs versus materialized versions of the same query.

Yesterday I finished porting the final piece of the Syapse application's "normal user" functionality from our legacy knowledge base to bigdata. This piece was the faceted browser, which has a heavy dependency on some typing functionality: partial queries that I was writing as

[A] ?object rdf:type / rdfs:subClassOf * ?class

(this is a very small part of a big query that populates every cell of a faceted browse page)

The performance of the initial cut was very significantly lower than that of the legacy system. I got a big boost by pulling in a recent change from Mike, but even so I was not in the right ballpark.

On analysis the issue seemed to come down to the rdfs:subClassOf * expressions, and I can meet my performance expectations by materializing the reflexive transitive closure of this property, so that the query becomes

[B] ?object rdf:type / syapse:optimizedSubClassOf ?class

(approximately: I got a factor of 10 from Mike's changes and a further factor of maybe 5 from materializing)

The architectural question is: should the ALPP code actually do a materialization (which would need to be invalidated on update), probably controlled by an optimization hint, or by counting (e.g., if we evaluate rdfs:subClassOf * sufficiently frequently compared with the updates, then we should materialize)?

If it did, I imagine that the performance of the initial query [A] could approach that of the optimized query [B].

Arguments against (other than time and prioritization) are:
- this optimization is better done by the end user (as I am doing), where it can be guided by application knowledge (which is true for me: syapse:optimizedSubClassOf is strictly less than rdfs:subClassOf *, e.g. it is only reflexive on classes, and only on those classes that I care about in the sort of query I am supporting)
- the cache invalidation is also hard to get right in a general setting, whereas application-level knowledge can make cache invalidation trivial (in the Syapse application any change to the ontology is a pretty rare admin function, and we can invalidate all ontological caches on every change without any issue)

The argument for is that this is otherwise an improvement that is conceptually straightforward.

Jeremy J Carroll
Principal Architect
Syapse, Inc.
|
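For concreteness, the application-level materialization described above can be expressed as a SPARQL UPDATE run on each (rare) ontology change. This is a sketch under assumptions: the syapse: URI is a placeholder, the class restriction mirrors the "only reflexive on classes" point, and (per the later message on trac 769) running property paths inside UPDATE did not yet work at the time.

PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX syapse: <http://example.org/syapse#>   # placeholder URI

# Rebuild the materialized closure from scratch on each ontology change.
DELETE WHERE { ?sub syapse:optimizedSubClassOf ?super };

INSERT { ?sub syapse:optimizedSubClassOf ?super }
WHERE {
  ?sub a rdfs:Class .          # only reflexive on classes
  ?sub rdfs:subClassOf* ?super .
}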
|
From: Jeremy J C. <jj...@sy...> - 2013-11-08 16:35:34
|
This is great! It is definitely going to help me know (nearly as quickly as Bryan) when I introduce regression issues!

Thanks

Jeremy J Carroll
Principal Architect
Syapse, Inc.

On Nov 6, 2013, at 6:52 PM, Peter Ansell <ans...@gm...> wrote:
> Bryan,
>
> It is visible now.
>
> Thanks,
>
> Peter
>
> On 7 November 2013 09:46, Bryan Thompson <br...@sy...> wrote:
>> Please try again. I made the jobs visible to anonymous users.
>> Bryan
>>
>> On 11/6/13 5:27 PM, "Peter Ansell" <ans...@gm...> wrote:
>>
>>> http://ci.bigdata.com:8080
|
|
From: Bryan T. <br...@sy...> - 2013-11-07 13:16:02
|
I believe that the problem for Igor et al. was that the jetty instance was limiting the request size, but I do not believe that the request limit was as little as 3500 bytes. I see this question as being more about the correct semantics of SPARQL UPDATE protocol requests.

Bryan

From: Mike Personick <mi...@sy...>
Date: Thursday, November 7, 2013 8:12 AM
To: Bryan Thompson <br...@sy...>, "Big...@li..." <Big...@li...>
Subject: Re: [Bigdata-developers] Fwd: [bigdata - Help] RE: Sparql Update Request Refactor

Isn't this the same problem that Igor et al. were having – SPARQL UPDATE requests that were too big? And they switched to a shared file system to get around it?

[...]
|
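If jetty's limit is in play, the relevant knob for URL-encoded POST bodies is the container's form-content size. A sketch for an embedded deployment: ContextHandler.setMaxFormContentSize is the stock jetty API (this is not taken from the bigdata NanoSparqlServer code itself), and the 10 MB figure is arbitrary.

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;

public class LargeUpdateServer {

    public static void main(final String[] args) throws Exception {
        final Server server = new Server(8080);
        final ServletContextHandler context =
                new ServletContextHandler(ServletContextHandler.SESSIONS);
        // Raise the limit on URL-encoded POST bodies (SPARQL UPDATE sent
        // as application/x-www-form-urlencoded); the jetty default is small.
        context.setMaxFormContentSize(10 * 1024 * 1024);
        server.setHandler(context);
        server.start();
        server.join();
    }
}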
|
From: Mike P. <mi...@sy...> - 2013-11-07 13:13:44
|
Isn't this the same problem that Igor et al. were having – SPARQL UPDATE requests that were too big? And they switched to a shared file system to get around it?

From: Bryan Thompson <br...@sy...>
Date: Wednesday, November 6, 2013 8:29 PM
To: "Big...@li..." <Big...@li...>
Subject: [Bigdata-developers] Fwd: [bigdata - Help] RE: Sparql Update Request Refactor

Any comment on this forum post?

Bryan

Begin forwarded message:

[...]
|
|
From: Peter A. <ans...@gm...> - 2013-11-07 02:52:32
|
Bryan,

It is visible now.

Thanks,

Peter

On 7 November 2013 09:46, Bryan Thompson <br...@sy...> wrote:
> Please try again. I made the jobs visible to anonymous users.
> Bryan
>
> On 11/6/13 5:27 PM, "Peter Ansell" <ans...@gm...> wrote:
>
>> http://ci.bigdata.com:8080
|
|
From: Bryan T. <br...@sy...> - 2013-11-07 00:29:38
|
Any comment on this forum post?

Bryan

Begin forwarded message:

From: SourceForge.net <no...@so...>
Date: November 4, 2013 at 5:36:02 AM EST
To: SourceForge.net <no...@so...>
Subject: [bigdata - Help] RE: Sparql Update Request Refactor

Read and respond to this message at:
https://sourceforge.net/projects/bigdata/forums/forum/676946/topic/8801178
By: feugen24

Hi, I'm a bit confused. Both in the bigdata docs and in the sample code you provided, the update parameter is in the *query* string of the request, but the SPARQL spec says "Client requests for this operation must include exactly one SPARQL update request string (parameter name: update)" as a POST parameter, not a query parameter. As can be seen in the spec's table "2.2 update operation", the query string parameters are "none" or "graph uris". Also, none of the examples have "update" in the query string.

Also, I am working with version 1.2.3, and from http://www.w3.org/TR/sparql11-protocol/#update-bindings-http-examples I can't get "3.2.2 UPDATE using POST directly" to work; it returns the error "Content-Type not recognized as RDF: application/sparql-update". From that ticket it seems it was fixed in 1.2.2. "Update via POST with URL-encoded parameters" works OK, so for now I'll use that version.
|
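For reference, the two request forms that the SPARQL 1.1 Protocol actually sanctions for UPDATE, shown against a hypothetical /bigdata/sparql endpoint; in neither case does the update text belong in the URL query string.

# "update via POST directly" (the form reported failing on 1.2.x):
POST /bigdata/sparql HTTP/1.1
Host: example.org
Content-Type: application/sparql-update

INSERT DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }

# "update via URL-encoded POST" (the form reported working):
POST /bigdata/sparql HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded

update=INSERT%20DATA%20%7B%20%3Chttp%3A%2F%2Fexample.org%2Fs%3E%20%3Chttp%3A%2F%2Fexample.org%2Fp%3E%20%3Chttp%3A%2F%2Fexample.org%2Fo%3E%20%7D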
|
From: Bryan T. <br...@sy...> - 2013-11-06 22:48:00
|
Please try again. I made the jobs visible to anonymous users.

Bryan

On 11/6/13 5:27 PM, "Peter Ansell" <ans...@gm...> wrote:

> http://ci.bigdata.com:8080
|
|
From: Peter A. <ans...@gm...> - 2013-11-06 22:27:42
|
Hi Bryan,

It doesn't look like the job is publicly visible right now.

Cheers,

Peter

On 7 November 2013 07:53, Bryan Thompson <br...@sy...> wrote:
> Sorry, make that http://ci.bigdata.com:8080. The runs are currently about
> 1-1/2 hours each. The last one failed on a zookeeper cleanup. The current
> CI run should go to completion.
> Thanks,
> Bryan
>
> From: Bryan Thompson <br...@sy...>
> Date: Wednesday, November 6, 2013 3:49 PM
> To: "Big...@li..." <Big...@li...>
> Subject: [Bigdata-developers] ci.bigdata.com
>
> We are standing up CI on a node (http://ci.bigdata.com) that will be visible
> to everyone (read-only). Hopefully this will provide added transparency. I
> am still working through the configuration of this service, but it is very
> close to delivering good builds.
>
> Once CI is running smoothly on EC2, I will look at how to export the maven
> artifacts generated by CI. I believe that we will be able to do this
> through a plug-in, but that may mean that the maven artifact location will
> change. I will look at this more tomorrow.
>
> Thanks,
> Bryan
|