From: Bryan T. <br...@sy...> - 2013-11-08 17:11:10
You would want to use a custom inference model that just had the exact rules you needed. Look at FullClosure or FastClosure and at InferenceEngine.Options.

Bryan

On 11/8/13 12:03 PM, "Jeremy J Carroll" <jj...@sy...> wrote:

>Hmmm - actually I should try enabling truth maintenance at some minimal
>level and see what happens
>
>Jeremy J Carroll
>Principal Architect
>Syapse, Inc.
>
>On Nov 8, 2013, at 8:50 AM, Jeremy J Carroll <jj...@sy...> wrote:
>
>> This message is highlighting a high-level issue to do with ALPPs versus
>> materialized versions of the same query.
>>
>> yesterday I finished porting the final piece of the Syapse
>> application's "normal user" functionality from our legacy knowledge base
>> to bigdata.
>> This piece was the facetted browser - which has a heavy dependency on
>> some typing functionality, partial queries that I was writing as
>>
>> [A] ?object rdf:type / rdfs:subClassOf * ?class
>>
>> (this is a very small part of a big query that populates every cell of
>> a facetted browse page)
>>
>> The performance of the initial cut was very significantly lower than
>> the legacy system: I got a big boost by pulling in a recent change from
>> Mike; but even so I was not in the right ball-park.
>>
>> On analysis the issue seemed to come down to the rdfs:subClassOf *
>> expressions, and I can meet my performance expectations by materializing
>> the reflexive transitive closure of this property so that the query
>> becomes
>>
>> [B] ?object rdf:type / syapse:optimizedSubClassOf ?class
>>
>> (approx: I got a factor of 10 from Mike's changes and a further factor
>> of maybe 5 from materializing)
>>
>> The architectural question is:
>>
>> - should the ALPP code actually do a materialization (which would need
>> to be invalidated on update), probably controlled by an optimization
>> hint, or by counting (e.g. if we call rdfs:subClassOf * sufficiently
>> frequently compared with the updates then we should materialize)
>>
>> if it did, I imagine that the performance of the initial query [A]
>> could approach that of the optimized query [B].
>>
>> Arguments against (other than time and prioritization) are:
>> - this optimization is better done by the end user (as I am doing),
>> where it can be guided by application knowledge (which is true for me -
>> syapse:optimizedSubClassOf is strictly less than rdfs:subClassOf *, e.g.
>> it is only reflexive on classes, and only on those classes that I care
>> about in the sort of query I am supporting)
>> - the cache invalidation is also hard to get right in a general
>> setting, whereas application level knowledge can make cache invalidation
>> trivial (in the syapse application any change to the ontology is a
>> pretty rare admin function, and we can invalidate all ontological caches
>> for every change without any issue)
>>
>> Arguments for are - this is otherwise an improvement that is
>> conceptually straightforward
>>
>> Jeremy J Carroll
>> Principal Architect
>> Syapse, Inc.
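
For readers following the thread, here is a rough illustration of the application-level approach Jeremy describes: materialize the reflexive transitive closure of the subclass edges once, then drop the whole cache on any ontology change. This is only a sketch over an in-memory edge list, not bigdata code. The names `materialize_closure` and `SubClassCache` are hypothetical, and the edges are assumed to be plain (subclass, superclass) string pairs.

```python
from collections import defaultdict


def materialize_closure(edges):
    """Map each class to the frozenset of itself plus all its ancestors.

    `edges` is an iterable of (subclass, superclass) pairs. The result is
    reflexive only on classes that appear in some edge.
    """
    parents = defaultdict(set)
    classes = set()
    for sub, sup in edges:
        parents[sub].add(sup)
        classes.add(sub)
        classes.add(sup)

    closure = {}
    for start in classes:
        seen = {start}          # reflexive: every class reaches itself
        stack = [start]
        while stack:            # iterative DFS; `seen` guards against cycles
            node = stack.pop()
            for parent in parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        closure[start] = frozenset(seen)
    return closure


class SubClassCache:
    """Hypothetical cache: recompute lazily, invalidate on every change."""

    def __init__(self, edges=()):
        self._edges = set(edges)
        self._closure = None

    def add_edge(self, sub, sup):
        # Ontology changes are assumed rare, so discard everything.
        self._edges.add((sub, sup))
        self._closure = None

    def superclasses(self, cls):
        if self._closure is None:
            self._closure = materialize_closure(self._edges)
        # Classes never seen in an edge have no materialized entry.
        return self._closure.get(cls, frozenset())


cache = SubClassCache([("Dog", "Mammal"), ("Mammal", "Animal")])
dog_supers = cache.superclasses("Dog")
```

Each lookup after the first is a single dictionary access, which is the same trade the materialized query [B] makes against the path query [A]: more work on the rare update, less on every read.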