From: Jeremy J C. <jj...@sy...> - 2013-11-08 16:51:05
This message highlights a high-level issue concerning ALPPs (arbitrary-length property paths) versus materialized versions of the same query.

Yesterday I finished porting the final piece of the Syapse application's "normal user" functionality from our legacy knowledge base to bigdata. This piece was the faceted browser, which depends heavily on some typing functionality: partial queries that I was writing as

  [A] ?object rdf:type / rdfs:subClassOf * ?class

(this is a very small part of a big query that populates every cell of a faceted browse page).

The performance of the initial cut was significantly worse than that of the legacy system. I got a big boost by pulling in a recent change from Mike, but even so I was not in the right ballpark. On analysis, the issue seemed to come down to the rdfs:subClassOf * expressions, and I can meet my performance expectations by materializing the reflexive transitive closure of this property, so that the query becomes

  [B] ?object rdf:type / syapse:optimizedSubClassOf ?class

(approximately: I got a factor of 10 from Mike's changes and a further factor of maybe 5 from materializing).

The architectural question is:

- Should the ALPP code actually do a materialization (which would need to be invalidated on update), probably controlled either by an optimization hint or by counting? For example, if we evaluate rdfs:subClassOf * sufficiently often compared with the number of updates, then we should materialize.

If it did, I imagine that the performance of the initial query [A] could approach that of the optimized query [B].

Arguments against (other than time and prioritization) are:

- This optimization is better done by the end user (as I am doing), where it can be guided by application knowledge. That is true in my case: syapse:optimizedSubClassOf is a strict subset of rdfs:subClassOf *. For example, it is reflexive only on classes, and only on those classes that I care about in the sort of query I am supporting.
- Cache invalidation is also hard to get right in a general setting, whereas application-level knowledge can make it trivial. In the Syapse application, any change to the ontology is a fairly rare admin function, so we can invalidate all ontological caches on every change without any issue.

Arguments for are:

- Otherwise, this is a conceptually straightforward improvement.

Jeremy J Carroll
Principal Architect
Syapse, Inc.
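The application-level approach described above can be sketched in Python. This is a minimal, in-memory illustration, not bigdata's ALPP code or Syapse's actual implementation. It assumes the ontology is available as a set of direct rdfs:subClassOf pairs. The names `reflexive_transitive_closure`, `OntologyClosureCache` and `classes_of`, and the example terms such as `:Gene`, are hypothetical. The sketch materializes a stand-in for syapse:optimizedSubClassOf, which is reflexive only on a chosen set of classes. It then evaluates query [B] for a single object and discards the whole materialization on any ontology change.

```python
from collections import defaultdict


def reflexive_transitive_closure(sub_class_of, reflexive_on):
    """Materialize (sub, super) pairs standing in for syapse:optimizedSubClassOf.

    sub_class_of: direct rdfs:subClassOf pairs.
    reflexive_on: the only classes that get a (c, c) pair (hypothetical
    "classes the application cares about"), unlike rdfs:subClassOf *,
    which is reflexive on every term.
    """
    parents = defaultdict(set)
    for sub, sup in sub_class_of:
        parents[sub].add(sup)

    closure = set()
    for start in list(parents):
        # Depth-first walk up the hierarchy; `seen` also guards against cycles.
        seen = set()
        stack = list(parents[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        closure.update((start, sup) for sup in seen)

    closure.update((c, c) for c in reflexive_on)
    return closure


class OntologyClosureCache:
    """Hypothetical cache: every ontology edit invalidates the whole closure."""

    def __init__(self, sub_class_of, reflexive_on):
        self._edges = set(sub_class_of)
        self._reflexive_on = set(reflexive_on)
        self._closure = None  # None means "not materialized yet, or invalidated"

    def closure(self):
        if self._closure is None:
            self._closure = reflexive_transitive_closure(self._edges, self._reflexive_on)
        return self._closure

    def add_sub_class_of(self, sub, sup):
        # Ontology changes are rare admin operations, so drop everything.
        self._edges.add((sub, sup))
        self._closure = None


def classes_of(obj, rdf_type, cache):
    """Evaluate ?object rdf:type / syapse:optimizedSubClassOf ?class for one object."""
    closure = cache.closure()
    return {sup for t in rdf_type.get(obj, ()) for sub, sup in closure if sub == t}


if __name__ == "__main__":
    edges = {(":Gene", ":Entity"), (":Entity", ":Thing")}
    cache = OntologyClosureCache(edges, reflexive_on={":Gene", ":Entity", ":Thing"})
    rdf_type = {":brca1": {":Gene"}}
    print(sorted(classes_of(":brca1", rdf_type, cache)))
    # → [':Entity', ':Gene', ':Thing']
    cache.add_sub_class_of(":Thing", ":Root")  # invalidates the materialization
    print(sorted(classes_of(":brca1", rdf_type, cache)))
    # → [':Entity', ':Gene', ':Root', ':Thing']
```

Coarse invalidation is the design choice the message argues for. Because ontology edits are rare, recomputing the whole closure after each edit is cheap compared with evaluating rdfs:subClassOf * on every browse query. A general ALPP-level materialization could not safely assume this.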