Here is a plausible path to a DL integration. There was an interesting talk that touched on this at CSHALS.
See http://trac.bigdata.com/ticket/824 (Examine owlet/ELK integration)
I'm glad you enjoyed Hilmar's talk. I had considered implementing owlet as a custom SERVICE as described here http://wiki.bigdata.com/wiki/index.php/FederatedQuery#Custom_Services. I imagine the user specifying a graph in the config, from which the ontology could be loaded into memory. It should also provide a choice of reasoner, such as ELK, Hermit, JFact, etc.
I think it would be more efficient than the current version, since you wouldn't have to generate the filter, serialize the query, send it to another endpoint, and parse the result. But the nice thing about the current version is that it can work with any triplestore and any OWL API reasoner.
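To make the "generate the filter" step concrete, here is a minimal sketch of turning the class IRIs returned by a reasoner into a SPARQL filter. It assumes the filter takes the form of an IN list and that the IRIs are already absolute and well-formed; the class name `DlFilterSketch` and the method `toFilter` are hypothetical and not part of owlet or Bigdata, and owlet's actual output may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of owlet or Bigdata.
final class DlFilterSketch {

    // Builds a SPARQL FILTER restricting ?variable to the given class IRIs.
    // Assumption: the IRIs come from a DL query answered by the reasoner.
    static String toFilter(String variable, List<String> classIris) {
        if (classIris.isEmpty()) {
            // The reasoner found no matching classes, so nothing can match.
            return "FILTER(false)";
        }
        String terms = classIris.stream()
                .map(iri -> "<" + iri + ">")
                .collect(Collectors.joining(", "));
        return "FILTER(?" + variable + " IN (" + terms + "))";
    }

    public static void main(String[] args) {
        // Example IRIs are placeholders.
        System.out.println(toFilter("term",
                List.of("http://example.org/A", "http://example.org/B")));
    }
}
```

With an embedded SERVICE, the same list of classes could presumably be handed to the query engine as bindings, without the string building and the extra round trip.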
Alternatively I thought about "inverting" the current version to run as a server, so that you could call it as a remote SERVICE from a query to Bigdata. I wrote a little more about that here: https://github.com/phenoscape/owlet/wiki/Further-development-of-owlet
We have found owlet useful because we have very large, complex ontologies which we don't really use to auto-classify instances. The instance data is linked to ontology terms, but queries may involve complex DL descriptions taking advantage of the knowledge in the ontology.
Jim, I found the approach quite exciting and I would very much like to see this feature available out of the box with bigdata. I spoke with Hilmar about visiting at his location and having a conversation about how best to proceed.
We do a lot of things through the SERVICE mechanism and it does offer quite a bit of flexibility, including the ability to hook and leverage updates. It could be used to either embed a reasoner or to reach out to a remote reasoner. The ASTOptimizer is another possible integration point.
I am curious how much RAM and CPU demand is imposed by the OWL reasoner. It could make sense to either embed the reasoner or have it be an external service, depending on the resource demand. I am also curious how the deployment model might interact with those decisions, for example when deployed with the HA cluster. It would be interesting to try some different configurations and observe the impact on query performance.
On Feb 28, 2014, at 11:11 AM, "Jim Balhoff" firstname.lastname@example.org<mailto:email@example.com> wrote:
I'm glad you enjoyed Hilmar's talk. I had considered implementing owlet as a custom SERVICE as described here http://wiki.bigdata.com/wiki/index.php/FederatedQuery#Custom_Services. I imagine the user specifying a graph in the config, from which the ontology could be loaded into memory. It should also provide a choice of reasoner, such as ELK, Hermit, JFact, etc.
The reasoner is pretty demanding of RAM and CPU, although ELK works extremely quickly compared to some of the others. But all the OWL API reasoners hold the ontology in memory. If you only want to query the ontology itself, I think for the vast majority of ontologies you wouldn't need too much RAM, maybe a couple of gigabytes (lots less for many ontologies). But in our case our dataset itself is made up of thousands of complex class expressions which we want to find via DL queries. Holding it all in ELK takes around 6-10 GB RAM at the moment.
An issue to keep in mind is that I don't think that any of the available reasoners can answer simultaneous queries. ELK is safe to be called from multiple threads, but it will block until it answers each query in order.
The memory is not that worrying since higher memory machines could be used, but the high CPU utilization and single-threaded query answering both suggest that it might be better to host this outside of the bigdata JVM. In order to support concurrent queries, we might want to start an "ELK farm" so you could at least load balance the DL queries against multiple ELK instances.
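A rough sketch of the "ELK farm" idea, assuming each reasoner instance answers one query at a time: every instance gets its own single-threaded worker, and incoming DL queries are spread round-robin across the workers. Everything here is hypothetical. `ReasonerFarmSketch` and `submit` are made-up names, and a plain `Function` stands in for whatever call actually reaches an ELK instance, whether in-process or remote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from ELK, the OWL API, or Bigdata.
final class ReasonerFarmSketch<Q, R> implements AutoCloseable {
    private final List<Function<Q, R>> reasoners;
    private final List<ExecutorService> workers = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    ReasonerFarmSketch(List<Function<Q, R>> reasoners) {
        if (reasoners.isEmpty()) {
            throw new IllegalArgumentException("at least one reasoner is required");
        }
        this.reasoners = List.copyOf(reasoners);
        for (int i = 0; i < this.reasoners.size(); i++) {
            final int id = i;
            // One thread per instance, so each instance sees its queries strictly in order.
            workers.add(Executors.newSingleThreadExecutor(task -> {
                Thread t = new Thread(task, "reasoner-" + id);
                t.setDaemon(true); // do not keep the JVM alive for idle workers
                return t;
            }));
        }
    }

    // Round-robin dispatch: the n-th query goes to instance n mod size.
    CompletableFuture<R> submit(Q query) {
        int slot = Math.floorMod(next.getAndIncrement(), reasoners.size());
        Function<Q, R> reasoner = reasoners.get(slot);
        return CompletableFuture.supplyAsync(() -> reasoner.apply(query), workers.get(slot));
    }

    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
    }

    public static void main(String[] args) {
        // Stand-in "reasoners" that only tag the query with the instance that handled it.
        Function<String, String> first = q -> "instance-0 answered " + q;
        Function<String, String> second = q -> "instance-1 answered " + q;
        try (ReasonerFarmSketch<String, String> farm =
                     new ReasonerFarmSketch<>(List.of(first, second))) {
            System.out.println(farm.submit("query A").join());
            System.out.println(farm.submit("query B").join());
        }
    }
}
```

Round-robin is only the simplest policy. Since each instance would hold the full ontology in memory, the number of instances is bounded by available RAM, and a policy that picks the least busy instance might behave better when DL queries vary widely in cost.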
From: Jim Balhoff firstname.lastname@example.org<mailto:email@example.com>
Reply-To: "[bigdata:discussion]" firstname.lastname@example.org<mailto:email@example.com>
Date: Thursday, March 13, 2014 3:14 PM
To: "[bigdata:discussion]" firstname.lastname@example.org<mailto:email@example.com>
Subject: [bigdata:discussion] owllet + ELK integration