Adding rules and tweaking inferencing

Help
2014-02-25
2014-03-01
  • Anton Kulaga
    2014-02-25

    I am trying to figure out how to add custom rules to bigdata. It is hard for me to understand from the code what is really happening there; I only managed to find getDatabase.getClosureInstance.getProgram(), which seems to have a method for adding steps. Should I use it for custom rules?
    Do rules work in quads mode? As I understand it, the reason why inference is switched off in quads mode is that it is hard to come up with nice default settings for the current OWL rules. But I do not care much about OWL rules right now; I am more interested in adding my own rules (for quads only) that will allow me to manage data better and that will use context (in most of the rules I will move data between contexts).

     
    • Bryan Thompson
      2014-03-01

      Here is that email about bigdata inference - it was on the developers list. Maybe you can summarize all of this in wiki form and attach it to the ticket about describing bigdata inference, and then I will get it posted on the wiki?

      There is also a discussion about owlet and a bigdata integration on the developers list. I believe you had been interested in integrating an OWL reasoner as well?

      Bryan

      Antoni,

      I think that most of your points are accurate. RDFS+ was a term that Jim
      Hendler was using for a while. It predates the standardization of these
      inference profiles.

      • Bigdata places an emphasis on a subset of inference rules that support
        scalable applications and which are "interesting". A lot of the standard
        rules are not very useful. For example, the range/domain queries do not
        impose a constraint. Many "inferences" are best captured by annotating
        the data as it is loaded. The application very often knows exactly how to
        decorate the instances and can ensure that the relevant properties are in
        place. When this approach works, it is less costly than asking the
        database to compute or maintain those inferences (a sketch of this
        load-time annotation appears after this list).

      • The set of rules in the FastClosure or FullClosure program obviously
        affects the inferences that are drawn. Custom rules can be inserted
        into these classes.

      • There is backward chaining for "everything is a resource", at least in
        the Journal/HA deployment model. This is pretty much a useless inference
        and definitely is not one that should be materialized.

      • I do find search hits on the developers list. For example, this URI:

      https://sourceforge.net/search/index.php?group_id=191861&type_of_search=mlists&q=RTO&ml_name[]=bigdata-developers&posted_date_start=&posted_date_end=&form_submit=Search
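
      As an aside, here is a minimal sketch of the load-time annotation approach
      from the first bullet above, using the Sesame (OpenRDF) repository API: the
      application asserts the rdf:type it already knows instead of asking the
      database to infer it from a range/domain declaration. The example
      vocabulary URIs are made up for illustration.

        import org.openrdf.model.URI;
        import org.openrdf.model.ValueFactory;
        import org.openrdf.model.vocabulary.RDF;
        import org.openrdf.repository.RepositoryConnection;

        public class LoadTimeAnnotation {

            /**
             * The application already knows that anything with an ex:employs edge
             * is an ex:Organization, so it decorates the instance explicitly at
             * load time instead of relying on an rdfs:domain rule in the database.
             */
            public static void addEmployment(RepositoryConnection conn,
                                             URI organization, URI person)
                    throws Exception {
                ValueFactory vf = conn.getValueFactory();
                URI employs = vf.createURI("http://example.org/employs");       // hypothetical
                URI orgClass = vf.createURI("http://example.org/Organization"); // hypothetical

                conn.add(organization, employs, person);
                conn.add(organization, RDF.TYPE, orgClass); // decorated at load time
            }
        }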

      Another theme that has come up several times with customers, and recently
      on the forum, is how to compute the inferences independent of the target
      database for improved scaling, how to handle quads-mode inference using
      custom workflows, etc. I will try to touch briefly on these points.

      • Several customers use a pattern where they manage the inference in a
        temporary store or temporary journal. They compute the delta against the
        ground truth database, collect that data using a change log listener, and
        then apply the delta to the target database. This can be used to scale
        the inference workload independent of the query workload. It can also be
        used to partition the inference problem, either within the domain (this is
        often possible in real world applications) or across multiple triple store
        instances in a given database (e.g., multi-tenancy). This model can also
        work well with durable queues or map/reduce processes that feed updates
        into the database through an inference workflow. This can lead to very
        scalable design patterns (a sketch of the delta workflow appears after
        this list).

      • The main reason why we do not support inference in quads mode is the
        question of which named graphs are the sources and the target for the
        ground triples and the inferred triples. You can use an inference
        workflow to make explicit application decisions about these issues (the
        sketch below writes the inferred delta into a dedicated named graph for
        exactly this reason).

      • You can use an inference workflow to partition the inference problem and
        scale inference independent of query for the highly available replication
        cluster. The HA cluster allows you to scale the query throughput
        linearly. By factoring out (and potentially partitioning) the inference
        workload, you not only remove a significant burden from the leader, but
        you can also scale the inference throughput independent of the query
        throughput if you can partition the inference problem.

      • The horizontally scaled architecture only supports database-at-once
        inference (versus incremental truth maintenance). You can use an
        inference workflow to partition the inference problem and scale the
        inference problem independent of the triple store for scale-out.
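
      Below is a minimal sketch of the temporary-store/delta pattern described
      above. It uses the plain Sesame (OpenRDF) repository API rather than
      bigdata's own change log listener, and a forward-chaining RDFS inferencer
      stands in for whatever closure program you actually configure; the
      "inferred" named graph and the method names here are illustrative
      assumptions, not bigdata APIs.

        import java.util.ArrayList;
        import java.util.List;

        import org.openrdf.model.Statement;
        import org.openrdf.model.URI;
        import org.openrdf.repository.Repository;
        import org.openrdf.repository.RepositoryConnection;
        import org.openrdf.repository.RepositoryResult;
        import org.openrdf.repository.sail.SailRepository;
        import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;
        import org.openrdf.sail.memory.MemoryStore;

        public class InferenceWorkflowSketch {

            /**
             * Computes entailments of the ground triples in a temporary in-memory
             * store and writes only the inferred delta into a dedicated named
             * graph of the target repository.
             */
            public static void applyInferredDelta(Repository target, URI inferredGraph,
                                                  List<Statement> groundTriples)
                    throws Exception {
                // 1. Temporary store with forward-chaining RDFS inference (a
                //    stand-in for the closure you actually want to run).
                Repository temp = new SailRepository(
                        new ForwardChainingRDFSInferencer(new MemoryStore()));
                temp.initialize();

                List<Statement> delta = new ArrayList<Statement>();
                RepositoryConnection tc = temp.getConnection();
                try {
                    tc.add(groundTriples);

                    // 2. Collect statements that are entailed but not explicitly
                    //    asserted: this is the delta.
                    RepositoryResult<Statement> all =
                            tc.getStatements(null, null, null, true /* includeInferred */);
                    while (all.hasNext()) {
                        Statement st = all.next();
                        if (!tc.hasStatement(st, false /* includeInferred */)) {
                            delta.add(st);
                        }
                    }
                    all.close();
                } finally {
                    tc.close();
                    temp.shutDown();
                }

                // 3. Apply the delta to the target database, making the explicit
                //    application decision (for quads mode) of which named graph
                //    receives the inferences.
                RepositoryConnection out = target.getConnection();
                try {
                    out.add(delta, inferredGraph);
                } finally {
                    out.close();
                }
            }
        }

      The same shape works with a durable queue or map/reduce feed: the closure
      is computed off to the side and only the delta ever touches the target
      database.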

      Thanks,
      Bryan

      On 1/29/14 4:13 AM, "Antoni Mylka" <antoni.mylka@basis06.ch> wrote:

      Hi,

      I've been trying to wrap my head around inference in Bigdata. This stuff
      is probably obvious to you. I've gathered my findings in seven yes/no
      statements. I would be very grateful for a true/false answer and maybe
      links to further docs.

      1. The only definition of "RDFS Plus" is in the book "Semantic Web for the
        Working Ontologist". It's not a standard by any means. The statement
        "Bigdata supports RDFS Plus" means "It is possible to configure bigdata
        to provide the kind of inference described in that book."

      2. The inference in Bigdata depends on exactly three things:

         • the axioms - i.e. triples that are in the graph in the beginning and
           cannot be removed
         • the closure - i.e. the inference rules
         • the InferenceEngine, which is obtained from the AbstractTripleStore and
           contains additional configuration like "forwardChainRdfTypeRdfsResource"
           etc. The InferenceEngine config is read by the FastClosure and
           FullClosure classes.

      3. When I want to be sure that the inferencing goes according to my
        needs, I need to understand the meaning of exactly twelve configuration
        options and make sure they have correct values (a sketch of such a
        configuration follows this list):

         com.bigdata.rdf.store.AbstractTripleStore.axiomsClass
         com.bigdata.rdf.store.AbstractTripleStore.closureClass
         ... all 10 properties defined in com.bigdata.rdf.rules.InferenceEngine

      4. The default settings of the above options (OwlAxioms, FastClosure,
        default InferenceEngine settings) yield a ruleset that is not formally
        defined anywhere. It's not full RDFS (e.g. rules RDFS4a and RDFS4b are
        disabled by default) nor OWL.

      5. If I want to follow some written standard and have all the RDFS
        entailment rules from
        http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#rules, or OWL 2 RL/RDF from
        http://www.w3.org/TR/owl2-profiles/#Reasoning_in_OWL_2_RL_and_RDF_Graphs_using_Rules,
        I need to take care of it myself. There are no canned "standard" settings
        that I could enable with a flick of a switch. If I need any of that, I'll
        need to set the 12 configuration options and maybe even write my own
        Axioms and BaseClosure subclasses. The classes have to be wrapped in a
        jar, placed in WEB-INF/lib and shipped with my Bigdata distribution.

      6. When configuring inference I need to understand the performance
        tradeoffs and be sure that I really need ALL the rules. Every new rule
        means a slower database. The default settings are fast and scalable.

      7. The only complete and authoritative documentation of ALL available
        Bigdata configuration options is in the code. I need to search for all
        interfaces named "Options" and see the javadocs of the constants there.
        Each constant X is accompanied by a DEFAULT_X constant with the default
        value.
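
      Here is a sketch of how those options might be set, assembled as a plain
      Java Properties object. The two AbstractTripleStore property names and the
      forwardChainRdfTypeRdfsResource option are taken from this thread; the
      packages of the value classes and the exact InferenceEngine property prefix
      are assumptions that should be checked against the "Options" interfaces
      described in point 7.

        import java.util.Properties;

        public class InferenceConfigSketch {

            // Assembles the inference-related options discussed above.
            public static Properties inferenceProperties() {
                Properties p = new Properties();

                // Which axioms are pre-loaded and which closure program is run
                // (value class packages are assumptions -- verify against the code).
                p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.axiomsClass",
                              "com.bigdata.rdf.axioms.OwlAxioms");
                p.setProperty("com.bigdata.rdf.store.AbstractTripleStore.closureClass",
                              "com.bigdata.rdf.rules.FastClosure");

                // One of the ten InferenceEngine options; the property prefix is
                // an assumption.
                p.setProperty(
                    "com.bigdata.rdf.rules.InferenceEngine.forwardChainRdfTypeRdfsResource",
                    "false");

                return p;
            }
        }

      These properties would typically be handed to the triple store or sail when
      it is created, alongside the rest of the journal configuration.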

      BTW: The mailing list search at
      http://sourceforge.net/search/?group_id=191861&type_of_search=mlists only
      covers bigdata-commit. I couldn't find any search for bigdata-developers.

      Best Regards

      --
      Antoni Myłka
      Software Engineer

      basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
      http://www.basis06.ch/ - source of smart business



       
    • Bryan Thompson
      2014-03-01

      Rules do not currently execute in quads mode. The rules are executed by the older, pre-QueryEngine APIs; they are triple-pattern specific.

      There was a very good summary of bigdata inferencing a few months ago, either on this list or on the forums. I will see if I can dig it up.

      Bryan


       