Optimal IRI Length

2016-03-08
2016-03-18
  • Don Pellegrino

    Don Pellegrino - 2016-03-08

    With Blazegraph, is there an optimal length for IRIs?

    I am currently developing an ontology using the Stanford Protege Desktop tool. Based on the recommendation in [1], I would like to use unique identifiers for each IRI. Protege has the ability to auto-generate IDs. One option is to use globally unique identifiers. Options for generating these include specifying a prefix, a suffix, and a digit count (http://protegewiki.stanford.edu/wiki/Protege4NamingAndRendering#New_entity_creation_preferences). The default digit count is 20.

    [1] Arp, R.; Smith, B.; Spear, A. D., Principles of Best Practice II: Terms, Definitions, and Classification. In Building Ontologies with Basic Formal Ontology, MIT Press: Cambridge, Massachusetts, 2015; pp 59-84.

     
    • Brad Bebee

      Brad Bebee - 2016-03-08

      Don,

      Thank you. One of the techniques that we use to get the best query and
      load performance is creating custom vocabularies with URI Inlining. Our
      2.0 release brought along several updates for Inlining, such as
      fully-inlined UUID values and prefixed and suffixed integer URI patterns.
      With the prefix URI handlers, URIs of a form such as
      http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID_1234234 can be inlined.

      This matters much more for instance data than for the ontology and typing
      data. Based on the options in the links, the prefix or suffix with
      numeric iterative or digit count would likely be the first choices.
      PubChem, for example, has both types of URIs in the data sets. Internally,
      the URIHandlers will map the integer value to the smallest possible type,
      i.e. Short, Int, Long, that matches the value.

      Thanks, --Brad
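
The prefix-plus-integer idea described in the post above can be illustrated with a minimal sketch. It is not Blazegraph's URI handler code: the names `PREFIX`, `TYPE_RANGES`, `smallest_type`, and `inline_candidate` are hypothetical, and the sketch assumes the Short, Int, and Long types mentioned in the post are the signed 16-, 32-, and 64-bit Java integers. It only shows how a URI with a fixed prefix and a numeric local part could be reduced to an integer and the narrowest type that holds it.

```python
import re

# Hypothetical illustration only; this is not Blazegraph's URI handler code.
# The prefix is taken from the example URI quoted in the post.
PREFIX = "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID_"

# Assumed signed ranges for the Short, Int, and Long types named in the post.
TYPE_RANGES = [
    ("Short", -2**15, 2**15 - 1),
    ("Int", -2**31, 2**31 - 1),
    ("Long", -2**63, 2**63 - 1),
]


def smallest_type(value):
    """Return the name of the narrowest assumed type that can hold value."""
    for name, low, high in TYPE_RANGES:
        if low <= value <= high:
            return name
    return None  # does not fit in a signed 64-bit integer


def inline_candidate(uri, prefix=PREFIX):
    """Return (integer, type name) if uri is prefix + digits, else None."""
    match = re.fullmatch(re.escape(prefix) + r"(\d+)", uri)
    if match is None:
        return None
    value = int(match.group(1))
    type_name = smallest_type(value)
    if type_name is None:
        return None
    return value, type_name


print(inline_candidate(PREFIX + "1234234"))
```

Under these assumptions, a 20-digit decimal identifier can exceed the signed 64-bit maximum (9223372036854775807, which has 19 digits), in which case the sketch returns None. Converting to an integer also drops leading zeros, so a zero-padded identifier would not round-trip in this sketch without also recording its width. How Blazegraph itself treats such values is not stated in this thread.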

       
  • Don Pellegrino

    Don Pellegrino - 2016-03-18

    URI Inlining sounds like a nice optimization. Is there a specific regex for the pattern that is needed to use it? Based on the example above, I assume <prefix>/ccc_nnnnnnn works, but is there a more general pattern at work?

     
