Blazegraph (powered by bigdata) / Discussion / Help: Optimal IRI Length

Don Pellegrino - 2016-03-08

With Blazegraph, is there an optimal length for IRIs?

I am currently developing an ontology using the Stanford Protege Desktop tool. Based on the recommendation in [1], I would like to use unique identifiers for each IRI. Protege has the ability to auto-generate IDs. One option is to use globally unique identifiers. Options for generating these include specifying a prefix, suffix, and digit count (http://protegewiki.stanford.edu/wiki/Protege4NamingAndRendering#New_entity_creation_preferences). The default digit count is 20.

[1] Arp, R.; Smith, B.; Spear, A. D., Principles of Best Practice II: Terms, Definitions, and Classification. In Building Ontologies with Basic Formal Ontology, MIT Press: Cambridge, Massachusetts, 2015; pp 59-84.

- Brad Bebee - 2016-03-08
  
  Don,
  
  Thank you. One of the techniques that we use to get the best query and
  load performance is creating custom vocabularies with URI inlining. Our
  2.0 release brought several inlining updates, such as fully inlined UUID
  values and prefixed and suffixed integer URI patterns. With the prefix
  URI handlers, URIs of this form can be inlined:
  http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID_1234234
  
  This matters much more for instance data than for the ontology and typing
  data. Of the options described at the Protege link, a prefix or suffix
  with a numeric iterative ID or a digit count would likely be the first choices.
  Pubchem, for example, has both types of URIs in the data sets. Internally,
  the URIHandlers will map the integer value to the smallest possible type,
  i.e. Short, Int, Long, that matches the value.
  
  Thanks, --Brad
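
To make the "smallest possible type" remark concrete, here is a minimal sketch in Python. It is not Blazegraph code and does not use its URIHandler classes. The names `NAMESPACE`, `PREFIX`, `smallest_type`, and `inline_candidate` are hypothetical, and the assumption is simply that an inlinable IRI is a fixed namespace, a fixed prefix, and a non-negative integer that fits a signed 16-, 32-, or 64-bit value.

```python
import re

# Hypothetical values taken from the PubChem example in the post.
NAMESPACE = "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/"
PREFIX = "CID_"

# Upper bounds of the signed Java types named in the post.
WIDTHS = [
    ("Short", 2**15 - 1),
    ("Int", 2**31 - 1),
    ("Long", 2**63 - 1),
]

# Leading zeros are rejected in this sketch because the integer alone
# would not reproduce them when the IRI is rebuilt.
INTEGER = re.compile(r"0|[1-9][0-9]*")


def smallest_type(value):
    """Return the narrowest type name that holds a non-negative value, or None."""
    for name, max_value in WIDTHS:
        if 0 <= value <= max_value:
            return name
    return None


def inline_candidate(iri):
    """Return (integer, type name) if the IRI is namespace + prefix + integer."""
    head = NAMESPACE + PREFIX
    if not iri.startswith(head):
        return None
    local = iri[len(head):]
    if not INTEGER.fullmatch(local):
        return None
    value = int(local)
    type_name = smallest_type(value)
    if type_name is None:
        return None  # too large for any of the three types
    return value, type_name


print(inline_candidate(NAMESPACE + "CID_1234234"))
```

One consequence for the original question: a 20-digit identifier whose first digit is not zero is larger than 2^63 - 1, so under these assumptions it fits none of the three types. Whether and how Blazegraph inlines such values is not stated in this thread.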
  

Jim Balhoff - 2016-03-08

Brad, does the inlining happen automatically or does it need to be configured for specific forms of URI?

Thank you,
Jim

- Brad Bebee - 2016-03-09
  
  Jim,
  
  By default, the vocabulary in 2.0 provides inline declarations for RDF,
  RDFS, OWL, FOAF, SKOS, Dublin Core, XML Schema and openrdf [1]. You can
  extend these with a custom vocabulary, which can definitely help get better
  load and query results on specific data sets. We'll do an upcoming blog
  post on this in the next quarter or so. You can take a look at some of the
  existing vocabularies at [2].
  
  Thanks, --Brad
  
  [1] https://wiki.blazegraph.com/wiki/index.php/InlineIVs
  
  [2] https://github.com/blazegraph/database/tree/master/bigdata-core/bigdata-rdf/src/java/com/bigdata/rdf/vocab
  

Don Pellegrino - 2016-03-18

URI Inlining sounds like a nice optimization. Is there a specific regex for the pattern that is needed to use it? Based on the example above I assume <prefix>/ccc_nnnnnnn works, but is there a more general pattern at work?
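
The thread does not include a reply giving an exact regex. As a rough illustration only, the three pattern kinds Brad mentions (prefixed integer, suffixed integer, UUID) could be written as regular expressions like the ones below. This is a Python sketch under stated assumptions, not Blazegraph's matching logic. The function names `build_patterns` and `classify`, the `http://example.org/id/` namespace, and the `_v` suffix are all made up for the example.

```python
import re

# Illustrative shapes only; the real handlers may accept more or less.
INTEGER = r"(?:0|[1-9][0-9]*)"
UUID = (
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def build_patterns(namespace, prefix="", suffix=""):
    """Compile one regex per pattern kind for a given namespace."""
    ns = re.escape(namespace)
    return {
        "prefixed_integer": re.compile(ns + re.escape(prefix) + "(" + INTEGER + ")"),
        "suffixed_integer": re.compile(ns + "(" + INTEGER + ")" + re.escape(suffix)),
        "uuid": re.compile(ns + "(" + UUID + ")"),
    }


def classify(iri, patterns):
    """Return (kind, captured local value) for the first full match, else None."""
    for kind, pattern in patterns.items():
        match = pattern.fullmatch(iri)
        if match:
            return kind, match.group(1)
    return None


# Hypothetical namespace, prefix, and suffix.
patterns = build_patterns("http://example.org/id/", prefix="CID_", suffix="_v")
print(classify("http://example.org/id/CID_1234234", patterns))
```

In this reading, the general pattern is a fixed namespace followed by a local part that is either a plain integer with a fixed prefix or suffix, or a UUID. Anything else would not match.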
