With Blazegraph, is there an optimal length for IRIs?
I am currently developing an ontology using the Stanford Protege Desktop tool. Based on the recommendation in [1], I would like to use unique identifiers for each IRI. Protege can auto-generate IDs, and one option is to use globally unique identifiers. Options for generating these include specifying a prefix, a suffix, and a digit count (http://protegewiki.stanford.edu/wiki/Protege4NamingAndRendering#New_entity_creation_preferences). The default digit count is 20.
[1] Arp, R.; Smith, B.; Spear, A. D., Principles of Best Practice II: Terms, Definitions, and Classification. In Building Ontologies with Basic Formal Ontology, MIT Press: Cambridge, Massachusetts, 2015; pp 59-84.
Don,
Thank you. One of the techniques that we use to get the best query and
load performance is creating custom vocabularies with URI Inlining. Our
2.0 release brought along several updates for Inlining such as
fully-inlined UUID values and prefixed and suffixed integer URI patterns.
With the prefix URI handlers, URIs of a form such as
http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID_1234234 can be inlined.
This matters much more for instance data than for the ontology and typing
data. Of the options on the page you linked, a prefix or suffix with a
numeric iterative ID or a digit count would likely be the first choices.
Pubchem, for example, has both types of URIs in the data sets. Internally,
the URIHandlers will map the integer value to the smallest possible type,
i.e. Short, Int, Long, that matches the value.
Thanks, --Brad
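
To make the idea in the reply above concrete, here is a minimal Java sketch of splitting a prefixed-integer URI into a fixed head and a numeric tail, then boxing the number as the narrowest of Short, Integer, or Long. It is an illustration only and not Blazegraph's URIHandler code: the class name `PrefixedIntegerSketch` and the method `inlineValue` are hypothetical, and the rule of rejecting anything that is not plain digits is an assumption of this sketch.

```java
import java.util.Optional;

// Illustrative sketch only; NOT the Blazegraph implementation.
// Class and method names are hypothetical.
final class PrefixedIntegerSketch {

    /**
     * If uri is namespace + prefix + digits, returns the digits boxed as the
     * narrowest of Short, Integer, or Long. Returns empty otherwise.
     */
    static Optional<Number> inlineValue(String namespace, String prefix, String uri) {
        String head = namespace + prefix;
        if (!uri.startsWith(head)) {
            return Optional.empty();
        }
        String digits = uri.substring(head.length());
        // Assumption of this sketch: only plain ASCII digits qualify (no sign).
        if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return Optional.empty();
        }
        long value;
        try {
            value = Long.parseLong(digits);
        } catch (NumberFormatException e) {
            return Optional.empty(); // more digits than a 64-bit long can hold
        }
        Number boxed;
        if (value <= Short.MAX_VALUE) {
            boxed = Short.valueOf((short) value);
        } else if (value <= Integer.MAX_VALUE) {
            boxed = Integer.valueOf((int) value);
        } else {
            boxed = Long.valueOf(value);
        }
        return Optional.of(boxed);
    }

    public static void main(String[] args) {
        String ns = "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/";
        Optional<Number> v = inlineValue(ns, "CID_", ns + "CID_1234234");
        System.out.println(
            v.map(n -> n.getClass().getSimpleName() + " " + n).orElse("not inlinable"));
    }
}
```

Two limits of this sketch may matter for the original question. A 20-digit identifier can exceed Long.MAX_VALUE, which has 19 digits, so the sketch returns empty for such values. Leading zeros are also lost when the digits are parsed, so a zero-padded ID would not round-trip here. How Blazegraph's own handlers treat either case is not stated in this thread.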
On Tue, Mar 8, 2016 at 10:39 AM, Don Pellegrino donpellegrino@users.sf.net
wrote:
Brad, does the inlining happen automatically or does it need to be configured for specific forms of URI?
Thank you,
Jim
Jim,
By default, the vocabulary in 2.0 provides inline declarations for RDF,
RDFS, OWL, FOAF, SKOS, Dublin Core, XML Schema and openrdf [1]. You can
extend these with a custom vocabulary, which can definitely help get better
load and query results on specific data sets. We'll do an upcoming blog
post on this in the next quarter or so. You can take a look at some of the
existing vocabularies at [2].
Thanks, --Brad
[1] https://wiki.blazegraph.com/wiki/index.php/InlineIVs
[2] https://github.com/blazegraph/database/tree/master/bigdata-core/bigdata-rdf/src/java/com/bigdata/rdf/vocab
On Tue, Mar 8, 2016 at 12:54 PM, Jim Balhoff balhoff@users.sf.net wrote:
URI Inlining sounds like a nice optimization. Is there a specific regex for the pattern that is needed to use it? Based on the example above I assume <prefix>/ccc_nnnnnnn works, but is there a more general pattern at work?