SSWAP Wiki

Simple Semantic Web Architecture and Protocol

Brought to you by: sswap


Protocol

Provider Resource Graph Subject Object

Why SSWAP?

The Semantic Web and the Web Ontology Language, OWL, are an approach to bringing semantics (i.e., context-appropriate "meaning") to the web. With "meaning" a computer could differentiate between the notion of a gene as a unit of inheritance and Gene as the first name of a famous actor or rock star. In this capacity, computers could help us find, aggregate, and integrate disparate information from across the web for the benefit of advanced science.

Besides using ad hoc heuristics to page scrape and infer meaning (an error prone exercise), presumably one could use something like NLP (Natural Language Processing) to allow computers to infer the meaning inherent in various web pages. But this turns out to be a gargantuan task, and today, despite both effort and success, we as a community are still a long way from being able to have computers "read and understand" web pages and human language. So another approach would be to formalize a set of core logical constructs that allow meaning to be expressed, rather like the way we express mathematical meaning in theorems and proofs. This scenario is more difficult for humans — because we rarely think and express ourselves in strict logical formalisms — but it does make it easier for us to program computers to extract meaning and make decisions: something we call discerning suitability-for-purpose.

OWL is the W3C standard for expressing meaning as formalized semantics on the web (viz., OWL Web Ontology Language Overview, OWL Web Ontology Language Guide). OWL enables this by allowing one to use a Description Logic to describe a web resource. A web resource could be a web page, a data set, or even a service that accepts input and returns output. You could serialize (write out) this logic in many forms; for OWL, the recommended format is RDF/XML.

While OWL gives us a construct in which to express the logical relationship of resources to one another, it does not give us the actual terms to use. So while you can use OWL to express that REV7 is a gene involved in DNA repair in yeast, OWL by itself gives us neither the notion of a gene, DNA, repair, 'DNA repair', nor yeast, and of course it does not give us the actual web page on REV7 itself. We ourselves as user/developers make these terms.

OWL is easy. OWL is based on RDF (Resource Description Framework) and two helper technologies called RDFS (Resource Description Framework Schema) and XSD (XML Schema). RDF/RDFS is grounded on the notion of three basic entities: individuals, properties, and classes. An individual (also called an instance) is a thing, a web resource. With only one exception, an individual is always a URI (a simple hyperlink); i.e., a web resource. Thus the implied semantics are that when making statements about individuals, we are making statements about whatever is represented by that hyperlink. (The one exception is the ability to refer to "anonymous resources". Such individuals can only be referenced within a single file. These individuals, being unlinked to any universally recognized address space, are called blank or anonymous nodes.) An individual cannot be a literal; that is, it cannot be a string or a primitive data type.

Given two individuals, we can establish one as the subject and the other as the object of a relationship between them. An individual could be both a subject and an object, and indeed, this is quite common when building complex statements. The relationship linking the subject to the object is a property or predicate, and the whole triple of subject-predicate-object is called a statement. So the statement:

http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7

http://www.myWebSite.org/myPredicates/hasGeneOntologyAnnotation

http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281

says that the web page for REV7 at the Saccharomyces Genome Database (the individual representing REV7) has a 'hasGeneOntologyAnnotation' relationship to a data page about DNA repair, namely http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281. Notice how you are making statements about stuff on the web without ever having to edit, or reach into, the actual data held by others. This is powerful. Unlike subjects, objects of statements may be literals, so predicates can also link individuals with data-typed values, for example:

http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7

http://www.myWebSite.org/myPredicates/hasName

"SGD REV7 Web page"

The third and last entity is the class. A class simply represents a set of individuals. The actual individuals themselves do not need to exist. So you can have a class of "DNARepairGenes" or "MyFavoriteCartoonCharacters". A class allows you to specify properties that are or must be present for any individual to be a member of that set. In SSWAP, semantic web services themselves are the individuals of both protocol and user-defined classes.
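The statements above, plus a class membership statement, can be sketched in a few lines of Python. This is only an illustration, not part of SSWAP: the names URIRef, Literal, Statement, make_statement, and to_ntriples are hypothetical helpers, the class URI http://www.myWebSite.org/myClasses/DNARepairGenes is made up for the example, blank nodes are left out, and the N-Triples escaping is deliberately simplified. The sketch relies on the fact that in RDF, membership in a class is itself expressed as a statement, using the standard rdf:type property.

```python
from typing import NamedTuple, Union

# Standard RDF property used to state that an individual belongs to a class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class URIRef(str):
    """A resource (individual, property, or class) named by a URI."""


class Literal(str):
    """A plain string value; allowed only as the object of a statement."""


class Statement(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]


def make_statement(subject, predicate, obj):
    """Build a statement, rejecting literals as subject or predicate."""
    for term in (subject, predicate):
        if not isinstance(term, URIRef):
            raise TypeError("subject and predicate must be URIs")
    if not isinstance(obj, (URIRef, Literal)):
        raise TypeError("object must be a URI or a literal")
    return Statement(subject, predicate, obj)


def to_ntriples(st):
    """Serialize one statement as a line of N-Triples (simplified escaping)."""
    if isinstance(st.obj, Literal):
        escaped = st.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{st.obj}>"
    return f"<{st.subject}> <{st.predicate}> {obj} ."


rev7 = URIRef("http://db.yeastgenome.org/cgi-bin/locus.pl?locus=rev7")
statements = [
    # Individual-to-individual statement.
    make_statement(
        rev7,
        URIRef("http://www.myWebSite.org/myPredicates/hasGeneOntologyAnnotation"),
        URIRef("http://db.yeastgenome.org/cgi-bin/GO/go.pl?goid=6281"),
    ),
    # Individual-to-literal statement.
    make_statement(
        rev7,
        URIRef("http://www.myWebSite.org/myPredicates/hasName"),
        Literal("SGD REV7 Web page"),
    ),
    # Class membership (the class URI is a hypothetical example).
    make_statement(
        rev7,
        URIRef(RDF_TYPE),
        URIRef("http://www.myWebSite.org/myClasses/DNARepairGenes"),
    ),
]

for st in statements:
    print(to_ntriples(st))
```

Passing a Literal as the subject raises a TypeError, which mirrors the rule that only the object of a statement may be a literal.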

The combination of individuals, their properties, and classes allows us to use reasoners (logic engines, similar to automated theorem provers) to infer logically implied statements that are true, but may not otherwise be explicitly apparent. This is what makes OWL so much more powerful than web service specifications that rely on SOAP or WSDL alone. In OWL (specifically OWL DL, see below), we are not just using syntactic conventions to embed meaning in a sequence of tokens; we are embedding meaning that maps to a description logic formalization, so reasoners can infer new knowledge from the logical consequences of statements. This inference of new knowledge is what Kant called synthetic a priori judgements in his famous example that '7 + 5' is not the same as '12'. "Twelve" is logically derived new knowledge from '7 + 5', as you can prove to yourself by finishing the statement: '7593748939039374583490345 + 9347858383745834939384784 = '. Even though the answer is logically embedded within the statement, actually executing the mathematical (logical) operator and recognizing the answer is new knowledge, since the types of actions you can do in this world differ on whether you know the left-hand or right-hand side of the equality. The bottom line is that reasoners can be powerful engines to find truths that are necessarily true, but are otherwise hidden from us.
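As a rough illustration of what "inferring logically implied statements" means, the following sketch applies two RDFS-style rules until nothing new can be derived. It is a toy, assuming only subclass and type statements, and is nowhere near a real OWL DL reasoner; the function name infer and the ex: terms are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer(statements):
    """Return the closure of `statements` under two RDFS-style rules.

    Rule 1: x rdf:type C,        C rdfs:subClassOf D  =>  x rdf:type D
    Rule 2: A rdfs:subClassOf C, C rdfs:subClassOf D  =>  A rdfs:subClassOf D
    """
    known = set(statements)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in known:
            if p not in (RDF_TYPE, SUBCLASS_OF):
                continue
            for (s2, p2, o2) in known:
                # Chain through any subclass statement that starts at `o`.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, p, o2))
        if not new <= known:
            known |= new
            changed = True
    return known


# Hypothetical terms, chosen only for this example.
explicit = {
    ("ex:REV7", RDF_TYPE, "ex:DNARepairGene"),
    ("ex:DNARepairGene", SUBCLASS_OF, "ex:Gene"),
    ("ex:Gene", SUBCLASS_OF, "ex:UnitOfInheritance"),
}

implied = infer(explicit) - explicit
for statement in sorted(implied):
    print(statement)
```

None of the three implied statements (for example, that ex:REV7 is an ex:Gene) was written down explicitly; each follows necessarily from the statements that were.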

Reasoning is hard. Quickly we can start asking higher-order questions that involve sets of sets, or properties of properties, that, depending on the knowledge base, can be difficult or even impossible for a reasoner to solve. To address this, OWL is partitioned into levels and profiles. The most expressive level is OWL Full. In worst-case scenarios you can write statements in OWL Full that no algorithm may ever be able to fully solve, if by solve we mean, "Get me all the logical implications of these statements in finite time with finite resources." A slightly less expressive level is called OWL DL ("DL" for Description Logic). In OWL DL we are guaranteed "computational completeness (all entailments are guaranteed to be computed) and decidability (all computations will finish in finite time)" (OWL Web Ontology Language Guide). SSWAP uses OWL DL. Current W3C focus is on OWL DL, with the OWL 2 profiles OWL 2 EL, OWL 2 QL, and OWL 2 RL aimed at improving and fine-tuning the expressivity/performance trade-off.

A (finite) set of terms is called a controlled vocabulary. A set of terms where some terms describe the relationships of other terms to each other is called an ontology. SSWAP (Simple Semantic Web Architecture and Protocol) is an ontology specifically designed to allow web resources to describe themselves; to enable you to query for those resources; to engage them; and to semantically encode the result. Because SSWAP enables this interoperability (it defines a semantic hand-shaking, or rules of engagement), it is called a protocol. SSWAP is a protocol for semantic web services: SSWAP gives everyone a common set of terms with specific meaning to allow you to describe and engage in discovering and sharing data and services. Because SSWAP uses OWL, SSWAP resources are amenable to reasoning. SSWAP defines terms such as what it means to be a web resource, who provides that resource, and how the resource maps its input to its output. SSWAP does not define the particulars of the resource (such as whether 'gene' stands for REV7 or Gene Hackman or Gene Simmons), but it enables you to do so. SSWAP is aimed at being lightweight (it defines only a few core classes and properties), so instead of setting its own rules for authentication, security, service integrity, etc., it is designed to ride on top of existing protocols that already address these issues, such as HTTP and HTTPS.

As will become clear, a major hindrance to using large ontologies for semantic web services is that often those ontologies are described in their entirety in a single, monolithic file. This creates problems for semantic web services, because terms within the file are addressed by using the fragment identifier (#). Fragment identifiers are client-side locators; they are not guaranteed to be sent to web servers (www.w3.org/TR/webarch/#media-type-fragid, www.w3.org/Addressing/URL/4_2_Fragments.html, www.w3.org/TR/webarch/#fragid, www.w3.org/DesignIssues/Fragment). So, for example, a call for three terms in a 10 MB ontology could result in the web server sending the entire file three times. For this reason, ontologies used by SSWAP often establish each term in its own file and then use OWL to join the terms logically (instead of physically) into a larger ontology. We emphasize this by using the conventional RDF/XML syntax 'sswap:someTerm' where sswap is the base URL and someTerm is a file on the path with someTerm's OWL definition.
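The effect of fragment identifiers can be seen with Python's standard urllib.parse.urldefrag, which separates a URI into the part a client would request from the server and the fragment that stays on the client side. The example.org URLs and the helper name requested_urls below are hypothetical; this is a sketch of the addressing issue, not of any SSWAP code.

```python
from urllib.parse import urldefrag


def requested_urls(term_uris):
    """Return the URL a client would actually request for each term URI.

    The fragment (the part after '#') is resolved client-side, so it is
    stripped before the request is made.
    """
    return [urldefrag(uri).url for uri in term_uris]


# Three terms addressed by fragment inside one monolithic ontology file.
monolithic = [
    "http://example.org/ontology.owl#Gene",
    "http://example.org/ontology.owl#DNA",
    "http://example.org/ontology.owl#Repair",
]

# The same three terms, each established in its own file.
per_term = [
    "http://example.org/ontology/Gene",
    "http://example.org/ontology/DNA",
    "http://example.org/ontology/Repair",
]

print(sorted(set(requested_urls(monolithic))))
print(sorted(set(requested_urls(per_term))))
```

All three monolithic term URIs resolve to the same request URL, so a client without caching could fetch the whole file once per term, whereas the per-term layout yields three distinct, small requests.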

Click on the links above for details on the classes and properties of the protocol.