In Taverna, I believe we can currently tag workflow ports based on a single
ontology (this is rather tersely described). The user can reset the
ontology to point somewhere else if needed. I haven't checked whether
there's support for multiple ontologies yet.
I agree that there's a need for both "canonical" and "non-canonical"
ontologies. For example, you might want to use both the existing ontology
that Taverna uses and internal ontologies that your company uses. Or you
might want both biological and chemical ontologies for use in running
screens, calculating IC50s, etc.
In most drug discovery companies, there's a lot of assay development going
on. This can include target-identification assays, various types of
biomarker assays, or tox screens. Many of these assays involve the
development of new protocols and may result in new data types (and thus new
semantic types).
One of the little "weekend projects" that we're working on (outside of
Taverna) is a way of allowing scientists to load various types of structured
and unstructured assays and "tag" the data with semantic types via a parser
configuration. The user basically defines a parser configuration by
dragging a tag from an ontology onto a spreadsheet cell indicating that this
cell contains data of a given semantic type. The user saves the parser
configuration and uses it to load spreadsheets or other data files into the
database. When the data is loaded, the semantic types would be used to
automatically map the data into our existing data store, allowing
workflows to run against new data without necessarily having to write new
workflows, or new mappings between the datastore and data files. This is
still very much a work in progress.
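To make the parser-configuration idea concrete, here is a minimal sketch of what "tagging columns with semantic types and loading the file" might look like. All names here (ParserConfig, the example URIs, the column names) are invented for illustration; our actual prototype works differently in the details.

```python
import csv
import io

class ParserConfig:
    """Maps column names to ontology term URIs -- the 'tags' the user
    drags from an ontology onto spreadsheet cells."""

    def __init__(self):
        self.column_types = {}

    def tag_column(self, column, term_uri):
        self.column_types[column] = term_uri

    def parse(self, fileobj):
        """Yield one semantically typed record per tagged cell."""
        for row in csv.DictReader(fileobj):
            for column, uri in self.column_types.items():
                yield {"value": row[column], "type": uri}

# Define a parser configuration: two columns tagged with (made-up) terms.
config = ParserConfig()
config.tag_column("accession", "http://example.org/onto#SwissProtID")
config.tag_column("ic50", "http://example.org/onto#IC50")

# Load a toy "spreadsheet" and collect the typed records.
data = io.StringIO("accession,ic50\nP12345,4.2\n")
records = list(config.parse(data))
```

The point of the typed records is that a downstream loader can route each value into the data store by its semantic type rather than by hard-coded column positions.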
I think there are some initial things in Taverna that we could do in terms
of semantic type checking:
- A basic type-checking service might go through and verify that connected
ports have the same semantic-type URI, flagging ports that differ or where
one or both ports are untagged.
- A look-ahead service that suggests possible connections (a la
BioMoby) should also be possible.
- An auto-shimming service would also be useful, allowing the user to
connect two ports if a shim exists for them.
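The first of those services is simple enough to sketch. Assuming ports are modelled as records with an optional semantic-type URI (a hypothetical structure, not Taverna's actual model), the checking pass is just a walk over the links:

```python
def check_links(links):
    """Flag links whose ports are untagged or whose semantic-type
    URIs disagree. Each link is a (source_port, sink_port) pair of
    dicts with a 'name' and an optional 'semantic_type' URI."""
    warnings = []
    for src, sink in links:
        s_type = src.get("semantic_type")
        k_type = sink.get("semantic_type")
        if s_type is None or k_type is None:
            warnings.append("untagged port on %s -> %s"
                            % (src["name"], sink["name"]))
        elif s_type != k_type:
            warnings.append("type mismatch on %s -> %s: %s != %s"
                            % (src["name"], sink["name"], s_type, k_type))
    return warnings

# Toy workflow: one clean link, one mismatch, one untagged port.
links = [
    ({"name": "blast.out", "semantic_type": "onto#FASTA"},
     {"name": "align.in", "semantic_type": "onto#FASTA"}),
    ({"name": "fetch.out", "semantic_type": "onto#PDB"},
     {"name": "align.in2", "semantic_type": "onto#FASTA"}),
    ({"name": "raw.out"},
     {"name": "plot.in", "semantic_type": "onto#IC50"}),
]
problems = check_links(links)
```

A real implementation would of course read the annotations off the workflow model rather than plain dicts, but the shape of the check is the same.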
As for subtype checking, I think that's a bit further off. But if you could
associate a class with a given node in an ontology, and a subclass with a
subnode, then it might be possible to handle some basic casting from one
type to another. (I'm taking Ingo a little more literally, perhaps, than he
had intended -- but the approach might prove useful.)
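As a rough illustration of what that buys you: if the ontology's subclass relations are available, "can this port feed that one" becomes a walk up the hierarchy, allowing the widening cast (subtype where a supertype is expected) while rejecting the narrowing one. The toy hierarchy below stands in for a real ontology and uses Ingo's PDB-record example:

```python
# Toy stand-in for an ontology's subclass relations: child -> parent.
SUPERCLASS = {
    "PDBRecord": "ProteinSequence",   # a PDB record *is a* protein sequence
    "ProteinSequence": "Sequence",
}

def is_subtype(child, ancestor):
    """True if 'child' is 'ancestor' or a (transitive) subtype of it."""
    node = child
    while node is not None:
        if node == ancestor:
            return True
        node = SUPERCLASS.get(node)
    return False
```

So a PDB record could be connected to a port expecting a plain protein sequence, but not the reverse -- which matches the intuition that downcasting is where the Java-style trouble starts.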
It would also be useful if there were some way to associate validators with
nodes in an ontology. This would let users verify that "even though
you say this is a SwissProt ID, it really conforms to the SwissProt
naming convention". The ontology administrator would be able to select from
a list of validators and provide various input parameters. For example, you
might have a regex validator that allows the user to specify a regular
expression used to validate the data.
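A minimal sketch of that regex validator, keyed by ontology node. The mapping structure is invented for illustration; the accession pattern below is my understanding of the published UniProt/SwissProt format, but treat it as an example rather than an authoritative definition:

```python
import re

class RegexValidator:
    """A validator the ontology administrator configures with a
    regular expression as its input parameter."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def validate(self, value):
        # fullmatch: the whole value must conform, not just a prefix.
        return bool(self.pattern.fullmatch(value))

# Hypothetical registry: ontology node URI -> configured validator.
validators = {
    "http://example.org/onto#SwissProtID": RegexValidator(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
}
```

Data tagged with a node would be run through that node's validator at load time, catching the "says SwissProt, isn't SwissProt" case early.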
I think these use cases could probably be implemented over the course
of the year (assuming there's enough community interest). Implementing
semantic reasoning engines is a little beyond my ken, so I can't really
comment on it.
On Wed, Jun 25, 2008 at 3:49 PM, M. Scott Marshall <marshall@...> wrote:
> Hi Tom,
> I start with semantic annotations being a given. The question is how?
> As you and Ingo pointed out, there are many problems that are well worth
> pondering once you've got the basis for semantic annotation. However,
> you don't have to tackle them in order to benefit from creating the
> necessary annotation facilities (which apparently already exist in T2?).
> Tom Oinn wrote:
> > I.H.C.Wassink@... wrote:
> >> I agree with Scott and Mark that semantic typing of ports can be very
> >> valuable -- for example, for checking the workflow based on this type,
> >> or for (partially) automatic workflow composition (maybe in the BioMoby
> >> fashion, in which it can propose which services can be connected to the
> >> processes already present in the workflow).
> >> Semantic typing is, however, not straightforward. For example, how to
> >> deal with sub-typing? Can a PDB record containing the protein sequence,
> >> and the 3D coordinates, be used as a plain protein sequence? This can be
> >> solved by putting semantic types in an ontology, e.g. using OWL or SA-WSDL.
> >> But how about the other way around, casting a super type to a sub type?
> >> This might introduce similar problems as casting in for example Java...
> > Hi Ingo,
> > We really run into two problems when it comes to doing this.
> > Firstly we work in an open world with an unbounded number of types, so
> > coming up with an ontology that describes all of them is clearly
> > impossible - any ontology based metadata would have to be descriptive
> > rather than prescriptive.
> That's why I think that it should be possible to supplement a set of
> centrally available 'built-in' semantic types with user-defined semantic
> types. The use of semantic types that I have in mind is that processors
> or workflow components can use the semantic type information to label
> diagrams, or decide on appropriate algorithms, or parameters to
> algorithms. In the case of knowledge provenance, the components could
> add annotations to the data that tell you, for example, what algorithm
> was used to produce the data, or the sources, services, etc. associated
> with a derived fact or assertion (example: protein-protein interaction
> from text mining).
> I think that the above implies the ability to refer to centrally-defined
> semantic types (in an official Taverna repository, could be a local
> in-memory repository that is created at startup) as well as user-defined
> semantic types in a user-defined location. If you can add an annotation
> that makes use of a particular repository/ontology, then you've got
> enough to get started. Compatibility checking could also be performed
> with off-the-shelf reasoning that could be made 'pluggable'.
> > Secondly, and more critically, the kind of transformation one
> > potentially could do with this is something our users don't appear to
> > want at runtime.
> > We see semantic annotation as being of huge use in understanding
> > workflow results when used to augment data provenance, and of use in
> > guiding the construction of the workflow in the first place. In general
> > though I doubt we'd ever want to perform 'in engine' data
> > transformations driven by it - even if the annotations were sufficient
> > the feedback we have is overwhelmingly on the side of 'don't touch my
> > data without asking' :)
> For my part, I am not suggesting transformations performed by the
> workflow engine. I suspect that you are thinking of automatic data
> integration/transformations between components (a sort of semantic shim).
> n.b. About units, there's a handy ontology for units from the University
> of Wageningen:
> Taverna-hackers mailing list
> Developers Guide: http://www.mygrid.org.uk/usermanual1.7/dev_guide.html
> FAQ: http://www.mygrid.org.uk/wiki/Mygrid/TavernaFaq