Re: [Treebase-devel] [PhyloWS] TreeBASE search predicates

Hi,

On Fri, Jun 12, 2009 at 5:24 PM, Karen Cranston<kar...@gm...> wrote:
> Can I make one initial request? Can we make this a little less TreeBASE
> specific? I assume that we want to be friendly to other existing or future
> databases of trees, so when we make this public, we may want to have a core
> group of predicates that apply to trees in general and then examples of how
> to extend to a specific implementation (e.g. the tb.matrixTB1ID, which is
> pretty specific for this project).

Absolutely, 100% correct. I was hoping this discussion would start,
because I think many of the predicates I have now pushed into the tb:
namespace can be moved up, either to a PhyloWS vocabulary that defines
generic search fields for phylogenetic web services (e.g. to look up
things by their IDs or labels) or even to CDAO (assuming the fields
are relevant to CDAO's mission). Ideally, only the distinction between
the TreeBASE1 and TreeBASE2 identifiers would be left for the tb:
vocabulary namespace.

> There does seem to be a fair bit of redundancy, as well as labels whose
> meaning aren't really that clear (the difference between treeKind and
> treeType or matrixID and matrixLabel is not immediately obvious).

treeKind and treeType are ambiguous, but that's what they are called in
the TreeBASE schema. treeType is meant to indicate whether the
tree is an atomic result (e.g. a single, optimal topology) or some
kind of summary (e.g. a supertree, a consensus tree). treeKind says
something about what we assume the tips to mean (species or single
sequences), which in turn says something about how the data are
homologized.

matrixID and matrixLabel should be obvious - they're just like taxonID
and taxonLabel, or treeID and treeLabel etc. An ID is an identifier
(e.g. "TreeBASE:M21313"), a Label is a human-readable string (e.g.
"Cytochrome B matrix, aligned using ClustalW").
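
To make the ID/Label distinction concrete, here is a minimal Python
sketch. The helper name split_identifier is hypothetical, and the
"namespace:local part" reading of the ID is only an assumption based on
the example above, not something the schema spells out:

```python
def split_identifier(identifier):
    """Split a prefixed ID such as 'TreeBASE:M21313' into (namespace, local_id).

    Assumption: an ID is 'namespace:local_id'; a free-text label is not.
    """
    namespace, sep, local_id = identifier.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    return namespace, local_id


matrix_id = "TreeBASE:M21313"  # an ID: machine-resolvable
matrix_label = "Cytochrome B matrix, aligned using ClustalW"  # a Label: for humans

print(split_identifier(matrix_id))
```

The label has no such structure, so passing matrix_label to the same
helper raises a ValueError.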

> Does it
> make more sense to split these into separate predicates? For example, have
> matrix.xxx and tree.xxx. The matrix labels could then be used by projects
> that only have data matrices (e.g. benchmark data sets for alignment or
> phylogeny reconstruction) without having to worry about tree-specific terms.

I think the goal is that we can run queries such as:

select * from matrices where tb.matrixID='TreeBASE:M21313';

...which implies that there is a vocabulary, identified by the tb
prefix, that explains what a 'matrixID' is. I agree that it might seem
less redundant to do something like:

select * from matrices where matrix.id='TreeBASE:M21313';

...but all that does is imply a separate vocabulary with a matrix
prefix. This in turn implies that there would have to be vocabularies
for matrix, taxon, taxa, tree, trees (etc.?) which would all have to
be developed and maintained, and whose namespaces would need to be
imported by whoever is formulating the query (imagine this as a SPARQL
query, for example). I believe the end result is actually *more*
redundancy and *longer* queries, so nothing much would be gained that
way.
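
A rough Python sketch of that argument: it only counts how many
vocabulary prefixes a query author would have to import under each
naming scheme. The function name prefixes and the per-subject predicate
names (taxon.id, tree.id) are hypothetical, and tb.taxonID and tb.treeID
are assumed by analogy with tb.matrixID:

```python
def prefixes(predicates):
    """Return the sorted vocabulary prefixes used by 'prefix.term' predicates."""
    return sorted({p.split(".", 1)[0] for p in predicates})


# Scheme A: a single tb vocabulary, with the subject encoded in the term name.
single_vocab = ["tb.matrixID", "tb.taxonID", "tb.treeID"]

# Scheme B: one vocabulary per subject (hypothetical names).
per_subject = ["matrix.id", "taxon.id", "tree.id"]

print(prefixes(single_vocab))  # ['tb']
print(prefixes(per_subject))   # ['matrix', 'taxon', 'tree']
```

Every extra prefix in the second list is another vocabulary to maintain
and another namespace declaration at the top of each query.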

> I'd like to see some simplification and renaming. Are there places where we
> can make use of Darwin Core terms rather than defining new terms? DWC has a
> datasetID as well as a whole pile of Taxon-related terms.

Good idea, haven't looked at that yet.

Thanks for your comments - let's keep this discussion going!

Rutger

> On Jun 12, 2009, at 4:45 PM, rut...@gm... wrote:
>
>>
>> Hi,
>>
>> I'm sharing a google docs spreadsheet with you. It contains candidate
>> search predicates we would like to expose through a TreeBASE web service
>> interface. In addition, it contains the subjects they may apply to, the
>> value space of the objects, where/how they would be expressed and
>> retrieved in nexml and a short description of the application of each of
>> these predicates.
>>
>> All implementation details aside, we imagine one should be able to search
>> for example on dc.title='foo' and get a result set where the study titles
>> match 'foo'. The list of predicates is a combination of dublin core/prism
>> (for publication metadata) and a tb (TreeBASE) prefix.
>>
>> As a request for team CDAO, are any of the tb predicates in the
>> spreadsheet concepts in CDAO? Could they be?
>>
>> To everyone else, please comment on the naming scheme. For example, it
>> seems redundant to have taxonID and taxaID and treeID (etc.), on the other
>> hand, it disambiguates the subject of the query. Should things be renamed?
>> Does it make sense as is?
>>
>> Thanks,
>>
>> Rutger
>>
>> TreeBASE search predicates
>> http://spreadsheets.google.com/ccc?key=rL--O7pyhR8FcnnG5-ofAlw
>>
>>
>
>

-- 
Dr. Rutger A. Vos
Department of Zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com