From: Rutger V. <rut...@gm...> - 2009-06-13 01:15:33
Hi,

On Fri, Jun 12, 2009 at 5:23 PM, William Piel<wil...@ya...> wrote:
> Thanks Rutger. This is really useful.
>
> Some questions:
>
> -- Regarding the "prism.startingPage" and "prism.endingPage", I think our
> model stores these in one field (i.e. "123-132") -- I guess that means
> splitting the field with some sort of regular expression -- e.g.
> /^(\d+)[\s-\.]+(\d*)$/ -- unless prism also offers a combined "pages"
> option.

There is a pageRange property, which I've added.

> -- In instances where an LSID exists (e.g. all taxonNamebankIDs have LSIDs),
> would it be better to offer that, or stick with CDAO?

The identifiers are the part of my suggestion I'm least pleased with.
For now the identifiers are treated as TreeBASE-specific (e.g.
tb:taxonID). It's possible that these can be moved into CDAO or, if the
fact that objects have IDs doesn't seem to fit in with CDAO's mission of
representing the core knowledge of phylogenetics (IDs being more of an
implementation detail), maybe they should be moved into a PhyloWS
vocabulary? And should different classes of IDs have different syntax,
e.g. a special predicate for LSIDs, versus namespaced IDs for other
authorities (say, "TreeBASE:Tr1231", "Dryad:2324" etc.)?

> -- I was, in a way, chagrined to see that a new "superset" of taxa is
> available -- the GNI (http://globalnames.org/). They've essentially grabbed
> all of uBio's data and added Species2000 and ZooBank to become a source of
> names for EOL and GBIF, together with a names architecture
> (http://gnapartnership.org/gna/wiki) that is under development. Given (a)
> the similarity with uBio's mission, and (b) the fact that big money players
> are involved while uBio seems to be languishing, it may be that this marks
> the beginning of the end for uBio. And that may mean that some day a lot of
> our taxon intel work will need to be rewritten. I only mention this in case
> a bit of foresight, while designing our API terms, might help us adapt to a
> future changing name informatics landscape.
>
> -- I take it that separate dc.creator elements are created for each author:
> is there a way to communicate author order?

Actually, this is treated inconsistently in practice: I've seen multiple
dc:creator annotations with one author each, and I've seen all authors
concatenated within a single dc:creator annotation. I would like us to be
as granular as possible, so I'd favour the former. Alternatively, authors
could be annotated using FOAF, so we can break names down into
first/middle/last and add other contact info (email).

> -- Is there a dc. or prism. for author email, abstract, or keywords?

There is a prism.keyword (used as a set of atomic annotations) and
dc.subject (best practice dictates this would be a comma-separated list
of terms from a controlled vocab.). If we want to make more information
about authors/editors available, perhaps we might use FOAF?

> [Actually, I just realized that I was thinking about all this vocabulary
> largely in terms of decorating returned NeXML with metadata rather than as
> PhyloWS search terms. Of course people don't need to search on "email"
> (etc)]

Mmmm... maybe they do need to search on "email", I don't want to presume
to know that :)

Rutger

> On Jun 12, 2009, at 7:45 PM, rut...@gm... wrote:
>
>> Hi,
>>
>> I'm sharing a google docs spreadsheet with you. It contains candidate
>> search predicates we would like to expose through a TreeBASE web service
>> interface. In addition, it contains the subjects they may apply to, the
>> value space of the objects, where/how they would be expressed and retrieved
>> in nexml and a short description of the application of each of these
>> predicates.
>>
>> All implementation details aside, we imagine one should be able to search
>> for example on dc.title='foo' and get a result set where the study titles
>> match 'foo'. The list of predicates is a combination of dublin core/prism
>> (for publication metadata) and a tb (TreeBASE) prefix.
>>
>> As a request for team CDAO, are any of the tb predicates in the
>> spreadsheet concepts in CDAO? Could they be?
>>
>> To everyone else, please comment on the naming scheme. For example, it
>> seems redundant to have taxonID and taxaID and treeID (etc.), on the other
>> hand, it disambiguates the subject of the query. Should things be renamed?
>> Does it make sense as is?
>>
>> Thanks,
>>
>> Rutger
>>
>> TreeBASE search predicates
>> http://spreadsheets.google.com/ccc?key=rL--O7pyhR8FcnnG5-ofAlw

-- 
Dr. Rutger A. Vos
Department of Zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
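
For the page-range question raised in this message, here is a minimal Python sketch of splitting a combined pages field such as "123-132" into starting and ending pages. It is an illustration only, not TreeBASE code: the function name `split_pages` is hypothetical, and the pattern is a variation on the regular expression quoted above, with the hyphen placed last in the character class so that it is read as a literal, and with the separator made optional so that a single page number is also accepted.

```python
import re

# Variation on the pattern quoted in the thread. The hyphen is escaped and
# placed last in the character class so it is a literal hyphen, not a range.
# The separator and ending page are optional (an assumption, so that a
# single-page value like "123" still parses).
PAGE_RANGE = re.compile(r"^(\d+)(?:[\s.\-]+(\d*))?$")


def split_pages(pages):
    """Split a combined pages string into (starting_page, ending_page).

    Returns None when the string does not look like a page range.
    ending_page is None when no ending page is present.
    """
    match = PAGE_RANGE.match(pages.strip())
    if match is None:
        return None
    start = match.group(1)
    end = match.group(2) or None  # empty string or missing group -> None
    return start, end


if __name__ == "__main__":
    for value in ["123-132", "123 - 132", "123", "n/a"]:
        print(repr(value), "->", split_pages(value))
```

If the combined value is all that is stored, it could also be exposed as-is through the pageRange property mentioned above, leaving the split as an optional convenience.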
From: Rutger V. <rut...@gm...> - 2009-06-13 02:04:55
Hi,

On Fri, Jun 12, 2009 at 5:24 PM, Karen Cranston<kar...@gm...> wrote:
> Can I make one initial request? Can we make this a little less TreeBASE
> specific? I assume that we want to be friendly to other existing or future
> databases of trees, so when we make this public, we may want to have a core
> group of predicates that apply to trees in general and then examples of how
> to extend to a specific implementation (e.g. the tb.matrixTB1ID, which is
> pretty specific for this project).

Absolutely 100% correct. I was hoping this discussion would start,
because I think many of the predicates I have now pushed into the tb:
namespace can be moved up, either to a PhyloWS vocabulary that defines
generic search fields for phylogenetic web services (e.g. to look things
up by their IDs or labels) or even to CDAO (assuming the fields are
relevant to CDAO's mission). Ideally only the distinction between the
TreeBASE1 and TreeBASE2 identifiers would be something for the tb:
vocabulary namespace.

> There does seem to be a fair bit of redundancy, as well as labels whose
> meanings aren't really that clear (the difference between treeKind and
> treeType or matrixID and matrixLabel is not immediately obvious).

treeKind and treeType are ambiguous, but that's what they are called in
the TreeBASE schema. treeType is meant to indicate whether the tree is
an atomic result (e.g. a single, optimal topology) or some kind of
summary (e.g. a supertree, a consensus tree). treeKind says something
about what we assume the tips to mean (species or single sequences),
which in turn says something about how the data are homologized.

matrixID and matrixLabel should be obvious - they're just like taxonID
and taxonLabel, or treeID and treeLabel etc. An ID is an identifier
(e.g. "TreeBASE:M21313"), a Label is a human-readable string (e.g.
"Cytochrome B matrix, aligned using ClustalW").

> Does it
> make more sense to split these into separate predicates? For example, have
> matrix.xxx and tree.xxx. The matrix labels could then be used by projects
> that only have data matrices (e.g. benchmark data sets for alignment or
> phylogeny reconstruction) without having to worry about tree-specific terms.

I think the goal is that we can run queries such as:

    select * from matrices where tb.matrixID='TreeBASE:M21313';

...which implies that there is a vocabulary, identified by the tb
prefix, that explains what a 'matrixID' is. I agree that it might seem
less redundant to do something like:

    select * from matrices where matrix.id='TreeBASE:M21313';

...but all that does is imply a separate vocabulary with a matrix
prefix. This in turn implies that there would have to be vocabularies
for matrix, taxon, taxa, tree, trees (etc.?), which would all have to be
developed and maintained, and whose namespaces would need to be imported
by whoever is formulating the query (imagine this as a SPARQL query, for
example). I believe the end result is actually *more* redundancy and
*longer* queries, so nothing much would be gained that way.

> I'd like to see some simplification and renaming. Are there places where we
> can make use of Darwin Core terms rather than defining new terms? DWC has a
> datasetID as well as a whole pile of Taxon-related terms.

Good idea, I haven't looked at that yet.

Thanks for your comments - let's keep this discussion going!

Rutger

-- 
Dr. Rutger A. Vos
Department of Zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
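
To make the prefixed-predicate idea in this message concrete, here is a small Python sketch of a search that resolves a predicate such as tb.matrixID against a table of vocabulary prefixes before filtering records. Everything here is illustrative: the tb namespace URI is a placeholder (no such URI is given in the thread), `expand` and `search` are hypothetical helper names, the second matrix record is invented sample data, and the in-memory list stands in for whatever store a real service would query.

```python
# Prefix table: each prefix names one vocabulary that defines its predicates.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core element set
    "tb": "http://example.org/treebase/terms#",  # placeholder, hypothetical
}


def expand(predicate):
    """Expand 'prefix.localName' into a full predicate URI.

    Raises ValueError when the prefix is not a known vocabulary, which is
    what would happen to 'matrix.id' unless a separate 'matrix' vocabulary
    were defined and imported.
    """
    prefix, _, local = predicate.partition(".")
    if prefix not in PREFIXES or not local:
        raise ValueError("unknown predicate: %r" % predicate)
    return PREFIXES[prefix] + local


def search(records, predicate, value):
    """Return the records whose annotation for `predicate` equals `value`."""
    key = expand(predicate)
    return [record for record in records if record.get(key) == value]


# Sample data. The first ID and label come from the thread; the second
# record is made up for the example.
MATRICES = [
    {
        expand("tb.matrixID"): "TreeBASE:M21313",
        expand("tb.matrixLabel"): "Cytochrome B matrix, aligned using ClustalW",
    },
    {
        expand("tb.matrixID"): "TreeBASE:M00001",
        expand("tb.matrixLabel"): "Example matrix",
    },
]

if __name__ == "__main__":
    for hit in search(MATRICES, "tb.matrixID", "TreeBASE:M21313"):
        print(hit[expand("tb.matrixLabel")])
```

With a single tb (or PhyloWS) vocabulary, one prefix entry covers matrixID, treeID, taxonID and so on; splitting into matrix., tree. and taxon. prefixes would mean one entry, and one maintained vocabulary, per prefix.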