From: Rutger V. <rut...@gm...> - 2009-07-02 23:10:13
Hi,

I've implemented nexml export on treebase2, and made the serializer attach predicates from this list (http://spreadsheets.google.com/pub?key=rL--O7pyhR8FcnnG5-ofAlw) in the indicated locations. The predicates with asterisks can be used as search predicates through the PhyloWS architecture as described here:
http://localhost:8080/treebase-web/help/urlAPI.jsp

In addition, CQL searches can be tried out on the main search tabs by clicking the "Advanced search..." links, e.g. see:
http://localhost:8080/treebase-web/search/studySearch.html

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
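The PhyloWS searches described above are plain HTTP GET requests whose `query` parameter carries a CQL expression. Below is a minimal Python sketch of building such a URL. The `phylows_url` helper is hypothetical, and the base URL is the 8ball test server named in this thread, which should not be assumed to still be running. `urlencode` percent-encodes the CQL string, so `==` travels as `%3D%3D` and is decoded back on the server side.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL: the 8ball test server named in this thread (not a stable endpoint).
BASE = "http://8ball.sdsc.edu:6666/treebase-web/phylows"


def phylows_url(path, cql, **params):
    """Build a PhyloWS search URL (hypothetical helper).

    path   -- finder path below phylows/, e.g. "taxon/find" as used in this thread
    cql    -- CQL query string, e.g. "tb.title.taxon==Homo"
    params -- extra parameters such as format="rss1"
    """
    query = urlencode({"query": cql, **params})  # percent-encodes '=' inside the CQL
    return f"{BASE}/{path.strip('/')}?{query}"


# Usage: the CQL survives the round trip through the query string.
url = phylows_url("taxon/find", "tb.title.taxon==Homo", format="rss1")
print(url)
print(parse_qs(urlsplit(url).query)["query"])  # ['tb.title.taxon==Homo']
```

The resulting URL is equivalent, after decoding, to the RSS query discussed later in the thread.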
From: Rutger V. <rut...@gm...> - 2009-07-03 02:55:35
erm, please substitute 8ball.sdsc.edu:6666 for localhost:8080 in the examples below.

On Thu, Jul 2, 2009 at 4:10 PM, Rutger Vos <rut...@gm...> wrote:
> Hi,
>
> I've implemented nexml export on treebase2, and made the serializer
> attach predicates from this list
> (http://spreadsheets.google.com/pub?key=rL--O7pyhR8FcnnG5-ofAlw) in
> the indicated locations. The predicates with asterisks can be used as
> search predicates through the PhyloWS architecture as described here:
> http://localhost:8080/treebase-web/help/urlAPI.jsp
>
> In addition, CQL searches can be tried out on the main search tabs by
> clicking the "Advanced search..." links, e.g. see:
> http://localhost:8080/treebase-web/search/studySearch.html
>
> Rutger
>
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
From: William P. <wil...@ya...> - 2009-07-03 16:14:57
cool stuff.

I notice that this departs a bit from the phylows that is proposed here. For example, the proposed phylows puts "/find/" before "/tree/", whereas you have it the other way. And the other major difference is that the proposed phylows suggests that to search on trees you do something like:

/phylows/find/tree/?name=Primates

whereas you are implementing:

/phylows/taxon/find/?name=Primates&recordSchema=tree

Your method is probably better and clearer -- in that it makes more sense that we're doing a find on a taxon with the result being a tree (only the taxon label is inherently part of the tree object), but perhaps we should get the other PhyloWS developers in agreement (i.e. Ryan Scherle), and then modify the wiki accordingly.

I notice that the following produces a hit of one record:

http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo

...yet I'm unable to get any results via rss:

http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&format=rss1

Is my syntax incorrect?

Also, I believe that this should give me a list of trees:

http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&recordSchema=tree

but instead it gives me a list of taxa. Perhaps my syntax is wrong?

bp
From: Rutger V. <rut...@gm...> - 2009-07-03 22:17:05
Hi Bill,

glad you like it. I think I will use this on one of the days in Lisbon to have students download data and process it.

On Fri, Jul 3, 2009 at 9:13 AM, William Piel <wil...@ya...> wrote:
> cool stuff.
> I notice that this departs a bit from the phylows that is proposed here.
> For example, the proposed phylows puts "/find/" before "/tree/", whereas
> you have it the other way. And the other major difference is that the
> proposed phylows suggest that to search on trees you do something like:
> /phylows/find/tree/?name=Primates
> whereas you are implementing:
> /phylows/taxon/find/?name=Primates&recordSchema=tree

To me, the former, "standard" way seems very ambiguous. I would interpret it to mean the name of the tree, not of a taxon in the tree.

> Your method is probably better and clearer -- in that it makes more sense
> that we're doing a find on a taxon with the result being a tree (only the
> taxon label is inherently part of the tree object), but perhaps we should
> get the other PhyloWS developers in agreement (i.e. Ryan Scherle), and then
> modify the wiki accordingly.
> I notice that while the following produces a hit of one record:
> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo
> ...yet I'm unable to get any results via rss:
> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&format=rss1
> Is my syntax incorrect?

I don't know - I *am* getting an rss feed with a single item returned. Maybe you should "view source" to see it?

> Also, I believe that this should give me a list of trees:
> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&recordSchema=tree
> but instead it gives me a list of taxa. Perhaps my syntax is wrong?

The recordSchema switch is only used in combination with format=rss1, the thinking being that the web interface behaviour should stay the same (we can switch tabs anyway to project a result set into a different context) but for programmatic access we do need recordSchema (because - no tabs).

Cheers,

Rutger
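The rule described in this reply (recordSchema is honoured only together with format=rss1) explains why Bill's HTML query still returned taxa. Here is a sketch of that dispatch rule. The `resolve_output` function and the assumed `html` default format are illustrative only and are not taken from the TreeBASE code.

```python
def resolve_output(params, default_schema):
    """Pick (format, record schema) for a search request.

    Hypothetical sketch of the rule described above: recordSchema is only
    honoured when format=rss1; the HTML interface always shows the searched
    domain, because users can switch result tabs instead.
    """
    fmt = params.get("format", "html")  # assumed default for the web interface
    if fmt == "rss1":
        return fmt, params.get("recordSchema", default_schema)
    return fmt, default_schema


# Bill's HTML query: recordSchema=tree is ignored, so taxa come back.
print(resolve_output({"recordSchema": "tree"}, "taxon"))                   # ('html', 'taxon')
# The same query as RSS: the result set is projected onto trees.
print(resolve_output({"recordSchema": "tree", "format": "rss1"}, "taxon"))  # ('rss1', 'tree')
```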
From: William P. <wil...@ya...> - 2009-07-04 01:25:21
On Jul 3, 2009, at 6:16 PM, Rutger Vos wrote:

>> I notice that while the following produces a hit of one record:
>> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo
>> ...yet I'm unable to get any results via rss:
>> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&format=rss1
>> Is my syntax incorrect?
>
> I don't know - I *am* getting an rss feed with a single item returned.
> Maybe you should "view source" to see it?

Ah... indeed, it works for Firefox and Camino, but it does not work for Safari (says "zero articles").

>> Also, I believe that this should give me a list of trees:
>> http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&recordSchema=tree
>> but instead it gives me a list of taxa. Perhaps my syntax is wrong?
>
> The recordSchema switch is only used in combination with format=rss1,
> the thinking being that the web interface behaviour should stay the
> same (we can switch tabs anyway to project a result set into a
> different context) but for programmatic access we do need recordSchema
> (because - no tabs).

Ok. Although perhaps this could be a low-priority feature to be added later. (I can imagine this being a useful feature for web sites like tolweb.org and eol.org, where each species page could have a simple hyperlink called "trees in TreeBASE with taxon x" -- thus sparing users another mouse-click on a tab once they get to TreeBASE.)

I can't figure out why your rss does not work in Safari. For example, these two URLs produce, more or less, the same content since they are making the same query:

http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find?query=tb.title.taxon==Homo&recordSchema=tree&format=rss1
http://purl.org/phylo/treebase/phylows/find/tree/?query=taxon_name+any+Homo&operation=searchRetrieve&recordSchema=pc

...but yours says "0 articles" in Safari while mine says "25 articles". Could you try adding "<?xml version="1.0" encoding="utf-8"?>" as a header? That's the only substantive difference between the two, as far as I can tell.

(also, it would be cool if yours included some other human-readable metadata -- like tree name, tree title, article citation, etc -- just a little synopsis so that people can use this in an RSS client)

bp
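Bill's two suggestions, an explicit XML declaration and a human-readable synopsis per item, can be illustrated with a small RSS 1.0 generator built on Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions. The `TreeHit` record, its fields and the example.org URIs are hypothetical. The declaration is written by hand, in the double-quoted form Bill quotes, because `ET.tostring(..., encoding="unicode")` never emits one. The thread does not establish whether this actually fixes the Safari "0 articles" result.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)  # RSS 1.0 elements use the default namespace

XML_DECL = '<?xml version="1.0" encoding="utf-8"?>\n'


@dataclass
class TreeHit:
    """Hypothetical search hit carrying human-readable metadata."""
    uri: str
    title: str
    citation: str


def rss(tag):
    return f"{{{RSS_NS}}}{tag}"


def rdf(tag):
    return f"{{{RDF_NS}}}{tag}"


def build_rss1(feed_uri, feed_title, hits):
    """Serialize hits as an RSS 1.0 (RDF) feed, prefixed with an XML declaration."""
    root = ET.Element(rdf("RDF"))
    channel = ET.SubElement(root, rss("channel"), {rdf("about"): feed_uri})
    ET.SubElement(channel, rss("title")).text = feed_title
    ET.SubElement(channel, rss("link")).text = feed_uri
    ET.SubElement(channel, rss("description")).text = f"{len(hits)} result(s)"
    seq = ET.SubElement(ET.SubElement(channel, rss("items")), rdf("Seq"))
    for hit in hits:
        ET.SubElement(seq, rdf("li"), {rdf("resource"): hit.uri})
    for hit in hits:
        item = ET.SubElement(root, rss("item"), {rdf("about"): hit.uri})
        ET.SubElement(item, rss("title")).text = hit.title
        ET.SubElement(item, rss("link")).text = hit.uri
        # Short synopsis so the item is readable in an RSS client
        ET.SubElement(item, rss("description")).text = hit.citation
    body = ET.tostring(root, encoding="unicode")  # str, without a declaration
    return (XML_DECL + body).encode("utf-8")
```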
From: Hilmar L. <hl...@ne...> - 2009-07-04 17:20:07
On Jul 4, 2009, at 3:24 AM, William Piel wrote:

> Could you try adding "<?xml version="1.0" encoding="utf-8"?>" as a
> header?

BTW the <?xml> line is required to be present for it to be valid XML.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================
From: Hilmar L. <hl...@ne...> - 2009-07-04 17:20:14
On Jul 3, 2009, at 6:13 PM, William Piel wrote:

> cool stuff.

I agree - the API is definitely heading in the right direction. I suggest some tweaks:

>
> I notice that this departs a bit from the phylows that is proposed
> here. For example, the proposed phylows puts "/find/" before "/tree/",
> whereas you have it the other way.

Right, this is not in compliance with the spec. find/ comes first as it changes the resource from a record and its URI to a finder. I.e., although we could change the spec, I don't see a reason that would justify doing so.

In general, note that REST APIs at present aren't formally declared in a descriptor document that a general-purpose validator could use to check compliance. So extra care really needs to be taken to comply with the spec; otherwise it's not a spec but a loose prescription. It seems like we should also implement a PhyloWS validator that uncovers violations quickly.

> And the other major difference is that the proposed phylows suggest
> that to search on trees you do something like:
>
> /phylows/find/tree/?name=Primates
>
> whereas you are implementing:
>
> /phylows/taxon/find/?name=Primates&recordSchema=tree

Note BTW that a taxon finder is a custom addition to the API. That is fine in principle, except that I'd suggest you conform to the pattern in the API spec and put find/ first.

Also, find/taxon/ would imply that you are finding (and returning) taxa, which if I understand correctly is not the case - rather it seems you have one query parameter in the URI path (namely that you are searching by taxon?) and one in the query string. So if this is searching trees, it needs to be find/tree/, and if you are matching against taxon names, the query parameter needs to be tb.taxon.name or whatever the blessed metadata term for this purpose is.

Third, recordSchema=tree means that you want records back in the tree schema. Unless you have invented that schema in the meantime, this is in all likelihood not what you want. Rather, the value should be nexml I suppose. find/tree already implies that you are finding (and returning) trees, so there is no point in expressing that redundantly in the query string. You might want to specify that you only want the tree and not also the matrix, but that would be a separate query parameter and should not be confounded with the return format.

-hilmar
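The PhyloWS validator proposed here could start as a set of simple URL checks. The sketch below encodes only the two rules raised in this message: `find/` must precede the resource type, and `recordSchema` should name a record format (e.g. nexml) rather than a resource type. The function name and the `RESOURCE_TYPES` set are assumptions for illustration. A real validator would be driven by the PhyloWS spec itself.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed set of PhyloWS resource types, for illustration only.
RESOURCE_TYPES = {"study", "tree", "matrix", "taxon"}


def check_phylows_url(url):
    """Return a list of problems found in a PhyloWS finder URL (empty = none found)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if "phylows" not in segments:
        return ["no phylows/ segment in path"]
    problems = []
    rest = segments[segments.index("phylows") + 1:]
    if "find" in rest:
        i = rest.index("find")
        if i != 0:
            problems.append("find/ must come before the resource type, e.g. find/tree/")
        elif len(rest) < 2 or rest[1] not in RESOURCE_TYPES:
            problems.append("find/ must be followed by a known resource type")
    schema = parse_qs(parts.query).get("recordSchema", [None])[0]
    if schema in RESOURCE_TYPES:
        problems.append(f"recordSchema={schema} names a resource type, not a record format")
    return problems


for p in check_phylows_url("http://8ball.sdsc.edu:6666/treebase-web/phylows/taxon/find"
                           "?query=tb.title.taxon==Homo&recordSchema=tree"):
    print(p)
```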
From: Rutger V. <rut...@gm...> - 2009-07-08 04:54:58
Attachments:
treebase.owl
> You aren't trying to suggest that dcterms.title or dcterms.identifier should
> mean different things for different finders, right?

I kind of am: in find/study, dcterms.identifier is a study ID, in find/tree, dcterms.identifier is a tree ID. Internally, the finders traverse a CQL parse tree and translate these predicates into more refined subproperties (tb.identifier.study and tb.identifier.tree, respectively). In other words, if a tree is the subject, then the predicate dcterms.identifier is interpreted as the refined subproperty tb.identifier.tree.

By the way, I made a simple ontology (attached) that formalizes this inheritance. It would be nice to have this available as http://purl.org/phylo/treebase/terms# or whatever (speaking of which: have you had a chance to add me to the treebase & phylows purl domains?)

Seems to me that's pretty much in line with the Contextual part of CQL - I've seen many examples using Dublin Core predicates whose exact semantics are context-dependent.

(By the way 2, you're saying "*should* mean different things for different finders". I don't know whether they *should*, but that's certainly how they are implemented now.)

> What we should pay attention to though is that the API *allows* optimizing
> of code reuse and clean design of implementations. Are you saying that it
> stands in the way of that, and if so, how does it prevent clean design of
> implementations?

I think it stands in the way of clean design because any finder (find/tree, find/matrix, find/study) potentially needs to process predicates from any other domain (e.g. find/tree apparently needs to know about study IDs), which is harder than just having to deal with your own domain and subsequently projecting your result set into a different domain.

Rutger
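The context-dependent predicates described in this message (the same `dcterms.identifier` index refined to `tb.identifier.study` or `tb.identifier.tree` depending on the finder) can be sketched as a rewrite over a parse tree. The tuple-based tree and the `REFINEMENTS` table below are hypothetical stand-ins, not TreeBASE's actual CQL parser or classes.

```python
# Hypothetical per-finder refinement tables, mirroring the examples above.
REFINEMENTS = {
    "study": {"dcterms.identifier": "tb.identifier.study"},
    "tree": {"dcterms.identifier": "tb.identifier.tree"},
}


def refine(node, finder):
    """Rewrite index names in a tiny CQL-like parse tree for one finder.

    Leaves look like ("clause", index, relation, term);
    boolean nodes look like (operator, left, right), e.g. ("and", l, r).
    """
    if node[0] == "clause":
        _, index, relation, term = node
        table = REFINEMENTS.get(finder, {})
        return ("clause", table.get(index, index), relation, term)
    op, left, right = node
    return (op, refine(left, finder), refine(right, finder))


q = ("and", ("clause", "dcterms.identifier", "=", "S2484"),
            ("clause", "dcterms.title", "=", "Homo"))
print(refine(q, "study"))
print(refine(q, "tree"))
```

The same query text thus yields different refined properties depending only on which finder receives it, which is the behaviour questioned in the reply that follows.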
From: Hilmar L. <hl...@ne...> - 2009-07-08 05:43:11
On Jul 7, 2009, at 8:58 PM, Rutger Vos wrote:

>> You aren't trying to suggest that dcterms.title or dcterms.identifier
>> should mean different things for different finders, right?
>
> I kind of am: in find/study, dcterms.identifier is a study ID, in
> find/tree, dcterms.identifier is a tree ID.

I think that's a very bad idea. It defeats the purpose of a controlled vocabulary (let alone an ontology), which is to formalize unambiguously what we mean, and to ensure that we mean the same thing when we use the same term in the same application.

> Internally, the finders traverse a CQL parse tree and translate
> these predicates into more refined subproperties
> (tb.identifier.study and tb.identifier.tree,
> respectively). In other words, if a tree is the subject, then the
> predicate dcterms.identifier is interpreted as the refined subproperty
> tb.identifier.tree.

To me this is backwards from how an ontology works. You would use the refined sub-properties, and if an agent doesn't understand what to do with them it would use the ontology to get at a more general term which it might recognize. In RDF and OWL, properties don't change their meaning based on subject or object. Rather, subject and object can change their semantics by applying a property (that has range or domain defined) to them.

> By the way, I made a simple ontology (attached) that formalizes this
> inheritance.

What I can see is that they are declared as subproperties of dc.identifier. They make no assertions about range or domain, no?

> I've seen many examples using dublin core predicates whose exact
> semantics are context-dependent.

Yes, but not within the same application profile (metadata vocabulary), right?

>
>> What we should pay attention to though is that the API *allows*
>> optimizing of code reuse and clean design of implementations. Are you
>> saying that it stands in the way of that, and if so, how does it
>> prevent clean design of implementations?
>
> I think it stands in the way of clean design because any finder
> (find/tree, find/matrix, find/study) potentially needs to process
> predicates from any other domain (e.g. find/tree apparently needs to
> know about study IDs

But that is only true for TreeBASE. That one finder implementation in TreeBASE should not cooperate with another finder implementation in TreeBASE is your design decision, not one imposed by PhyloWS, right?

-hilmar
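The alternative argued for here runs the other way: clients send the refined property, and an agent that doesn't recognize it climbs the `rdfs:subPropertyOf` hierarchy until it reaches a term it does know, so the property's meaning never depends on which finder receives it. Below is a minimal sketch, assuming a hypothetical child-to-parent map in place of the attached ontology.

```python
# Hypothetical stand-in for the attached ontology: child -> parent (rdfs:subPropertyOf).
SUBPROPERTY_OF = {
    "tb.identifier.study": "dcterms.identifier",
    "tb.identifier.tree": "dcterms.identifier",
}


def understood_as(prop, known):
    """Walk up subPropertyOf links until reaching a property the agent knows.

    Returns the first known property (possibly prop itself), or None if no
    ancestor is known. The 'seen' set guards against accidental cycles.
    """
    seen = set()
    while prop is not None and prop not in seen:
        if prop in known:
            return prop
        seen.add(prop)
        prop = SUBPROPERTY_OF.get(prop)
    return None


# A TreeBASE-aware agent keeps the precise meaning ...
print(understood_as("tb.identifier.tree", {"tb.identifier.tree"}))   # tb.identifier.tree
# ... while a generic Dublin Core agent falls back to the general term.
print(understood_as("tb.identifier.tree", {"dcterms.identifier"}))   # dcterms.identifier
```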
From: Rutger V. <rut...@gm...> - 2009-07-08 05:43:00
Hi Hilmar, all,

thanks for your comments!

>> I notice that this departs a bit from the phylows that is proposed here.
>> For example, the proposed phylows puts "/find/" before "/tree/", whereas
>> you have it the other way.
>
> Right, this is not in compliance with the spec. find/ comes first as it
> changes the resource from a record and its URI to a finder.

Right, switching that around is fairly trivial, so I'll do that.

> Also, find/taxon/ would imply that you are finding (and returning) taxa,
> which if I understand correctly is not the case - rather it seems you have
> one query parameter in the URI path (namely that you are searching by
> taxon?) and one in the query string. So if this is searching trees, it needs
> to be find/tree/, and if you are matching against taxon names, the query
> parameter needs to be tb.taxon.name or whatever the blessed metadata term
> for this purpose is.
>
> Third, recordSchema=tree means that you want records back in the tree
> schema. Unless you have invented that schema meanwhile, this is in all
> likelihood not what you want. Rather, the value should be nexml I suppose.
> find/tree already implies that you are finding (and returning) trees, so
> there is no point in expressing that redundantly in the query string. You
> might want to specify that you only want the tree and not also the matrix,
> but that would be a separate query parameter and should not be confounded
> with the return format.

Mmmmm... I think this warrants a little more discussion. It's probably true that for most implementors their searches can be conveniently decomposed into several domains (tree search/matrix search/taxon search/etc.) and that for each domain the metaphor is that of searching a single table where the CQL indices are that table's columns.

Then, within each domain there is a limited number of concerns: how to search on the provided indices and how to format the results. For example, for a search like

http://8ball.sdsc.edu:6666/treebase-web/search/studySearch.html?query=dcterms.identifier=S2484&format=rss1&recordSchema=tree

the implementation is thus:

* there is a self-contained study searcher
* the searcher knows how predicates map onto columns in the study table (e.g. dcterms.identifier is the same as study.id)
* the searcher knows how to unpack a study object and get the trees out

If instead we'd have phylows/tree/find?query=study.identifier=S2484, the implementation would be something like:

* there is a tree searcher
* the tree searcher needs to know not just about the tree table but also about how all other predicates map onto all other tables, and how they join with the tree table
* the tree searcher needs to know how to traverse study objects and where trees are inside the study object
* (and similar overlap of concerns becomes necessary if we want the trees for a given matrix, or for a taxon, or what have you)

To me that seems like bad design. We'll lose any separation of concerns and might end up with a lot of redundancy between searchers - and a lot more code (and bugs) to write. I realize that I'm overloading the "recordSchema" token (and should fix that) but some way of saying "search THIS domain and project the results into THAT domain" seems very, very handy - especially because CQL doesn't have a notion of joins.

Rutger
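The "search THIS domain and project the results into THAT domain" design outlined above can be sketched as a self-contained study searcher plus a table of projections. Everything here (the in-memory `STUDIES` list, the tree IDs, the function names) is hypothetical. A real searcher would query the database rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    tree_id: str


@dataclass
class Study:
    study_id: str
    trees: list = field(default_factory=list)  # list of Tree


# Hypothetical in-memory "study table"; a real searcher would query the database.
STUDIES = [Study("S2484", [Tree("Tr1"), Tree("Tr2")]), Study("S100", [Tree("Tr3")])]

# The study searcher's own knowledge: how CQL indices map onto study "columns".
STUDY_COLUMNS = {"dcterms.identifier": lambda s: s.study_id}

# How to unpack a study into another domain (the projection step).
PROJECTIONS = {
    "study": lambda s: [s],
    "tree": lambda s: s.trees,
}


def search_studies(index, term):
    """Self-contained study searcher: knows only the study domain."""
    column = STUDY_COLUMNS[index]
    return [s for s in STUDIES if column(s) == term]


def search_and_project(index, term, target):
    """Search the study domain, then project the hits into the target domain."""
    results = []
    for study in search_studies(index, term):
        results.extend(PROJECTIONS[target](study))
    return results


print([t.tree_id for t in search_and_project("dcterms.identifier", "S2484", "tree")])
# ['Tr1', 'Tr2']
```

Note that the searcher never needs to know about tree columns; only the projection table knows where trees sit inside a study.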
From: Rutger V. <rut...@gm...> - 2009-07-08 04:09:37
Sorry, the PhyloWS URL of the search example is (at present):

http://8ball.sdsc.edu:6666/treebase-web/phylows/study/find?query=dcterms.identifier=S2484&format=rss1&recordSchema=tree
From: Hilmar L. <hl...@ne...> - 2009-07-08 07:47:32
On Jul 7, 2009, at 8:17 PM, Rutger Vos wrote:

> I realize that I'm overloading the "recordSchema" token (and should
> fix that)

That was my main point in this regard.

> but some way of saying "search THIS domain and project the results
> into THAT domain" seems very, very handy - especially because CQL
> doesn't have a notion of joins.

I fully agree. We may just be talking past each other, but I'm not seeing why something like

phylows/tree/find?query=study.identifier=S2484

doesn't achieve exactly that - it says search the study domain and project the results into the tree domain. Conversely,

phylows/study/find?query=tree.identifier=TB2484

says to search in the tree domain and project the results into the study domain (i.e., return studies that have a tree matching the query).

You aren't trying to suggest that dcterms.title or dcterms.identifier should mean different things for different finders, right?

I get the sense that you are tying URL patterns and implementations closely together; i.e., phylows/tree/find executes one and the same chunk of code no matter what the query is, and so there would be a chunk of code sitting under phylows/study/find that finds trees and another, separate, chunk of code sitting under phylows/tree/find that finds trees. But of course the URL patterns and the code they execute (if any - it may just be indexed files and XSLTs) are two completely separate things. There is no reason that phylows/study/find and phylows/tree/find couldn't (in fact shouldn't) use the exact same tree finder class for finding trees.

I think we really need to look at PhyloWS as a standardized pattern of web-service URLs that is completely decoupled from the underlying implementation, which can take a multitude of shapes. What we should pay attention to though is that the API *allows* optimizing of code reuse and clean design of implementations. Are you saying that it stands in the way of that, and if so, how does it prevent clean design of implementations?

-hilmar
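The point that URL patterns are decoupled from implementation can be illustrated with a router that reads the target domain from the URL path and the source domain from the query index. It then reuses the same finder and projection code for both example URLs in this message. This is a sketch only. The data, the function names and the naive `index=term` split are assumptions, and real CQL needs a proper parser. The path handling follows the `phylows/<resource>/find` form used in the message; under the spec's find/-first ordering, the target would instead be the segment after `find`.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical data: study ID -> IDs of the trees it contains.
STUDY_TREES = {"S2484": ["TB2484", "TB2485"], "S100": ["TB100"]}


def find_studies(term):
    """One study finder, shared by every URL pattern."""
    return [term] if term in STUDY_TREES else []


def find_trees(term):
    """One tree finder, shared by every URL pattern."""
    return [t for trees in STUDY_TREES.values() for t in trees if t == term]


FINDERS = {"study": find_studies, "tree": find_trees}


def project(ids, source, target):
    """Map result IDs from the source domain into the target domain."""
    if source == target:
        return ids
    if (source, target) == ("study", "tree"):
        return [t for s in ids for t in STUDY_TREES[s]]
    if (source, target) == ("tree", "study"):
        return [s for s, trees in STUDY_TREES.items() if set(trees) & set(ids)]
    raise ValueError(f"no projection from {source} to {target}")


def handle(url):
    """Route phylows/<target>/find?query=<source>.identifier=<term>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    target = segments[segments.index("find") - 1]
    cql = parse_qs(parts.query)["query"][0]  # e.g. "study.identifier=S2484"
    index, term = cql.split("=", 1)          # naive; not a real CQL parser
    source = index.split(".")[0]             # "study" or "tree"
    return project(FINDERS[source](term), source, target)


print(handle("http://example.org/phylows/tree/find?query=study.identifier=S2484"))
print(handle("http://example.org/phylows/study/find?query=tree.identifier=TB2484"))
```

Both URLs are served by the same two finders; only the projection step differs.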