From: Rutger V. <rut...@gm...> - 2009-07-08 05:43:00
Hi Hilmar, all,

thanks for your comments!

>> I notice that this departs a bit from the phylows that is proposed here.
>> For example, the proposed phylows puts "/find/" before "/tree/", whereas
>> you have it the other way.
>
> Right, this is not in compliance with the spec. find/ comes first as it
> changes the resource from a record and its URI to a finder.

Right, switching that around is fairly trivial, so I'll do that.

> Also, find/taxon/ would imply that you are finding (and returning) taxa,
> which if I understand correctly is not the case - rather it seems you have
> one query parameter in the URI path (namely that you are searching by
> taxon?) and one in the query string. So if this is searching trees, it needs
> to be find/tree/, and if you are matching against taxon names, the query
> parameter needs to be tb.taxon.name or whatever the blessed metadata term
> for this purpose is.
>
> Third, recordSchema=tree means that you want records back in the tree
> schema. Unless you have invented that schema meanwhile, this is in all
> likelihood not what you want. Rather, the value should be nexml I suppose.
> find/tree already implies that you are finding (and returning) trees, so
> there is no point in expressing that redundantly in the query string. You
> might want to specify that you only want the tree and not also the matrix,
> but that would be a separate query parameter and should not be confounded
> with the return format.

Mmmmm... I think this warrants a little more discussion. It's probably true
that for most implementors their searches can be conveniently decomposed
into several domains (tree search/matrix search/taxon search/etc.) and that
for each domain the metaphor is that of searching a single table where the
CQL indices are that table's columns. Then, within each domain there is a
limited number of concerns: how to search on the provided indices and how to
format the results.
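
The single-table metaphor can be illustrated with a minimal sketch. This is not TreeBASE's actual code: the `DomainSearcher` class, its `to_sql` method, and the table and column names are all hypothetical, and only the simplest `index=value` form of a CQL clause is handled, not the full grammar.

```python
from dataclasses import dataclass


@dataclass
class DomainSearcher:
    """One searcher per domain; it only knows about its own table."""

    table: str
    index_to_column: dict  # CQL index name -> column name (assumed mapping)

    def to_sql(self, query: str):
        """Translate a single 'index=value' clause into parametrized SQL.

        Assumption: only the simplest CQL form is supported here.
        """
        index, sep, value = query.partition("=")
        if not sep:
            raise ValueError(f"expected 'index=value', got {query!r}")
        index = index.strip()
        if index not in self.index_to_column:
            raise KeyError(f"unknown index for {self.table}: {index}")
        column = self.index_to_column[index]
        return f"SELECT * FROM {self.table} WHERE {column} = ?", (value.strip(),)


# Hypothetical study-domain searcher; table and column names are assumptions.
study_searcher = DomainSearcher(
    table="study",
    index_to_column={"dcterms.identifier": "id"},
)

sql, params = study_searcher.to_sql("dcterms.identifier=S2484")
print(sql, params)
```

In this sketch the searcher never needs to know about any table other than its own, and formatting the results would be a separate step applied to whatever rows the query returns.
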
For example, for a search like
http://8ball.sdsc.edu:6666/treebase-web/search/studySearch.html?query=dcterms.identifier=S2484&format=rss1&recordSchema=tree
the implementation is thus:

* there is a self-contained study searcher
* the searcher knows how predicates map onto columns in the study table
  (e.g. dcterms.identifier is the same as study.id)
* the searcher knows how to unpack a study object and get the trees out

If instead we'd have phylows/tree/find?query=study.identifier=S2484, the
implementation would be something like:

* there is a tree searcher
* the tree searcher needs to know not just about the tree table but also
  about how all other predicates map onto all other tables, and how they
  join with the tree table
* the tree searcher needs to know how to traverse study objects and where
  trees are inside the study object
* (and similar overlap of concerns becomes necessary if we want the trees
  for a given matrix, or for a taxon, or what have you)

To me that seems like bad design. We'll lose any separation of concerns and
might end up with a lot of redundancy between searchers - and a lot more
code (and bugs) to write. I realize that I'm overloading the "recordSchema"
token (and should fix that) but some way of saying "search THIS domain and
project the results into THAT domain" seems very, very handy - especially
because CQL doesn't have a notion of joins.

Rutger

--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com