From: Hobern, D. <DH...@gb...> - 2002-10-29 20:26:21
As Anton has mentioned via e-mail, we currently expect that the full data model for collection data will be some version of the ABCD schema, which is heavily nested, with multiple optional sections (including strict choices) and with repeating subelements. These characteristics do not seem easy to fit into the current DiGIR substitutable element search model.

In Brazil we therefore acknowledged that there does not necessarily have to be a single common schema for both query and results. The suggestion from the meeting was that the query schema should in effect be Darwin Core 3, and that the result schema should be the ABCD model.

In fact the current DiGIR model uses the federation schema for three logically separable purposes: as a foundation from which to construct the query, as a logical schema to which physical database schemas are mapped, and as the record structure for results.

Personally I remain somewhat uneasy about having different schemas for the different purposes, since I would like it to be possible for software to validate all of our data mappings, and I am not sure that we could easily do that between two or more arbitrary schemas. In effect the implementor of each provider software application would be responsible for interpreting the relationships between the different schemas. In practice my qualms are probably just a matter of philosophy, but they remain.

It also seems to me that separating the schemas ignores one of the major benefits which may follow from the DiGIR protocol (although I acknowledge that it is outside the concern of DiGIR as a query and transfer protocol). That benefit would come from having standard methods for a software tool (such as the PHP Provider) to automate the mapping between a query, a physical data model and a result structure. If all of these relate to the same schema, the process may be tricky but it is logically just a matter of programming.
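
To make the single-schema case concrete, here is a minimal sketch in Python of one schema definition serving all three purposes listed above. It is only an illustration under stated assumptions: it is not taken from DiGIR or the PHP Provider, and the element names, column names and the `specimens` table are all hypothetical. It also assumes a flat schema with no nesting or repeating subelements, which is exactly the simplification that ABCD would not allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One element of the shared schema (all names here are hypothetical)."""
    name: str    # element name used in queries and in result records
    column: str  # physical database column the element is mapped to


# A single schema definition used for query, mapping and results.
SCHEMA = [
    Element("ScientificName", "taxon_name"),
    Element("Country", "country_code"),
    Element("Collector", "collector"),
]
BY_NAME = {e.name: e for e in SCHEMA}


def validate_query(filters):
    """Purpose 1: the schema defines which elements a query may use."""
    unknown = [name for name in filters if name not in BY_NAME]
    if unknown:
        raise ValueError(f"not in schema: {unknown}")


def to_sql(filters, table="specimens"):
    """Purpose 2: the schema maps query elements onto physical columns."""
    validate_query(filters)
    columns = ", ".join(e.column for e in SCHEMA)
    where = " AND ".join(f"{BY_NAME[n].column} = ?" for n in filters)
    sql = f"SELECT {columns} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, list(filters.values())


def to_record(row):
    """Purpose 3: the schema gives the structure of each result record."""
    return {e.name: value for e, value in zip(SCHEMA, row)}
```

With this sketch, `to_sql({"Country": "BR"})` yields a parameterised `SELECT` over the mapped columns, and `to_record` turns each returned row back into a record keyed by schema element. Because every step reads the same `SCHEMA` list, software can check the mappings mechanically.
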
If multiple schemas are used for the different aspects, at the very least extra definitions would be required to define how these mappings are to be performed. This seems unnecessarily complex. (I believe that this issue is less important for BioCASE, since there the protocol will be used simply for transfer and there is therefore no need for DiGIR to be able to define the other mappings internal to the provider.)

My hope (based on the combination of the DiGIR meetings and the ABCD sessions) is that any future version of Darwin Core will in fact model documents which are valid under the ABCD Schema (a small subset of the total set of fields, probably corresponding to what most people will in fact be able to populate within the schema). I think that this should be possible and will in fact allow us in practice to maintain a single schema without any philosophical issues about how to perform correct mappings between the schemas. We could define certain characteristics of schema elements which make them unavailable as query elements (e.g. some combination of depth in document, optionality and cardinality).

The problem is that even Darwin Core will include repeating subelements (e.g. for collector or higher taxon). This will mean that we need to understand how to model these within DiGIR queries.

The bottom line here is that I think our level of agreement in Brazil arose in part because BioCASE needed to be able to separate the query and result schemas, whereas I assumed that we would ultimately be able to combine both as different subsets of the same schema. We ignored larger questions about what the DiGIR protocol may or may not be able to achieve because the specific case which interests us first can probably work without answering them.

Speaking both from my role in GBIF and as a software architect, I would like to see DiGIR become a common transport and query protocol for use in any subject domain in which we want to exchange data.
For that reason I would like to resolve these more abstract issues before we go too far.

Donald

---------------------------------------------------------------
Donald Hobern
Programme Officer for Data Access and Database Interoperability
Global Biodiversity Information Facility Secretariat
Zoological Museum - University of Copenhagen
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321483 Fax: +45-35321480
E-mail: dh...@gb...
---------------------------------------------------------------