From: Steve F. <st...@pc...> - 2003-08-08 13:06:17
Terry,

In general, schema transformation is a hard problem. In the immediate case, though, the transform from the agave sequence file to GUS looks relatively simple: a one-to-one transform of a single data type that requires mapping some foreign keys. It seems that Bourret's XML-DBMS can handle this gracefully. What I am not clear on is:

- Would this work for the more common, non-trivial case where the object models differ more significantly? Angel has done a lot of thinking about this for his object mapper, which maps MAGE-ML into GUS.
- Will you go directly to SQL or to GUS objects? In a simple case like this, going to SQL might make sense for efficiency if you are expecting huge inputs.
- Is the power offered by Bourret's transformer sufficient to justify packaging his product with the GUS distribution? That is probably what we would have to do if we write certified plugins that use it.
- Since the transform in your case is so simple, does it make sense to deploy a third-party transformer rather than just write some simple brute-force Perl code?
- What other third-party software (e.g., XML parsers) does Bourret's software depend on?

In sum, I think it is an excellent idea to consider third-party solutions and to provide general solutions instead of one-off plugins. I also want to make sure that the extra effort we put into making this a general solution will have a commensurate payoff.

steve

Terry Clark wrote:
> Posted for discussion at
>   http://flora.uchicago.edu/
> is a short specification (a working draft)
> for an XML-to-GUS Plugin for GUS' NASequenceImp, LoadSeqFromXML.pm.
>
> The idea, based on the XML to DB/SQL work by Ron Bourret,
> is to give a flexible way to map various sequence
> data management projects into GUS using the same source.
> The Plugin operates from 1) an XML-formatted sequence file
> and 2) a mapping from the sequence file to GUS space.
>
> The method is generally applicable to GUS objects/tables,
> but the idiosyncrasies of objects probably call for
> individual mappers, like LoadSeqFromXML.pm
> (better named LoadNASequenceFromXML.pm).
>
> Are there any thoughts, variations, existing work, etc.,
> about the idea?
>
> Terry
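For illustration only, here is a minimal sketch of the simple case discussed above: one XML element type mapped one-to-one onto one table, driven by a declarative mapping, with one column filled by a foreign-key lookup and rows written directly with SQL rather than through GUS objects. It is written in Python using the standard library's ElementTree and sqlite3 so that it is self-contained. The XML element and attribute names, the NASequenceImp columns, the Taxon lookup table, and the MAPPING format are all hypothetical. They do not reflect Bourret's XML-DBMS mapping language or the actual GUS schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: element/attribute names are illustrative only and are
# not taken from any real agave or GUS schema.
SAMPLE_XML = """<sequences>
  <sequence accession="AB000001" organism="Homo sapiens">ACGTACGT</sequence>
  <sequence accession="AB000002" organism="Mus musculus">GGCCTTAA</sequence>
</sequences>"""

# Hypothetical declarative mapping from one XML element to one table.
# Each column is filled from an attribute ("attr"), from the element text
# ("text"), or by a foreign-key lookup ("fk") that swaps a source value
# for the key of a matching row in another table.
MAPPING = {
    "element": "sequence",
    "table": "NASequenceImp",
    "columns": {
        "source_id": ("attr", "accession"),
        "sequence": ("text", None),
        # (source attribute, lookup table, match column, key column)
        "taxon_id": ("fk", ("organism", "Taxon", "name", "taxon_id")),
    },
}


def lookup_fk(conn, table, match_col, key_col, value):
    """Return key_col of the row in `table` whose match_col equals value."""
    # Table/column names come from the trusted mapping; the value is bound.
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {match_col} = ?", (value,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no {table} row with {match_col}={value!r}")
    return row[0]


def load(conn, xml_text, mapping):
    """Insert one row per mapped element; return the number of rows inserted."""
    root = ET.fromstring(xml_text)
    cols = list(mapping["columns"])
    sql = (
        f"INSERT INTO {mapping['table']} ({', '.join(cols)}) "
        f"VALUES ({', '.join('?' for _ in cols)})"
    )
    count = 0
    for elem in root.iter(mapping["element"]):
        values = []
        for kind, arg in mapping["columns"].values():
            if kind == "attr":
                values.append(elem.get(arg))
            elif kind == "text":
                values.append((elem.text or "").strip())
            else:  # "fk": resolve a source value to a foreign key
                attr, table, match_col, key_col = arg
                values.append(
                    lookup_fk(conn, table, match_col, key_col, elem.get(attr))
                )
        conn.execute(sql, values)
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Taxon (taxon_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE NASequenceImp (source_id TEXT, sequence TEXT, taxon_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Taxon VALUES (?, ?)",
        [(9606, "Homo sapiens"), (10090, "Mus musculus")],
    )
    print(load(conn, SAMPLE_XML, MAPPING))
```

Because the mapping is data rather than code, a different source format would, in this sketch, need only a new MAPPING, not a new loader. Whether that generality pays off over a few lines of brute-force code is exactly the trade-off raised above. A case where the object models differ significantly (one element feeding several tables, or several elements feeding one row) would need a richer mapping than this.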