From: Angel P. <an...@pc...> - 2003-06-02 18:57:24
Paul,

Since I referred you here, I'll take point and answer these questions. See the comments below. We ask that, if you take or adapt any code, you pay attention to the Apache-inspired license and add references back to gusdb.org.

Thanks!
Angel

Paul Boutros wrote:

>Hi all,
>
>Hopefully this is the right place for these questions. If not, please let me
>know where I can ask.
>
>I have a moderately extensive Oracle DB for cDNA microarray data. As
>requirements have increased it is now considered desirable to store Affy data
>as well as enhanced sample Annotation. I looked into MAGE-ML, and was referred
>from a list there to GUS DB.
>
>I've started implementing a portion of your schema into my own DB, in
>particular the annotation portion (e.g. ExternalDatabaseRelease, BioMaterialImp
>& associated views, LabelMethod, etc.).
>
>I'm thinking about using an even larger fraction -- including the
>CompositeElementImp/CompositeElementResultImp and ElementImp/ElementResultImp
>tables -- because I really like the schema design you've done.
>
>But here is my problem. I'm having some difficulty interpreting the meanings
>of those tables. My main questions:
>1. What are the differences between the xxxElementImp and xxxElementResultImp
>tables? What goes into each? My understanding is that the xxxElementImp store
>details about the array *layout* while the xxxElementResultImp store details
>about the data from specific arrays.
>

This is correct. For example, the Array table would contain "Affy array U74A", the ShortOligo view on the ElementImp table would contain the probe pair information, and the ShortOligoFamily view on CompositeElementImp would contain information about the probe sets.

For microarray data, Array = "MicroArray X" and the Spot view on ElementImp would contain the Features (e.g. physical locations) and the sequence that is spotted there.
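As a rough illustration of the layout/result split described above, here is a minimal sketch in Python using the standard sqlite3 module. The table and column names (array_def, element, quantification, element_result) and the sample values are hypothetical simplifications, not the actual GUS/RAD definitions; the real ElementImp and ElementResultImp tables carry many more columns and are accessed through views.

```python
import sqlite3

# Hypothetical, heavily simplified tables; NOT the real GUS/RAD definitions.
SCHEMA = """
CREATE TABLE array_def (array_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE element (            -- layout: one row per spot on the design
    element_id INTEGER PRIMARY KEY,
    array_id   INTEGER REFERENCES array_def(array_id),
    position   TEXT,
    sequence   TEXT
);
CREATE TABLE quantification (quant_id INTEGER PRIMARY KEY, result_file TEXT);
CREATE TABLE element_result (     -- data: one row per element per chip
    element_id INTEGER REFERENCES element(element_id),
    quant_id   INTEGER REFERENCES quantification(quant_id),
    signal     REAL
);
"""


def build_demo():
    """Create an in-memory DB with one layout shared by two hybridized chips."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO array_def VALUES (1, 'MicroArray X')")
    # The layout is stored once, however many chips use it.
    conn.executemany(
        "INSERT INTO element VALUES (?, 1, ?, ?)",
        [(1, "r1c1", "ACGT"), (2, "r1c2", "TTGA")],
    )
    # Two quantifications (two physical chips) with made-up file names.
    conn.executemany(
        "INSERT INTO quantification VALUES (?, ?)",
        [(1, "chip1.txt"), (2, "chip2.txt")],
    )
    # Results reference the shared layout rows instead of repeating them.
    conn.executemany(
        "INSERT INTO element_result VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 20.0), (1, 2, 11.0), (2, 2, 19.0)],
    )
    return conn


def count_rows(conn, table):
    """Row count for a table; `table` must be a trusted, hard-coded name."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def signals_for(conn, position):
    """Signals measured at one layout position, ordered by quantification."""
    rows = conn.execute(
        "SELECT r.signal FROM element_result r "
        "JOIN element e ON e.element_id = r.element_id "
        "WHERE e.position = ? ORDER BY r.quant_id",
        (position,),
    ).fetchall()
    return [row[0] for row in rows]
```

With two layout rows and two chips, the layout is stored once (2 rows in element) while the results grow per chip (4 rows in element_result); signals_for(conn, "r1c1") returns the readings for that spot across both chips.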
For MAGE purposes, we decided to put Reporter and CompositeSequence information in the SpotFamily view on the CompositeElementImp table. The results go into the various views on xxxElementResultImp, such as ArrayVisionResult or AffymetrixMAS4. Check out the documentation on the ArrayVision view from the GUS schema browser (look for the tables with the RAD prefix):

http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30
http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30&table=RAD3::ArrayVisionElementResult&path=RAD3::ArrayVisionElementResult

>2. If the above description is right, would that mean that for each physical
>array (each "chip") there are records in all four tables? Is that necessary
>for cases with repeated chip "layouts"?
>

Well, it depends on what data you have and what you are going to use this DB for. But let me first state that this is actually the most space-efficient way of storing array layouts and results. We separated the array layout information (as you noted) from the results in order to reuse the layouts for multiple analyses on the same chips.

So to answer your question: If your intent is to provide a DB to keep track of LIMS information for something like a microarray core facility (i.e. you are never going to work with the data from within the database), then you do not need the xxxElementResultImp tables at all.
You can just store the Array definitions and the Hybridization information in the Assay -> Acquisition -> Quantification tables:

Assay = Hybridization,
Acquisition = Scanning information and the location of the image file,
Quantification = Feature extraction / quantification software parameters and the location of the result file

But for our purposes let's assume that you need to store the data in the DB and work with it there:

For Affy, if you only produce / receive MAS* files, then you do not need the Element*Imp branch, since you will not need to store the individual probe pairs or the CEL file results on these probe pairs.

For microarray data, if you do not want to group elements into some bigger concept, like a gene, or group the individual elements by source plate information, then you do not need the CompositeElement*Imp branch.

All other cases require you to fill in all four tables.

>3. Are the Ontologies used in RAD3::OntologyTerm publicly available? I
>couldn't find them in the 3.0-Beta release tar, but perhaps I just missed them?
>
>

No, they are not, for a variety of reasons. You raise a good point though, and we will put this on our to-do list.

>Any help or suggested reading would be very much appreciated!
>Paul
>

Here are two references for the previous version of the schema that cover the major concepts/conventions used in RAD. Most of it still applies, modulo some schema details. If you can't get these, email me (off the list) and I'll try to get copies sent. A new manuscript is in preparation.

Stoeckert, C., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, S., Shen-Orr, S., Overton, G.C. (2001) A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17(4), 300-308.

Manduchi, E., Pizarro, A., Stoeckert, C. (2001) RAD (RNA Abundance Database): an infrastructure for array data analysis. Proc. SPIE, vol. 4266, pp. 68-78.

Angel