From: Paul B. <pcb...@en...> - 2003-06-02 18:10:52
|
Hi all, Hopefully this is the right place for these questions. If not, please let me know where I can ask. I have a moderately extensive Oracle DB for cDNA microarray data. As requirements have increased it is now considered desirable to store Affy data as well as enhanced sample Annotation. I looked into MAGE-ML, and was referred from a list there to GUS DB. I've started implementing a portion of your schema into my own DB, in particular the annotation portion (e.g. ExternalDatabaseRelease, BioMaterialImp & associated views, LabelMethod, etc.). I'm thinking about using an even larger fraction -- including the CompositeElementImp/CompositeElementResultImp and ElementImp/ElementResultImp tables -- because I really like the schema design you've done. But here is my problem. I'm having some difficulty interpreting the meanings of those tables. My main questions: 1. What are the differences between the xxxElementImp and xxxElementResultImp tables? What goes into each? My understanding is that the xxxElementImp store details about the array *layout* while the xxxElementResultImp store details about the data from specific arrays. 2. If the above description is right, would that mean that for each physical array (each "chip") there are records in all four tables? Is that necessary for cases with repeated chip "layouts"? 3. Are the Ontologies used in RAD3::OntologyTerm publicly available? I couldn't find them in the 3.0-Beta release tar, but perhaps I just missed them? Any help or suggested reading would be very much appreciated! Paul ---------------------------------------- This mail sent through www.mywaterloo.ca |
From: Elisabetta M. <man...@pc...> - 2003-06-02 18:39:52
|
Hi Paul, thanks for your interest in RAD. > 1. What are the differences between the xxxElementImp and xxxElementResultImp > tables? What goes into each? My understanding is that the xxxElementImp store > details about the array *layout* while the xxxElementResultImp store details > about the data from specific arrays. > 2. If the above description is right, would that mean that for each physical array (each "chip") there are records in all four tables? Is that > necessary for cases with repeated chip "layouts"? The xxxElementImp tables indeed store information about arrays, that is what is spotted/synthesized at each location on the array. The xxxElementResultImp tables store the measurements output by image quantifications software for a given quantification. Thus, for example, for a given array layout, you would store this once and for all using the xxxElementImp tables. Then, for each image quantification of a hybridization done using such a chip, you would populate the xxxElementResultImp tables with the resulting data. > 3. Are the Ontologies used in RAD3::OntologyTerm publicly available? I > couldn't find them in the 3.0-Beta release tar, but perhaps I just missed them? I guess you are referring to the RAD3.OntologyEntry table. The entries we store in this table typically point to entries stored in ontology tables which are in the SRes component of GUS. This includes the MGED Ontology which is available at http://mged.sourceforge.net/ontologies/index.php. Elisabetta |
From: Angel P. <an...@pc...> - 2003-06-02 18:57:24
|
Paul, Since I referred you her, I'll take point and answer these questions. See the comments below. WE ask if you take/adapt any code, that you pay attention to the Apache inspired license and add references back to gusdb.org. Thanks! Angel Paul Boutros wrote: >Hi all, > >Hopefully this is the right place for these questions. If not, please let me >know where I can ask. > >I have a moderately extensive Oracle DB for cDNA microarray data. As >requirements have increased it is now considered desirable to store Affy data >as well as enhanced sample Annotation. I looked into MAGE-ML, and was referred >from a list there to GUS DB. > >I've started implementing a portion of your schema into my own DB, in >particular the annotation portion (e.g. ExternalDatabaseRelease, BioMaterialImp >& associated views, LabelMethod, etc.). > >I'm thinking about using an even larger fraction -- including the >CompositeElementImp/CompositeElementResultImp and ElementImp/ElementResultImp >tables -- because I really like the schema design you've done. > >But here is my problem. I'm having some difficulty interpreting the meanings >of those tables. My main questions: >1. What are the differences between the xxxElementImp and xxxElementResultImp >tables? What goes into each? My understanding is that the xxxElementImp store >details about the array *layout* while the xxxElementResultImp store details >about the data from specific arrays. > This is correct. For example the Array table would contain "Affy array U74A", the ShortOligo view on the ElementImp table would contain the probe pair information and the ShortOligoFamily view on CompositeElementImp would contain information of the probe sets. For microarray data, Array = "MicroArray X" and the Spot view on ElementImp would contain the Features (e.g. physical locations) and the sequence that is spotted there. For MAGE purposes, we decided to put Reporter and CompositeSequence information in the SpotFamily view on the CompositeElementImp table. The Results go into the various views on xxxElemenResultImp, such as ArrayVisionResult or AffymetrixMAS4. Check out the documentation on the ArrayVision view from the GUS schema browser (look for the tables with the RAD prefix): http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30 http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30&table=RAD3::ArrayVisionElementResult&path=RAD3::ArrayVisionElementResult >2. If the above description is right, would that mean that for each physical >array (each "chip") there are records in all four tables? Is that necessary >for cases with repeated chip "layouts"? > Well, it depends on what data you have and what you are going to use this DB for. But let me first state that this is actually the most space efficient way of storing array layouts and results. We separated the array layout information (as you noted) from the results in order to use the layouts repeatedly for multiple analysis on the same chips. So to answer your question: If your intent to provide a DB to keep track of LIMS information for something like a microarray core facility, (e.g. you are never going to work with the data from within the database) then you do not need the xxxElementResultImp tables at all. You can just store the Array definitions and the Hybridization information on the Assay -> Acquisition -> Quantification tables: Assay = Hybridization , Acquisition = Scanning information and the location of the image file, Quantification = Feature extraction / quantification software parameters and the location of the result file But for our purposes let's assume that you need to store the data in the DB and work with it there: For Affy, if you only produce / receive MAS* files, then you do not need the Element*Imp branch, since you will not need to store the individual probe pairs or the CEL file results on these probe pairs For microarray data, if you do not want to group elements into some bigger concept, like a gene, or group the individual elements by source plate information, then you do not need the CompositeElement*Imp branch. All other cases require you to fill in all four tables. >3. Are the Ontologies used in RAD3::OntologyTerm publicly available? I >couldn't find them in the 3.0-Beta release tar, but perhaps I just missed them? > > No they are not, for a variety of reasons. You raise a good point though and we will put this on our to-do list. >Any help or suggested reading would be very much appreciated! >Paul > Here are two references for the previous version of the schema that cover the major concepts/conventions used in RAD. Most of it still apply, module some schema details. If you can't get these, email me (off the list) and I'll try and get copies sent. A new manuscript is in preparation. Stoeckert, C., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, S., Shen-Orr, S., Overton, G.C. (2001) A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17(4), 300-308 (2001). Manduchi, E., Pizarro, A., Stoeckert, C. (2001) RAD (RNA Abundance Database): an infrastructure for array data analysis. Proc. SPIE, vol 4266, pp. 68-78. Angel |
From: Paul B. <pcb...@en...> - 2003-06-02 23:33:15
|
Hi Elisabetta, Angel. Thanks for your speedy and comprehensive replies -- it's very much appreciated! I had asked about the differences between the xxxElementImp and xxxElementResultImp tables, and that makes sense now. I had been confused by the presence of ARRAY_ID on both tables, but I guess this is primarily just a denormalization then. So, if I understand correctly, the querying process to go from a BioMaterial to getting the corresponding Elements and ElementResults would look something like this: SELECT ElementImp.tinystring1, ElementResultImp.foreground FROM BioMaterialImp INNER JOIN AssayBioMaterial ON BioMaterialImp.bio_material_id = AssayBioMaterial.bio_material_id INNER JOIN Acquisition ON AssayBioMaterial.assay_id = Acquisition.assay_id INNER JOIN Quantificiation ON Acquisition.acquisition_id = Quantification.acquisition_id INNER JOIN ElementResultImp ON Quantification.quantification_id = ElementResultImp.quantification_id INNER JOIN ElementImp ON ElementResultImp.element_id = ElementImp.element_id; In other words, I walk the table chain: BioMaterialImp -> AssayBioMaterial -> Acquisition -> Quantification -> ElementResultImp -> ElementImp This is data for one channel of a multi-channel array. The quantification and acquisition relationships are stored in the RelatedXXX tables. My understanding is that to get access to data from both channels for a specific array (e.g. to get ratios) I would need to do something like: SELECT ElementImp.tinystring1 DECODE(BioMaterialID.label_method_id, 1, SUM (ElementResultImp.foreground), NULL), DECODE(BioMaterialID.label_method_id, 2, SUM (ElementResultImp.foreground), NULL) FROM BioMaterialImp INNER JOIN AssayBioMaterial ON BioMaterialImp.bio_material_id = AssayBioMaterial.bio_material_id INNER JOIN Acquisition ON AssayBioMaterial.assay_id = Acquisition.assay_id INNER JOIN Quantification ON Acquisition.acquisition_id = Quantification.acquisition_id INNER JOIN ElementResultImp ON Quantification.quantification_id = ElementResultImp.quantification_id INNER JOIN ElementImp ON ElementResultImp.element_id = ElementImp.element_id WHERE Assay.array_identifier = ? Where I'm taking: label_method_id = 1 for a Cy3 channel label_method_id = 2 for a Cy5 channel And then if I calculate a ratio or a normalized-value I would create a new ElementResultImp view and store the values in there. Is that about right? Thanks for your patience with me, Paul ---------------------------------------- This mail sent through www.mywaterloo.ca |
From: Elisabetta M. <man...@pc...> - 2003-06-03 02:29:06
|
Hi Paul, > I had asked about the differences between the xxxElementImp and > xxxElementResultImp tables, and that makes sense now. I had been > confused by the presence of ARRAY_ID on both tables, but I guess this is > primarily just a denormalization then. actually array_id is only an attribute in xxxElementImp. xxxElementResultImp has a quantification_id attribute, since the array_id can be inferred either by going to ElementImp or by going to Assay through Quantification->Acquisition->Assay. > So, if I understand correctly, the querying process to go from a > BioMaterial to getting the corresponding Elements and ElementResults > would look something like this: > ... > In other words, I walk the table chain: > BioMaterialImp -> AssayBioMaterial -> > Acquisition -> Quantification -> > ElementResultImp -> ElementImp Yes, you have it right. As for ElementResultImp, we typically have a view for each image quantification software (e.g. ArrayVisionElementResultImp, GenePixElementResultImp, etc.) and we use them to store ALL measurements output by that software package, EXCEPT for those that can be derived from other measurements already stored. Now, in the case of 2-channel data, we have 2 quantification_ids, one per channel, for each image analysis run. So, for each element, we store 2 complete groups of measurements in the corresponding ElementResultImp view, one for each of the two quantification_id. For those measurements, which are not derived, but which are typically identical for the two channels, e.g. the GenePix spot_diameter, or the mean_of_ratios, they will be stored redundantly, once for each channel. (Note that the GenePix ratio_of_means is a derived measurement so we don't store it in GenePixElementResult, whereas the means_of_ratios is not. The schema browser http://www.cbil.upenn.edu/cgi-bin/GUS30/schemaBrowser.pl?db=GUS30 contains more detailed doc on the RAD tables and their attributes.) Given 2 related quantifications (as from the RelatedQuantification table) for a given 2-channel assay, you can retrieve the ElementResultImp measurements for each, and then the element_id will tell you which correspond to the same element. From this you can compute ratios for each element (e.g. ratios of background subtracted intensities). > And then if I calculate a ratio or a normalized-value I would create a > new ElementResultImp view and store the values in there. Is that about > right? We do not create new views of ElementResultImp for derived or normalized values. The views of ElementResultImp are solely to store "raw" output from image quantification software packages. We use the Processing tables (ProcessedIO, ProcessResult, ProcessInvocation, etc.) to store results of preprocessing. These tables are flexible enough to be used to store a variety of processings, including averaging within or across arrays, normalization results, etc. Hope this helps, Elisabetta P.S. I'll be leaving tomorrow afternoon for meetings in Europe and I might not have easy access to my email while away (till June 15). In any case, should you have more questions, Angel or any other of the "Raddies" (who is on the this mailing list) should be able to help you. --- On Mon, 2 Jun 2003, Paul Boutros wrote: > Hi Elisabetta, Angel. > > Thanks for your speedy and comprehensive replies -- it's very much appreciated! > > I had asked about the differences between the xxxElementImp and > xxxElementResultImp tables, and that makes sense now. I had been confused by > the presence of ARRAY_ID on both tables, but I guess this is primarily just a > denormalization then. > > So, if I understand correctly, the querying process to go from a BioMaterial to > getting the corresponding Elements and ElementResults would look something like > this: > > SELECT ElementImp.tinystring1, > ElementResultImp.foreground > FROM BioMaterialImp > INNER JOIN AssayBioMaterial > ON BioMaterialImp.bio_material_id = > AssayBioMaterial.bio_material_id > INNER JOIN Acquisition > ON AssayBioMaterial.assay_id = Acquisition.assay_id > INNER JOIN Quantificiation > ON Acquisition.acquisition_id = Quantification.acquisition_id > INNER JOIN ElementResultImp > ON Quantification.quantification_id = > ElementResultImp.quantification_id > INNER JOIN ElementImp > ON ElementResultImp.element_id = ElementImp.element_id; > > In other words, I walk the table chain: > BioMaterialImp -> AssayBioMaterial -> > Acquisition -> Quantification -> > ElementResultImp -> ElementImp > > This is data for one channel of a multi-channel array. The quantification and > acquisition relationships are stored in the RelatedXXX tables. My > understanding is that to get access to data from both channels for a specific > array (e.g. to get ratios) I would need to do something like: > > SELECT ElementImp.tinystring1 > DECODE(BioMaterialID.label_method_id, 1, SUM > (ElementResultImp.foreground), NULL), > DECODE(BioMaterialID.label_method_id, 2, SUM > (ElementResultImp.foreground), NULL) > FROM BioMaterialImp > INNER JOIN AssayBioMaterial > ON BioMaterialImp.bio_material_id = > AssayBioMaterial.bio_material_id > INNER JOIN Acquisition > ON AssayBioMaterial.assay_id = Acquisition.assay_id > INNER JOIN Quantification > ON Acquisition.acquisition_id = Quantification.acquisition_id > INNER JOIN ElementResultImp > ON Quantification.quantification_id = > ElementResultImp.quantification_id > INNER JOIN ElementImp > ON ElementResultImp.element_id = ElementImp.element_id > WHERE Assay.array_identifier = ? > > Where I'm taking: > label_method_id = 1 for a Cy3 channel > label_method_id = 2 for a Cy5 channel > > And then if I calculate a ratio or a normalized-value I would create a new > ElementResultImp view and store the values in there. Is that about right? > > Thanks for your patience with me, > Paul > > ---------------------------------------- > This mail sent through www.mywaterloo.ca > > > ------------------------------------------------------- > This SF.net email is sponsored by: eBay > Get office equipment for less on eBay! > http://adfarm.mediaplex.com/ad/ck/711-11697-6916-5 > _______________________________________________ > Gusdev-gusdev mailing list > Gus...@li... > https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev |