From: Elisabetta M. <man...@pc...> - 2005-04-19 18:34:34
Hi Sucheta,

the tables named RAD3.Analysis*** can be used to store downstream analyses (and with GUS 3.5 they will also be used to store upstream, i.e. preprocessing, analyses, eliminating the Process*** tables). The RAD3.LogicalGroup*** tables can be used to store the input to an analysis (e.g. a group of quantification_ids); this input is linked to the analysis itself via RAD3.AnalysisInput. The results are stored in appropriate views of RAD3.AnalysisResultImp.

By downstream analyses here we mean analyses like differential_expression, clustering, etc. Currently we have implemented views of AnalysisResultImp aimed mostly at storing various kinds of differential_expression analyses, e.g. RAD3.SAM, RAD3.ArrayStatTwoConditions (we also have another view, DataTransformationResult, which is devoted to the upcoming storage of pre-processed data). But the Analysis tables are flexible enough to accommodate other kinds of analyses. In our instance we have not stored clustering results so far, but with appropriate views of AnalysisResultImp this should be doable.

Let me first give you an example of how the results of SAM (differential expression) are stored in RAD. Say that you applied SAM to find differentially expressed genes between two conditions, C1 and C2. Say you have 4 assays per condition and, for simplicity, that these were Affymetrix MAS 5.0 results.

INPUT: You have some flexibility here. One option is to create two logical groups, one representing the 4 quantification_ids for C1 (LogicalGroupLink will identify each member of this group) and one representing those for C2. (An alternative is to create just one logical group representing all 8 quantification_ids for these 8 Affy result sets and then to use LogicalGroupLink.order_num to distinguish C1 and C2.)

RESULTS: You would then create entries in AnalysisInput to link all logical groups input into the analysis to the analysis_id for this SAM run.
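The input side of this flow (two logical groups linked to one SAM run) could be sketched roughly as follows. This is only an illustration on a toy in-memory SQLite database, not the actual RAD3 DDL: the table names follow the ones mentioned above, but the column layout (e.g. a plain quantification_id column in LogicalGroupLink), the helper names build_sam_input and inputs_for_analysis, and the quantification_ids in the usage note are all assumptions made for the example.

```python
import sqlite3


def build_sam_input(c1_quant_ids, c2_quant_ids):
    """Create a toy schema and register one SAM run with two input groups.

    Simplified stand-in tables; the real RAD3 tables have more columns.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE LogicalGroup (
            logical_group_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE LogicalGroupLink (
            logical_group_id INTEGER,
            quantification_id INTEGER,   -- assumed column, for illustration
            order_num INTEGER);
        CREATE TABLE Analysis (
            analysis_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE AnalysisInput (
            analysis_id INTEGER, logical_group_id INTEGER);
    """)
    # One Analysis row stands for this SAM run.
    analysis_id = db.execute(
        "INSERT INTO Analysis (name) VALUES ('SAM')").lastrowid
    for name, quant_ids in (("C1", c1_quant_ids), ("C2", c2_quant_ids)):
        # One logical group per condition.
        group_id = db.execute(
            "INSERT INTO LogicalGroup (name) VALUES (?)", (name,)).lastrowid
        # One link row per member (quantification) of the group.
        db.executemany(
            "INSERT INTO LogicalGroupLink VALUES (?, ?, ?)",
            [(group_id, q, i) for i, q in enumerate(quant_ids, start=1)])
        # Tie the group to the analysis as one of its inputs.
        db.execute("INSERT INTO AnalysisInput VALUES (?, ?)",
                   (analysis_id, group_id))
    db.commit()
    return db, analysis_id


def inputs_for_analysis(db, analysis_id):
    """Return {group name: [quantification_ids]} for one analysis."""
    rows = db.execute("""
        SELECT g.name, l.quantification_id
        FROM AnalysisInput ai
        JOIN LogicalGroup g ON g.logical_group_id = ai.logical_group_id
        JOIN LogicalGroupLink l ON l.logical_group_id = g.logical_group_id
        WHERE ai.analysis_id = ?
        ORDER BY g.name, l.order_num""", (analysis_id,)).fetchall()
    grouped = {}
    for name, quant_id in rows:
        grouped.setdefault(name, []).append(quant_id)
    return grouped
```

For instance, calling build_sam_input([101, 102, 103, 104], [201, 202, 203, 204]) with these made-up ids and then inputs_for_analysis on the returned analysis_id gives back the two groups of four, one per condition.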
AnalysisParam and AnalysisQCParam can be used to store any analysis parameter (or QC parameter) setting. Finally, in the RAD3.SAM view of AnalysisResultImp you would store the results as illustrated in
http://www.gusdb.org/cgi-bin/schemaBrowser?db=CBILBLD&table=RAD3::SAM&path=RAD3::SAM

This is to illustrate the flexibility of these tables. For clustering you could devise a similar flow, although the details will have to be worked out. E.g. for k-means clustering, a logical group could be used to store all quantification_ids (or analysis_ids, if the input is processed data) input into the clustering. Then an appropriate view of AnalysisResultImp would use (table_id, row_id) to denote the spot on the array to which the result refers, and it could have a field, say cluster_number, to indicate which cluster that spot belongs to. Or something in this spirit. For hierarchical clustering a slightly more complex view would have to be designed.

Hope this helps.
Elisabetta

P.S. I'm not too clear on the second part of your question: we store normalized data (formerly in the Process*** tables and now in the Analysis tables). Typically for each assay you would have as many results as spots on the chip, so I'm not clear on the 1.3 million per chip. If a chip has, say, 20,000 spots and you are using the analysis tables (DataTransformationResult), for this chip you would have 20,000 entries in this view plus a few sparse entries to fill in LogicalGroup, AnalysisInput, etc. So far we have had no problems storing our normalized data (even when we used the Process*** tables), but I guess it all depends on how many studies you are storing in your instance of RAD.

> We are currently planning to conduct more than one statistical analysis to a
> particular dataset. However, I could not find a place to store them.
> Similarly, if we do multiple types of clustering where the data can be
> stored.
>
> We have another concern also about the data size and the query performance.
> Our per chip one normalization method adds 1.3 million records, so if we have
> several methods, then we will end up having several gigs of record.
>
> Do you have any issues with this.
>
> Many thanks
>
> Sucheta