From: Junmin L. <ju...@pc...> - 2007-09-14 16:36:23
Hi, Dave,

Again, inline:

>> The consensus not to load CEL files into the database - is it because we only
>> query for probe set data based on the gene, but not for probe cell data? If I
>
> yes typically people query the summarized results at the probe set
> level.

Generally speaking, schema design and data management have to be considered in the context of your contract or any other requirements you are obligated to meet. Ask yourself: what is the next step if you load CEL files? What is the next step if you load array data, and so on? GUS and its app stack will certainly allow you to do those things, but it is critical that you make some judgment calls. And the cost of loading raw data and then querying it back out is pretty high.

> There are multiple choices for where to store array annotation at the
> moment.
> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have been
> added to more quickly annotate Affy data with Entrez Genes and RefSeq info
> respectively.
> 2. Another possibility is to use the external_database_release_id and
> source_id pair in RAD.ShortOligoFamily to point to one preferred
> annotation for each probe set (but you would have to choose one).
> 3. Another, less structured possibility, is to use
> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
> denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
> attribute 'value' for the annotation (e.g. entrez gene id, or refseq id,
> etc.) itself. This has less structured but it will allow you to load as
> many annotations as you like.

I normally favor a consistent data management policy. That means you don't need documentation somewhere saying "case 1, load data into tables a, b, c; case 2, load data into tables d, e, f; case 3, load data into tables g, h, i". That kind of case-by-case rule not only makes your data loading tough, it also makes the app code built on top of the db stink. We didn't manage our own db perfectly either, but hopefully our experience will prove useful to you.

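
To make the "one policy" idea concrete, here is a minimal sketch of a single loading path over a name/value annotation layout like the one described in option 3. It is only an illustration under stated assumptions: the table `probe_set_annotation`, the column `probe_set_id`, the helper functions, and the sample IDs are all hypothetical and are not the actual GUS/RAD schema or API; only the `name` and `value` attributes come from the discussion above. It uses an in-memory SQLite database so it can be run as-is.

```python
import sqlite3

# Hypothetical, simplified stand-in for a name/value annotation table.
# Only the 'name' and 'value' attributes come from the thread; the table
# name, the 'probe_set_id' column, and the sample IDs are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE probe_set_annotation ("
    " probe_set_id TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " value TEXT NOT NULL)"
)


def load_annotation(conn, probe_set_id, name, value):
    """Single loading path, used for every annotation source."""
    conn.execute(
        "INSERT INTO probe_set_annotation (probe_set_id, name, value)"
        " VALUES (?, ?, ?)",
        (probe_set_id, name, value),
    )


def annotations_for(conn, probe_set_id):
    """Return all (name, value) pairs for one probe set, sorted."""
    rows = conn.execute(
        "SELECT name, value FROM probe_set_annotation"
        " WHERE probe_set_id = ? ORDER BY name, value",
        (probe_set_id,),
    )
    return rows.fetchall()


# Placeholder identifiers, not real accessions.
load_annotation(conn, "probe_set_A", "Entrez Gene", "12345")
load_annotation(conn, "probe_set_A", "RefSeq", "NM_000000")
print(annotations_for(conn, "probe_set_A"))
```

The point of the sketch is that a new annotation source needs no new table and no new "case" in the loading documentation, only a new `name`. The trade-off, as noted in the quoted text, is that the data is less structured.
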
I strongly suggest you look at the MAGE-TAB spec for raw/processed data and the ADF spec for array data on the ArrayExpress site; MAGE-TAB and ADF have proved to be very effective for a large db like AE. If you can align your app/db with the standards, as we are also trying to do, it will certainly give you a safe edge.

---junmin