Re: [GUSDEV] Affymetrix .CEL files


Junmin and Elisabetta, thanks again for your helpful comments.

A couple of questions.

1.  The HG-U133_Plus_2 array annotation file I downloaded from 
Affymetrix is an xml file in MAGE-ML format.  On the RAD download page ( 
http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool called 
mage2tab-v0.9, which I assume would be able to convert the annotation 
file to MAGE-TAB format.  To load the resulting MAGE-TAB file into GUS, 
I looked at the CBIL Lab Meetings web page: on Thursday March 15, 2007, 
Junmin gave a talk on MR-Ti, and its description mentions the 
loadMageDoc GUS plugin.  I found (and downloaded) a file on the RAD 
download page called "MR_T_ForGUS35.tar.gz", but the loadMageDoc plugin 
is not in it.  Is there a way for me to obtain this plugin?

2.  I ran "apt-probeset-summarize" in the Affymetrix Power Tools (APT) 
package ( 
http://www.affymetrix.com/support/developer/powertools/index.affx ) and 
obtained probe set data for my .CEL files, one set for RMA and another 
set for PLIER.  Is there a plugin that will readily load these APT 
output files into GUS as probe set data?
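
To make the question concrete, here is a minimal sketch of what such a 
loader would have to read.  It assumes the summary files are 
tab-delimited, with metadata lines starting with "#", a header row 
whose first column is "probeset_id", and one column per .CEL file; 
please check that against your own output.  The function name 
read_apt_summary and the sample values are made up for illustration, 
and nothing here talks to GUS.

```python
import csv
import io


def read_apt_summary(handle):
    """Yield (probeset_id, cel_name, value) from an APT-style summary table."""
    # Assumption: metadata lines start with '#'; the first remaining
    # non-blank line is the header row.
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to yield
    cel_names = header[1:]  # first column holds the probe set id
    for row in reader:
        probeset_id = row[0]
        for cel_name, value in zip(cel_names, row[1:]):
            yield probeset_id, cel_name, float(value)


# Invented sample input, only to show the assumed layout.
example = (
    "#%chip_type=HG-U133_Plus_2\n"
    "probeset_id\tsample1.CEL\tsample2.CEL\n"
    "1007_s_at\t10.25\t9.75\n"
)
rows = list(read_apt_summary(io.StringIO(example)))
```

Each tuple is one probe set value for one hybridization, which is the 
granularity I would expect a probe-set-level loader to need.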

3.  The GUS installation I'm using is top of trunk from the CBIL svn 
repository.  This is because I'm using postgresql on the back end, and 
the 3.5 GUS package gave me a lot of problems.  These seem to have been 
fixed in the top of trunk.  However, if I want to use existing plugins, 
is it advisable to stay on top of trunk (including the schema changes 
for the new features that Elisabetta mentioned)?  If not, is there, or 
do you plan to release, a bug-fix version of 3.5 that back-ports these 
fixes without including any of the not-yet-released features?

4.  Is there any way in RAD or GUS to load pathology images (e.g. 
images associated with the biosamples used for hybridization) into the 
GUS database?

Thanks very much,
Dave

Junmin Liu wrote:
> Hi, Dave,
> Again in line:
>
>>> The consensus not to load CEL files into the database - is it 
>>> because we only
>>> query for probe set data based on the gene, but not for probe cell 
>>> data? If I
>>
>> yes typically people query the summarized results at the probe set
>> level.
>
> Generally speaking, schema design and data management have to be 
> considered in the context of your contract or any other requirements 
> you are obligated to meet.
>
> Ask yourself: what is the next step once you have loaded the CEL 
> files? What is the next step once you have loaded array data, and so on?
>
> GUS and its application stack will certainly allow you to do those 
> things, but it is critical that you make some judgement calls. The 
> cost of loading raw data and then querying it back out is pretty high.
>
>> There are multiple choices for where to store array annotation at the
>> moment.
>> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have 
>> been
>> added to more quickly annotate Affy data with Entrez Genes and RefSeq 
>> info
>> respectively.
>> 2. Another possibility is to use the external_database_release_id and
>> source_id pair in RAD.ShortOligoFamily to point to one preferred
>> annotation for each probe set (but you would have to choose one).
>> 3. Another, less structured possibility, is to use
>> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
>> denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
>> attribute 'value' for the annotation (e.g. entrez gene id, or refseq id,
>> etc.) itself. This is less structured but it will allow you to load as
>> many annotations as you like.
>
> I normally favor a consistent data management policy. That means 
> you don't need documentation somewhere saying "case 1, load data into 
> table a, b, c; case 2, load data into table d, e, f; case 3, load data 
> into table g, h, i", which not only makes your data loading tough but 
> also makes the application code built on top of the db stink.
>
> We didn't manage our own db perfectly either. But hopefully our 
> experience will prove useful to you.
>
> I strongly suggest you look at the MAGE-TAB spec for raw/processed 
> data and the ADF spec for array data on the ArrayExpress site, since 
> MAGE-TAB and ADF have proved very effective for a large db like AE. If 
> you can align your app/db with the standards, as we are also trying 
> to do, it will certainly give you an edge.
>
> ---junmin
>