Re: [GUSDEV] Affymetrix .CEL files

Hi Dave,
I'll respond to (2) and (4). For (1) I defer to Junmin.
For (3), all I can say is that our lab plans to release bug fixes and new 
releases of GUS, but this keeps being postponed due to other priorities. 
In the meantime, for PostgreSQL questions about GUS, John Iodice might be 
able to help you.
Getting back to your question (2): first of all, as mentioned in my 
previous email, we currently have a view for RMA results, but we do not 
have a view for PLIER results. If you need a view for PLIER, you can 
simply create one in your own instance of the DB, with the attributes you 
need. It would be a view of RAD.CompositeElementResultImp. Once it is 
created, remember to update Core.TableInfo and rebuild GUS, so that the 
objects for the new view are in place.
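
To make the idea concrete, here is a rough sketch (in Python, only to 
generate the SQL text) of what such a view definition might look like. 
Everything specific in it is an assumption for illustration: the view name 
PlierExpress, the generic column float1, the alias plier_signal, and the 
subclass_view filter. Please check how the existing RMA view is defined in 
your instance and copy its shared columns before using anything like this.

```python
def build_view_sql(view_name, column_map, schema="RAD",
                   imp_table="CompositeElementResultImp"):
    """Return a CREATE VIEW statement over an Imp table.

    column_map maps a generic Imp column (hypothetical, e.g. "float1")
    to the attribute name the view should expose.
    """
    if not column_map:
        raise ValueError("column_map must not be empty")
    select_list = ",\n".join(
        f"    {generic} AS {alias}" for generic, alias in column_map.items()
    )
    # Assumption: rows belonging to a view are tagged in a subclass_view
    # column, as in the existing views of this table.
    return (
        f"CREATE VIEW {schema}.{view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {schema}.{imp_table}\n"
        f"WHERE subclass_view = '{view_name}';"
    )


# Hypothetical names throughout; adjust to your own schema.
print(build_view_sql("PlierExpress", {"float1": "plier_signal"}))
```

The generated statement is only the first step; the Core.TableInfo update 
and the GUS rebuild still have to be done separately.
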
The plugins currently available to load data into 
RAD.CompositeElementResultImp views are LoadArrayResult (in Supported), 
which loads the results of one assay at a time, and LoadBatchResult, which 
we have already discussed. The documentation of these plugins, available 
from svn, illustrates what the input format should be. The idea guiding 
the design of these plugins was that they would be *generic*, i.e. able to 
take data from a wide variety of quantification software and load them 
into RAD. So we opted for a single generic code base, at the expense of 
some work to put the input into the appropriate format.
If a project/lab typically gets files in a particular data format, then it 
might be worth their while to write a plugin specific to that format 
rather than using the generic plugin. This way they can use the output 
exactly as produced by the software they use. It is fairly simple to write 
a plugin specific to one's needs using the Plugin package. So if you 
expect to deal most of the time with a particular type of output (e.g. 
from APT), you might consider writing a specific plugin.
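
As an illustration of the "put the input into the appropriate format" 
route, below is a minimal sketch in Python (not a GUS plugin) that splits 
an APT summary file into one list of values per assay. It assumes the 
summary file is tab-delimited, with metadata lines starting with "#", a 
header row whose first column is probeset_id, and one column per .CEL 
file. The function name is made up, and the result would still need to be 
written out in whatever format the plugin documentation specifies.

```python
import csv


def split_summary_by_assay(lines):
    """Map each assay (CEL column) to a list of (probeset_id, value) pairs."""
    # Skip blank lines and "#"-prefixed metadata lines (assumed APT layout).
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    assays = header[1:]  # first column is assumed to be probeset_id
    per_assay = {name: [] for name in assays}
    for row in reader:
        probeset_id = row[0]
        for name, value in zip(assays, row[1:]):
            per_assay[name].append((probeset_id, float(value)))
    return per_assay


# Tiny made-up example; real input would come from open("rma.summary.txt").
example = [
    "#%chip_type=HG-U133_Plus_2\n",
    "probeset_id\ta.CEL\tb.CEL\n",
    "1007_s_at\t8.5\t9.25\n",
    "1053_at\t6.0\t5.75\n",
]
for assay, values in split_summary_by_assay(example).items():
    print(assay, values)
```

Each per-assay list could then be fed to a one-assay-at-a-time loader, or 
the same parsing logic could be moved into a dedicated plugin.
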

Regarding your question (4), the answer is no. We do not store images in 
GUS. For certain types of images, such as microarray images (e.g. files 
resulting from scanning, like .TIF or .DAT), we store in the db the URI of 
the file on the file server (in RAD.Acquisition.uri).
Hope this helps,
Elisabetta

---

On Fri, 14 Sep 2007, Dave Hau wrote:

> Junmin and Elisabetta, thanks again for your helpful comments.
>
> Couple of questions.
>
> 1.  The HG-U133_Plus_2 array annotation file I downloaded from Affymetrix is 
> an xml file in MAGE-ML format.  On the RAD download page ( 
> http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool called 
> mage2tab-v0.9, which I assume would be able to convert the annotation file to 
> MAGE-TAB format.  Then in order to load this MAGE-TAB file into GUS, I 
> noticed on the CBIL Lab Meetings web page, for Thursday March 15, 2007, 
> Junmin gave a talk on MR-Ti, and the description mentions the loadMageDoc GUS 
> plugin.  I notice (and have downloaded) a file on the RAD download page 
> called "MR_T_ForGUS35.tar.gz" but the loadMageDoc plugin is not in there.  Is 
> there a way for me to obtain this plugin?
>
> 2.  I ran "apt-probeset-summarize" in the Affymetrix Power Tools (APT) 
> package ( http://www.affymetrix.com/support/developer/powertools/index.affx ) 
> and obtained probe set data for my .CEL files, one set for RMA and another 
> set for PLIER.  Is there a plugin that will readily load these APT output 
> files into GUS as probe set data?
>
> 3.  The GUS installation I'm using is top of trunk from the CBIL svn 
> repository.  This is because I'm using postgresql on the back end, and the 
> 3.5 GUS package gave me a lot of problems.  These seem to have been fixed in 
> the top of trunk.  However, in order to use existing plugins, would it be 
> advisable to use top of trunk (including the new schema changes for new 
> features  that Elisabetta mentioned)?  If not, is there, or do you plan on 
> releasing a bug-fix version of 3.5 that contains bug fixes back-ported to 
> 3.5, but does not contain any of the new features not yet released?
>
> 4.  Is there any way in RAD or GUS to load pathological images (e.g. 
> associated with biosamples used for hybridization) into the GUS database?
>
> Thanks very much,
> Dave
>
>
>
> Junmin Liu wrote:
>> Hi, Dave,
>> Again in line:
>> 
>>>> The consensus not to load CEL files into the database - is it because
>>>> we only query for probe set data based on the gene, but not for probe
>>>> cell data? If I
>>> 
>>> yes typically people query the summarized results at the probe set
>>> level.
>> 
>> Generally speaking, schema design and data management have to be in the 
>> context of contract or any requirements you are obligated to.
>> 
>> Ask the question: what is the next step if you load CEL? Or what is the 
>> next step if you load array data, etc.?
>> 
>> GUS and its app stacks certainly will allow you to do those things, but 
>> it is critical that you make some judgement calls. And the cost of 
>> loading raw data and then querying them out is pretty high.
>> 
>>> There are multiple choices for where to store array annotation at the
>>> moment.
>>> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have been
>>> added to more quickly annotate Affy data with Entrez Genes and RefSeq info
>>> respectively.
>>> 2. Another possibility is to use the external_database_release_id and
>>> source_id pair in RAD.ShortOligoFamily to point to one preferred
>>> annotation for each probe set (but you would have to choose one).
>>> 3. Another, less structured possibility, is to use
>>> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
>>> denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
>>> attribute 'value' for the annotation (e.g. entrez gene id, or refseq id,
>>> etc.) itself. This is less structured but it will allow you to load as
>>> many annotations as you like.
>> 
>> I normally favor a consistent data management policy, meaning you don't 
>> need documentation somewhere saying "case 1, load data into table a, b, 
>> c; case 2, load data into table d, e, f; case 3, load data into table g, 
>> h, i", which not only makes your data loading tough, but also will make 
>> your app code built on top of the db stink.
>> 
>> We didn't manage our own db perfectly either. But hopefully our 
>> experiences could prove useful to you.
>> 
>> I strongly suggest you look at the MAGE-TAB spec for raw/processed data 
>> and the ADF spec for array data on the ArrayExpress site, for MAGE-TAB 
>> and ADF have proved to be very effective for a large db like AE. If you 
>> can make your app/db align to the standards, as we are also trying to 
>> do, it will certainly give you a safe edge.
>> 
>> ---junmin
>> 
>