Re: [GUSDEV] Affymetrix .CEL files

My bad...  I just noticed the LoadMageDoc plugin is in the community 
plugin directory.

Thanks, Elisabetta, for your prompt reply.

- Dave

Elisabetta Manduchi wrote:
>
> Hi Dave,
> I'll respond to 2 and 4. For (1) I defer to Junmin.
> For (3), all I can say is that our lab plans to release bug fixes and 
> new releases of GUS; however, this keeps being postponed due to other 
> priorities. In the meantime, for PostgreSQL questions regarding GUS, 
> John Iodice might be able to help you.
> Getting back to your question (2): as mentioned in my previous email, 
> we currently have a view for RMA results, but we do not have a view for 
> Plier results. If you need a view for Plier in your instance of the DB, 
> you can simply create such a view, with the attributes you need, in 
> your own instance. It would be a view of RAD.CompositeElementResultImp. 
> Once it is created, remember to update Core.TableInfo and rebuild GUS 
> so that the objects for the new view are in place.
> The currently available plugins to load data into 
> RAD.CompositeElementResultImp views are LoadArrayResult (in 
> Supported), which loads the results of one assay at a time, and 
> LoadBatchResult, which we have already discussed. The documentation of 
> these plugins, available from svn, illustrates what the input format 
> should be. The idea guiding the design of these plugins was that they 
> would be *generic*, i.e. able to take data from a wide variety of 
> quantification software and load it into RAD. So we opted for one 
> generic piece of code at the expense of some work to put the input 
> into the appropriate format.
> If a project/lab typically gets files in a particular data format, 
> it might be worth their while to write a plugin specific to that format 
> rather than using the generic plugin. This way they can use the output 
> as spit out by the software they use. It is fairly simple to write a 
> plugin specific to one's needs using the Plugin package. So if you 
> expect to deal most of the time with a particular type of output (e.g. 
> from APT), you might consider writing a specific plugin.
>
> Regarding your question (4), the answer is no. We do not store images 
> in GUS. For certain types of images, such as microarray images (e.g. 
> files resulting from scanning, like .TIF or .DAT), we store in the DB 
> their URI on the file server (in RAD.Acquisition.uri).
> Hope this helps,
> Elisabetta
>
> ---
>
> On Fri, 14 Sep 2007, Dave Hau wrote:
>
>> Junmin and Elisabetta, thanks again for your helpful comments.
>>
>> A couple of questions.
>>
>> 1.  The HG-U133_Plus_2 array annotation file I downloaded from 
>> Affymetrix is an XML file in MAGE-ML format. On the RAD download 
>> page ( http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool 
>> called mage2tab-v0.9, which I assume would be able to convert the 
>> annotation file to MAGE-TAB format. To load this MAGE-TAB file into 
>> GUS, I noticed on the CBIL Lab Meetings web page that on Thursday, 
>> March 15, 2007, Junmin gave a talk on MR-Ti, and the description 
>> mentions the loadMageDoc GUS plugin. I noticed (and have downloaded) 
>> a file on the RAD download page called "MR_T_ForGUS35.tar.gz", but 
>> the loadMageDoc plugin is not in there. Is there a way for me to 
>> obtain this plugin?
>>
>> 2.  I ran "apt-probeset-summarize" in the Affymetrix Power Tools 
>> (APT) package ( 
>> http://www.affymetrix.com/support/developer/powertools/index.affx ) 
>> and obtained probe set data for my .CEL files, one set for RMA and 
>> another set for PLIER.  Is there a plugin that will readily load 
>> these APT output files into GUS as probe set data?
>>
>> 3.  The GUS installation I'm using is the top of trunk from the CBIL 
>> svn repository. This is because I'm using PostgreSQL on the back end, 
>> and the 3.5 GUS package gave me a lot of problems. These seem to 
>> have been fixed in the top of trunk. However, in order to use 
>> existing plugins, would it be advisable to use the top of trunk 
>> (including the new schema changes for new features that Elisabetta 
>> mentioned)? If not, is there (or do you plan to release) a bug-fix 
>> version of 3.5 that contains bug fixes back-ported to 3.5 but none 
>> of the new, not-yet-released features?
>>
>> 4.  Is there any way in RAD or GUS to load pathological images (e.g. 
>> associated with biosamples used for hybridization) into the GUS 
>> database?
>>
>> Thanks very much,
>> Dave
>>
>>
>>
>> Junmin Liu wrote:
>>> Hi Dave,
>>> Again, inline:
>>>
>>>>> The consensus not to load CEL files into the database - is it 
>>>>> because we only
>>>>> query for probe set data based on the gene, but not for probe cell 
>>>>> data? If I
>>>>
>>>> Yes, typically people query the summarized results at the probe set
>>>> level.
>>>
>>> Generally speaking, schema design and data management decisions have 
>>> to be made in the context of any contract or requirements you are 
>>> obligated to meet.
>>>
>>> Ask yourself: what comes next if you load CEL files? What comes next 
>>> if you load array data, etc.?
>>>
>>> GUS and its application stack will certainly allow you to do those 
>>> things, but it is critical that you make some judgement calls. Loading 
>>> raw data and then querying it back out is quite expensive.
>>>
>>>> There are multiple choices for where to store array annotation at
>>>> the moment.
>>>> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence have
>>>> been added to more quickly annotate Affy data with Entrez Gene and
>>>> RefSeq info, respectively.
>>>> 2. Another possibility is to use the external_database_release_id and
>>>> source_id pair in RAD.ShortOligoFamily to point to one preferred
>>>> annotation for each probe set (but you would have to choose one).
>>>> 3. Another, less structured, possibility is to use
>>>> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
>>>> denote the annotation type (e.g. "Entrez Gene", "RefSeq", etc.) and
>>>> the attribute 'value' for the annotation itself (e.g. the Entrez Gene
>>>> ID or RefSeq ID). This is less structured, but it will allow you to
>>>> load as many annotations as you like.
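Option 3 is a name/value (entity-attribute-value) layout. The minimal Python sketch below shows the idea without touching the database: any number of annotations per probe set flatten into (probe set, name, value) rows, and a query for one annotation type becomes a filter on `name`. The probe-set key in each tuple is a hypothetical stand-in for however RAD.CompositeElementAnnotation links back to the composite element, and the function names and IDs are placeholders made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (probe set, annotation name, annotation value); the probe-set key is a
# hypothetical stand-in for the real link to the composite element.
Row = Tuple[str, str, str]


def to_name_value_rows(annotations: Dict[str, Dict[str, List[str]]]) -> List[Row]:
    """Flatten {probe_set: {name: [values]}} into one row per value."""
    rows: List[Row] = []
    for probe_set, by_name in annotations.items():
        for name, values in by_name.items():
            for value in values:
                rows.append((probe_set, name, value))
    return rows


def lookup(rows: Iterable[Row], name: str) -> Dict[str, List[str]]:
    """Collect all values of one annotation type, grouped by probe set."""
    found: Dict[str, List[str]] = defaultdict(list)
    for probe_set, row_name, value in rows:
        if row_name == name:
            found[probe_set].append(value)
    return dict(found)


if __name__ == "__main__":
    # Placeholder IDs, not real annotations.
    annotations = {"ps1": {"Entrez Gene": ["gene-1"], "RefSeq": ["rs-1", "rs-2"]}}
    rows = to_name_value_rows(annotations)
    print(lookup(rows, "RefSeq"))  # {'ps1': ['rs-1', 'rs-2']}
```

The flexibility comes at the cost Junmin points out next: nothing in the schema constrains which names exist, so the conventions for `name` have to be documented and enforced outside the database.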
>>>
>>> I normally favor a consistent data management policy, meaning you 
>>> don't need documentation somewhere saying "case 1, load data into 
>>> tables a, b, c; case 2, load data into tables d, e, f; case 3, load 
>>> data into tables g, h, i". Rules like that not only make your data 
>>> loading tough, but also make the application code built on top of 
>>> the DB stink.
>>>
>>> We didn't manage our own DB perfectly either, but hopefully our 
>>> experience will prove useful to you.
>>>
>>> I strongly suggest you look at the MAGE-TAB spec for raw/processed 
>>> data and the ADF spec for array data on the ArrayExpress site, since 
>>> MAGE-TAB and ADF have proved very effective for a large DB like AE. 
>>> If you can align your app/DB with these standards, as we are also 
>>> trying to do, it will certainly give you a safe edge.
>>>
>>> ---junmin
>>>
>>
>