From: Dave H. <doc...@gm...> - 2007-09-13 23:34:19
Thanks Junmin and Elisabetta for your helpful comments.

Regarding the consensus not to load CEL files into the database: is it because we only query for probe set data based on the gene, but not for probe cell data?

If I store the CEL file in the filesystem and only store a file URI in the database, does RAD provide a way to run summarization algorithms (e.g. RMA, Plier) on those files? Can I load multiple sets of probe set data for a single set of probe cell data (e.g. one for RMA, one for Plier)?

Also, according to the instructions on the RAD website on how to load a complete microarray study into the GUS database, the first step mentions "Further array annotation can be loaded via GUS::Community::Plugin::InsertArray2DbRefAndNaSeq." I tried to run this plugin, but got this error:

    FATAL: Can't locate GUS/Model/RAD/CompositeElementDbRef.pm in @INC

Do you know where I can find this CompositeElementDbRef.pm file?

I would like to load the annotation file I obtained from the Affymetrix website for the HG-U133_Plus_2 array into the GUS database. What's the best way to go about this?

Thanks very much for your help.

Best regards,
Dave

Junmin Liu wrote:
> Hi, Dave,
> I had couple discussion with other people in ArrayExpress and Joe
> white from Harvard in terms of raw data loading in previous MGED
> workshops.
>
> The consensus is that especially for the CEL file, people don't load
> them into database, unless you got some convincing use cases or strong
> needs to load cel file into database.
>
> So give it a second thought before you even proceed.
> ---junmin
>
>
>
> On Wed, 12 Sep 2007, Dave Hau wrote:
>
>> Elisabetta,
>>
>> Thanks for your and John Brestelli's (via personal email) very
>> informative replies. They are very helpful indeed.
>>
>> Regarding loading .CEL files (probe cell data, not probe set data), John
>> mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which
>> I had noticed too.
>> The help page for this plugin mentions a number of
>> quantification protocols supported including mas4/mas5 (Affymetrix MAS
>> 4.0 and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix
>> MAS 4.0 and 5.0 Probe Cell quantification protocol). It seems that
>> cel4/cel5 would correspond to the .CEL files I need to load (i.e. probe
>> *cell* data). Is this correct? I was wondering because you mentioned in
>> your reply that there's no plugin available for loading probe cell data.
>>
>> Also, in the Affymetrix file format description document (
>> http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two
>> file formats are described: Version 3 files (text data) generated by the
>> MAS software, and version 4 files (binary data) generated by the GCOS
>> software. So both cel4 and cel5 for the plugin would correspond to
>> Version 3 files, right? That means the LoadBatchArrayResults plugin does
>> not support the Version 4 (binary) file format, correct?
>>
>> Thanks again for your help.
>>
>> Best regards,
>> Dave Hau
>>
>>
>> Elisabetta Manduchi wrote:
>>>
>>> Hi Dave,
>>> let me clarify GUS vs Affy.
>>> Affymetrix quantified results are of two types, corresponding to 2
>>> different level of analysis:
>>>
>>> (i) probe-cell level results (e.g. from .CEL files), which contain
>>> intensity values for each individual probe cell on the chip; and
>>> (ii) probe-set level results (e.g. obtained from MAS4 or MAS 5 and in
>>> the .CHP files, or from RMA or gcRMA) which contain *summarized*
>>> intensities for probe sets on the chip.
>>>
>>> The GUS schema in principle supports storage of both:
>>>
>>> (i) the probe cell results would go into a view of
>>> RAD.ElementResultImp (in fact there is a view to this end called
>>> RAD.AffymetrixCEL);
>>> (ii) the probe set results would go to view of
>>> RAD.CompositeElementResultImp.
>>> For the latter, currently we have views
>>> to accomodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and
>>> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed
>>> RAD.RMA in the next GUS release).
>>>
>>> Now, here at CBIL, we do not store or support loading of the .CEL file
>>> data in the database, because we really only use the probe-set level
>>> results in our applications, so we have no need to store .CEL in the
>>> db.
>>> So the way we do it is as follows:
>>> * for every Affymetrix assay, we have TWO related quantifications, one
>>> corresponding to the .CEL quantification and the other corresponding
>>> to whatever summarization quantification was created (e.g. with MAS4,
>>> MAS5, RMA);
>>> * we place 2 entries in RAD.Quantifications, one pointing to the uri
>>> of the .CEL file (which we keep on our server) and one pointing to the
>>> uri of the probe-set level result file
>>> * we however do not store the data from the .CEL file in
>>> RAD.AffymetrixCEL
>>> * we only store the data from the probe-set level results in one of
>>> the RAD.CompositeElementResultImp views mentioned above.
>>>
>>> The current plugin in GUS::Supported, as Junmin mentioned in the
>>> posting you are referring to, can be used to populate the data for the
>>> probe-set level results. As far as I know, we do not have currently a
>>> plugin to store the .CEL files in the db.
>>> So the db allows for the latter, but you'd have to write your own
>>> plugin. We didn't find useful to store .CEL results in GUS, but again
>>> this depends on the type of applications you might be interested in.
>>> Hope this helps,
>>> Elisabetta
>>>
>>>
>>> On Tue, 28 Aug 2007, Dave Hau wrote:
>>>
>>>> I would like to import a number of Affymetrix .CEL files into the GUS
>>>> database, which was installed from top of trunk from the GUS svn
>>>> repository. The CEL files each have some text headers, and then binary
>>>> data afterwards.
>>>> So I suppose they are in CEL Version 4 format.
>>>>
>>>> Doing some search on previous posts, I came across this one:
>>>>
>>>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu
>>>>
>>>> It seems that at the time of the post (12/2005), the way these .CEL
>>>> files would be imported was that the headers would go to one of the
>>>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL),
>>>> the actual file would sit in the file system, and we'd insert a row to
>>>> the RAD.Quantification table with a URI pointing to the location of
>>>> the .CEL file.
>>>>
>>>> Also, looking through the different plugins in both the Supported and
>>>> Community folders, it seems LoadBatchArrayResults supports the cel4
>>>> format. Is this the plugin I should use?
>>>>
>>>> Any help would be much appreciated. Thanks.
>>>>
>>>> Best regards,
>>>> Dave Hau
>>>
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>
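
On the Version 3 (text) vs. Version 4 (binary) question raised in the thread, the following is a minimal Python sketch for checking which flavour a given .CEL file is before deciding how to handle it. It is not part of GUS or RAD. The function names are hypothetical, and the signature values it tests for (a "[CEL]" section header for Version 3, and two little-endian 32-bit integers 64 and 4 at the start of Version 4 files) are assumptions that should be checked against the Affymetrix file format document linked in the thread.

```python
import struct


def cel_format_from_bytes(head: bytes) -> str:
    """Guess the CEL flavour from the first bytes of a file.

    Hypothetical helper; the signatures below are assumptions to verify
    against the Affymetrix file format description.
    """
    # Assumed: Version 3 files are plain text and open with a "[CEL]" section.
    if head.lstrip().startswith(b"[CEL]"):
        return "v3-text"
    # Assumed: Version 4 files open with two little-endian 32-bit integers,
    # a magic number (64) followed by the version number (4).
    if len(head) >= 8:
        magic, version = struct.unpack("<ii", head[:8])
        if magic == 64 and version == 4:
            return "v4-binary"
    return "unknown"


def cel_format(path: str) -> str:
    """Read the first few bytes of the file at `path` and classify it."""
    with open(path, "rb") as fh:
        return cel_format_from_bytes(fh.read(16))
```

Calling cel_format() on each file gives a quick sanity check on what a text-oriented loader would be handed. Files with a readable header followed by binary data, as described in the original post, would be expected to come back as "v4-binary" if the assumptions above hold.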