From: Junmin L. <ju...@pc...> - 2007-09-13 14:09:23
Hi Dave,

I had a couple of discussions about raw data loading with other people at ArrayExpress and with Joe White from Harvard at previous MGED workshops. The consensus is that people don't load raw data into the database, especially CEL files, unless you have convincing use cases or a strong need to do so. So give it a second thought before you proceed.

---junmin

On Wed, 12 Sep 2007, Dave Hau wrote:

> Elisabetta,
>
> Thanks for your and John Brestelli's (via personal email) very
> informative replies. They are very helpful indeed.
>
> Regarding loading .CEL files (probe cell data, not probe set data), John
> mentioned the plugin GUS::Community::Plugin::LoadBatchArrayResults which
> I had noticed too. The help page for this plugin mentions a number of
> quantification protocols supported including mas4/mas5 (Affymetrix MAS
> 4.0 and 5.0 Probe Set quantification protocol) and cel4/cel5 (Affymetrix
> MAS 4.0 and 5.0 Probe Cell quantification protocol). It seems that
> cel4/cel5 would correspond to the .CEL files I need to load (i.e. probe
> *cell* data). Is this correct? I was wondering because you mentioned in
> your reply that there's no plugin available for loading probe cell data.
>
> Also, in the Affymetrix file format description document (
> http://www.affymetrix.com/support/developer/AffxFileFormats.ZIP ), two
> file formats are described: Version 3 files (text data) generated by the
> MAS software, and Version 4 files (binary data) generated by the GCOS
> software. So both cel4 and cel5 for the plugin would correspond to
> Version 3 files, right? That means the LoadBatchArrayResults plugin does
> not support the Version 4 (binary) file format, correct?
>
> Thanks again for your help.
>
> Best regards,
> Dave Hau
>
>
> Elisabetta Manduchi wrote:
>>
>> Hi Dave,
>> let me clarify GUS vs Affy.
>> Affymetrix quantified results are of two types, corresponding to 2
>> different levels of analysis:
>>
>> (i) probe-cell level results (e.g.
>> from .CEL files), which contain
>> intensity values for each individual probe cell on the chip; and
>> (ii) probe-set level results (e.g. obtained from MAS4 or MAS5 and in
>> the .CHP files, or from RMA or gcRMA) which contain *summarized*
>> intensities for probe sets on the chip.
>>
>> The GUS schema in principle supports storage of both:
>>
>> (i) the probe cell results would go into a view of
>> RAD.ElementResultImp (in fact there is a view to this end called
>> RAD.AffymetrixCEL);
>> (ii) the probe set results would go to a view of
>> RAD.CompositeElementResultImp. For the latter, currently we have views
>> to accommodate MAS4 or 5 (RAD.AffymetrixMAS4 or RAD.AffymetrixMAS5) and
>> RMA/gcRMA results (RAD.RMAExpress, which will actually be renamed
>> RAD.RMA in the next GUS release).
>>
>> Now, here at CBIL, we do not store or support loading of the .CEL file
>> data in the database, because we really only use the probe-set level
>> results in our applications, so we have no need to store .CEL in the db.
>> So the way we do it is as follows:
>> * for every Affymetrix assay, we have TWO related quantifications, one
>> corresponding to the .CEL quantification and the other corresponding
>> to whatever summarization quantification was created (e.g. with MAS4,
>> MAS5, RMA);
>> * we place 2 entries in RAD.Quantification, one pointing to the uri
>> of the .CEL file (which we keep on our server) and one pointing to the
>> uri of the probe-set level result file
>> * however, we do not store the data from the .CEL file in
>> RAD.AffymetrixCEL
>> * we only store the data from the probe-set level results in one of
>> the RAD.CompositeElementResultImp views mentioned above.
>>
>> The current plugin in GUS::Supported, as Junmin mentioned in the
>> posting you are referring to, can be used to populate the data for the
>> probe-set level results. As far as I know, we do not currently have a
>> plugin to store the .CEL files in the db.
>> So the db allows for the latter, but you'd have to write your own
>> plugin. We didn't find it useful to store .CEL results in GUS, but again
>> this depends on the type of applications you might be interested in.
>> Hope this helps,
>> Elisabetta
>>
>>
>> On Tue, 28 Aug 2007, Dave Hau wrote:
>>
>>> I would like to import a number of Affymetrix .CEL files into the GUS
>>> database, which was installed from top of trunk from the GUS svn
>>> repository. The CEL files each have some text headers, and then binary
>>> data afterwards. So I suppose they are in CEL Version 4 format.
>>>
>>> Doing some search on previous posts, I came across this one:
>>>
>>> http://sourceforge.net/mailarchive/message.php?msg_id=Pine.LNX.4.61.0512141526200.18143%40hera.pcbi.upenn.edu
>>>
>>>
>>> It seems that at the time of the post (12/2005), the way these .CEL
>>> files would be imported was that the headers would go to one of the
>>> Affymetrix views (AffymetrixMAS4 or AffymetrixMAS5 or AffymetrixCEL),
>>> the actual file would sit in the file system, and we'd insert a row to
>>> the RAD.Quantification table with a URI pointing to the location of the
>>> .CEL file.
>>>
>>> Also, looking through the different plugins in both the Supported and
>>> Community folders, it seems LoadBatchArrayResults supports the cel4
>>> format. Is this the plugin I should use?
>>>
>>> Any help would be much appreciated. Thanks.
>>>
>>> Best regards,
>>> Dave Hau
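
On the Version 3 (text) versus Version 4 (binary) question raised in the thread, one way to check which format a given .CEL file is in is to look at its first few bytes. The sketch below is only an illustration and is not part of GUS or of any GUS plugin. It assumes, based on my reading of the Affymetrix file format description, that Version 3 files open with a "[CEL]" section header and that Version 4 files open with two little-endian 32-bit integers, 64 (magic number) and 4 (version). The function names sniff_cel_format and sniff_cel_file are hypothetical.

```python
import struct

# Assumed values from the Affymetrix CEL format description (not from GUS):
CEL_V4_MAGIC = 64    # first 32-bit integer of a Version 4 (binary) CEL file
CEL_V4_VERSION = 4   # second 32-bit integer of a Version 4 CEL file


def sniff_cel_format(header: bytes) -> str:
    """Classify the leading bytes of a CEL file.

    Returns 'v3-text', 'v4-binary', or 'unknown'.
    """
    # Version 3 files are plain text and start with a "[CEL]" section header.
    if header.lstrip().startswith(b"[CEL]"):
        return "v3-text"
    # Version 4 files start with two little-endian 32-bit integers: 64, then 4.
    if len(header) >= 8:
        magic, version = struct.unpack("<ii", header[:8])
        if magic == CEL_V4_MAGIC and version == CEL_V4_VERSION:
            return "v4-binary"
    return "unknown"


def sniff_cel_file(path: str) -> str:
    """Read the first bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_cel_format(handle.read(16))
```

Calling sniff_cel_file on each file before choosing a loader would show whether the files are really Version 4. A result of 'unknown' means only that neither signature matched, so such a file would need to be inspected by hand.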