From: Dave H. <doc...@gm...> - 2007-09-14 19:51:14
My bad... I just noticed the LoadMageDoc plugin is in the community plugin directory.

Thanks Elisabetta for your prompt reply.

- Dave

Elisabetta Manduchi wrote:
>
> Hi Dave,
> I'll respond to 2 and 4. For (1) I defer to Junmin.
> For (3) all I can say is that it is in our lab's plans to release
> bug-fixes and new releases of GUS, however this keeps being postponed
> due to other priorities. In the meantime for postgresql questions re
> GUS, John Iodice might be able to help you.
> Getting back to your question (2), first of all, as mentioned in my
> previous email we currently have a view for RMA results, but we do not
> have a view for Plier results. If you need a view for Plier in your
> instance of the DB though, you can simply create such a view with the
> attributes you need in your own instance. It would be a view of
> RAD.CompositeElementResultImp. Once created, remember to update
> Core.TableInfo and rebuild GUS, so that the objects for the new view
> are in place.
> The currently available plugins to load data into
> RAD.CompositeElementResultImp views are: LoadArrayResult (in
> Supported), which loads the results of one assay at a time, and
> LoadBatchResult, which we have already discussed. The documentation of
> these plugins, available from svn, illustrates what the input format
> should be. The idea guiding the design of the plugins we made
> available was that they would be *generic*, i.e. they would be able to
> take data from a wide variety of quantification software and load them
> into RAD. So we opted for one generic code at the expense of some work
> to put the input into the appropriate format.
> If a project/lab typically gets files in a particular data format,
> then it might be worthwhile for them to write a plugin which is specific
> to that rather than using the generic plugin. This way they can use the
> output as spit out by the software they use. It is fairly simple to
> write a plugin specific to one's needs using the Plugin package.
> So if you expect to deal most of the time with a particular type of
> output (e.g. from APT) you might consider writing a specific plugin.
>
> Regarding your question (4), the answer is no. We do not store images
> in GUS. For certain types of images, like microarray images (e.g.
> files resulting from scanning, like .TIF or .DAT) we store in the db
> their uri to the fileserver (in RAD.Acquisition.uri).
> Hope this helps,
> Elisabetta
>
> ---
>
> On Fri, 14 Sep 2007, Dave Hau wrote:
>
>> Junmin and Elisabetta, thanks again for your helpful comments.
>>
>> Couple of questions.
>>
>> 1. The HG-U133_Plus_2 array annotation file I downloaded from
>> Affymetrix is an xml file in MAGE-ML format. On the RAD download
>> page ( http://www.cbil.upenn.edu/downloads/RAD/ ), I see a tool
>> called mage2tab-v0.9, which I assume would be able to convert the
>> annotation file to MAGE-TAB format. Then in order to load this
>> MAGE-TAB file into GUS, I noticed on the CBIL Lab Meetings web page,
>> for Thursday March 15, 2007, Junmin gave a talk on MR-Ti, and the
>> description mentions the loadMageDoc GUS plugin. I notice (and have
>> downloaded) a file on the RAD download page called
>> "MR_T_ForGUS35.tar.gz" but the loadMageDoc plugin is not in there.
>> Is there a way for me to obtain this plugin?
>>
>> 2. I ran "apt-probeset-summarize" in the Affymetrix Power Tools
>> (APT) package (
>> http://www.affymetrix.com/support/developer/powertools/index.affx )
>> and obtained probe set data for my .CEL files, one set for RMA and
>> another set for PLIER. Is there a plugin that will readily load
>> these APT output files into GUS as probe set data?
>>
>> 3. The GUS installation I'm using is top of trunk from the CBIL svn
>> repository. This is because I'm using postgresql on the back end,
>> and the 3.5 GUS package gave me a lot of problems. These seem to
>> have been fixed in the top of trunk.
>> However, in order to use existing plugins, would it be advisable to
>> use top of trunk (including the new schema changes for new features
>> that Elisabetta mentioned)? If not, is there, or do you plan on
>> releasing, a bug-fix version of 3.5 that contains bug fixes
>> back-ported to 3.5, but does not contain any of the new features
>> not yet released?
>>
>> 4. Is there any way in RAD or GUS to load pathological images (e.g.
>> associated with biosamples used for hybridization) into the GUS
>> database?
>>
>> Thanks very much,
>> Dave
>>
>>
>>
>> Junmin Liu wrote:
>>> Hi, Dave,
>>> Again in line:
>>>
>>>>> The consensus not to load CEL files into the database - is it
>>>>> because we only query for probe set data based on the gene, but
>>>>> not for probe cell data? If I
>>>>
>>>> yes typically people query the summarized results at the probe set
>>>> level.
>>>
>>> Generally speaking, schema design and data management have to be
>>> considered in the context of any contract or requirements you are
>>> obligated to meet.
>>>
>>> Ask yourself: what is the next step if you load CEL files? Or what
>>> is the next step if you load array data, etc.?
>>>
>>> GUS and its app stacks certainly will allow you to do those things,
>>> but it is critical that you make some judgement calls. And the cost
>>> of loading raw data and then querying it back out is pretty high.
>>>
>>>> There are multiple choices for where to store array annotation at the
>>>> moment.
>>>> 1. RAD.CompositeElementDbRef and RAD.CompositeElementNASequence
>>>> have been added to more quickly annotate Affy data with Entrez
>>>> Genes and RefSeq info respectively.
>>>> 2. Another possibility is to use the external_database_release_id and
>>>> source_id pair in RAD.ShortOligoFamily to point to one preferred
>>>> annotation for each probe set (but you would have to choose one).
>>>> 3. Another, less structured possibility, is to use
>>>> RAD.CompositeElementAnnotation, where you use the attribute 'name' to
>>>> denote the annotation (e.g. "Entrez Gene", "RefSeq", etc.) and the
>>>> attribute 'value' for the annotation (e.g. entrez gene id, or
>>>> refseq id, etc.) itself. This is less structured but it will allow
>>>> you to load as many annotations as you like.
>>>
>>> I normally favor a consistent data management policy. That means
>>> you don't need documentation somewhere saying "case 1, load data
>>> into table a, b, c; case 2, load data into table d, e, f; case 3,
>>> load data into table g, h, i", which not only makes your data
>>> loading tough, but also makes the app code you build on top of the
>>> db stink.
>>>
>>> We didn't manage our own db perfectly either. But hopefully our
>>> experiences could prove useful to you.
>>>
>>> I strongly suggest you look at the MAGE-Tab spec for raw/processed
>>> data and the ADF spec for array data on the ArrayExpress site, since
>>> MAGE-Tab and ADF have proved to be very effective for large dbs like
>>> AE. If you can align your app/db with the standards, as we are also
>>> trying to do, it will certainly give you a safe edge.
>>>
>>> ---junmin
>>>
>>
>
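
For reference, the reformatting step Elisabetta describes above (putting APT output into whatever layout a generic loader expects) could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the apt-probeset-summarize summary file is tab-delimited, with header comment lines starting with "#", a first column named probeset_id, and one column per CEL file. The function name apt_summary_to_rows and the long (probe set, CEL file, value) output layout are hypothetical; they are not the actual input format of LoadArrayResult or LoadBatchResult, which is described in the plugin documentation in svn.

```python
import csv
import io


def apt_summary_to_rows(lines):
    """Yield (probeset_id, cel_name, value) tuples from an APT-style summary.

    Assumed layout (not verified against any particular APT version):
    comment lines start with '#', the table is tab-delimited, the first
    column is the probe set id and each remaining column is one CEL file.
    """
    # Drop blank lines and header comments before handing lines to csv.
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    cel_names = header[1:]
    for record in reader:
        probeset_id = record[0]
        for cel_name, value in zip(cel_names, record[1:]):
            yield probeset_id, cel_name, float(value)


# Tiny made-up sample, only to show the reshaping.
sample = (
    "#%chip_type=HG-U133_Plus_2\n"
    "probeset_id\ta.CEL\tb.CEL\n"
    "1007_s_at\t10.5\t9.75\n"
    "1053_at\t6.25\t6.5\n"
)
rows = list(apt_summary_to_rows(io.StringIO(sample)))
for row in rows:
    print("\t".join(map(str, row)))
```

The same function should work on a real file opened with open(path, newline=""). A plugin written specifically for APT output, as suggested above, would make this intermediate step unnecessary.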