From: <ja...@op...> - 2001-11-27 05:07:49
Hey Hilmar!

Interesting to see that Novartis has taken an interest.

"Hilmar Lapp" <la...@gn...> writes:

> Quoting "Jason E. Stewart" <ja...@op...>:
>
> > There needs to be some significant
> > database changes to make it work properly, and this will be released
> > in the Genex-2 branch under development.
>
> May I ask how this will in general work, and what the required DB
> changes were about?

Mind you, this is for the upcoming Genex-2 version, not the code that
is already available.

Genex-2 was underway before the MAGE model was finalized. The primary
change between Genex-1 and Genex-2 was a far more useful security
model. In Genex-1 you can only protect ExperimentSets,
ArrayMeasurements, and AM_Spots. All the rest is world viewable.
Genex-2 enables you to protect *all* data: protocols, samples,
contacts, etc. It also introduces audit information so you can track
what was changed and by whom. And it introduces a generic
authentication mechanism used by all CGI scripts -- so you have to
login to the system before viewing data, making queries, manipulating
data.

Because MAGE is now (mostly) finalized, a good deal of the plans for
Genex-2 will be the renaming of objects/tables to fit with MAGE
nomenclature plus the addition of a number of additional
tables/objects specified by MAGE. Genex-2 will *not* be fully MAGE
compliant, but it will have major pieces.

> Some background as to why I'm interested in the details: With my
> previous employer we actually together with a consultant developed a
> high-throughput general database loader, which would take any
> record-oriented input file and load it to any relational
> database. The limitation is obviously SQL on the DB end; i.e.,
> anything you cannot load through SQL cannot be loaded with that
> tool. That limits you to a) insert into 1 table at a time, or b)
> insert into 1 view at a time, provided you can attach insert
> triggers to the view (which you can in Oracle), or c) call a stored
> procedure.
> We used b) and c), with all the relational logic
> (LU,PK,FK etc) staying within the DB. I'm actually trying to get
> them to release the code (Java), not sure how successful this is
> going to be.

Sounds pretty cool. If you wanted to, that could easily be hosted at
the MAGEstk site (mged.sf.net), the GeneX site, or at OpenInformatics
(www.openinformatics.org).

The Genex-2 data loader will *not* be a general purpose solution; it
will strictly handle microarray data. You will need to specify two
templates in order to use the loader:

  1) the ArrayLayout (or ArrayDesign if you speak MAGE)

  2) the QuantitationDimension (from MAGE) that is defined by the
     combination of array technology and feature extraction software
     you used. This is a mapping that describes how many columns are
     in the output file, what their data types are, and what the
     semantic meaning of each column is.

Once they are specified it is just a matter of slurping in rows of
data from the array files and entering them into the appropriate
table in the DB.

A major change in Genex-2 will be how the AM_Spots table is handled.
In Genex-1 there is a single table into which all data is smashed.
This works, and it is very general, but it creates too many problems.
The solution that we've decided to pursue in Genex-2 is to use a
different AM_Spots table for each new QuantitationDimension. That way
if your data generates an array of 80 floats for each spot (or
Feature in MAGE speak), all of those 80 numbers will go into a single
row in the AM_Spots table for that technology. Genex-1 would force
you to create 80 ArrayMeasurements, each with a single value/spot in
the AM_Spots table (yuck!).

> > In the mean time, I took code
> > that was graciously donated by Michael Pear, and got a data loader
> > working for Genex-1.
> >
> > In the meantime, if you want to help pre-test the code, let me know.
>
> Sure. Especially if it helps me migrate a couple of thousand chip data
> to our local GeneX in order to test its performance.
You can check the code out from CVS. You'll want to use the
'Rel-1_0_1-branch' branch. Info on how to get the code from CVS is
at:

  https://sourceforge.net/cvs/?group_id=16453

Once you've logged in you'll want to do the following:

  cvs -d:pserver:ano...@cv...:/cvsroot/genex \
      co -r Rel-1_0_1-branch genex-server

except of course you want it all on one line without the
backslash...

That will give you a working copy of GeneX-Server-1.0.5. The
dataloader is in the affyloader/ directory.

!!! WARNING !!!

There isn't a huge amount of documentation available on the code.
I've added a USAGE to each script and a --help flag that *should*
print out useful info, but YMMV. Please write to the list if you need
help.

You'll want to run a complete install even if you already have a
working GeneX installation. There were two changes to the DB: one to
fix a bug in AL_Spots (the primary key was not being auto-generated),
and the other is the addition of a view on the AM_Spots table. So you
want to make sure that the DB installer runs and downloads the new DB
init file (1.0.5) from the internet.

BTW, GeneX has a nice feature for updating an existing installation.
Check out the section on 'Updating an installation' in the INSTALL
file.

jas.
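
As a rough illustration of the per-QuantitationDimension table idea
discussed above (one wide spots table per technology, one row per
spot), here is a minimal Python sketch using an in-memory SQLite
database. The table name am_spots_demo, the column names, and both
helper functions are hypothetical and chosen only for this example;
this is not the Genex-2 schema or loader.

```python
import sqlite3

# Hypothetical QuantitationDimension: column name -> SQL type.
# These names are illustrative only, not taken from GeneX or MAGE.
QUANT_DIMENSION = {
    "signal": "REAL",
    "background": "REAL",
    "flag": "INTEGER",
}


def create_spots_table(conn, table_name, dimension):
    """Create one wide spots table for a given quantitation dimension."""
    # Names come from a trusted in-code template, so plain string
    # formatting is acceptable for this sketch.
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in dimension.items())
    conn.execute(
        f"CREATE TABLE {table_name} ("
        "am_fk INTEGER NOT NULL, spot_id INTEGER NOT NULL, "
        f"{cols}, PRIMARY KEY (am_fk, spot_id))"
    )


def load_rows(conn, table_name, dimension, am_fk, rows):
    """Insert one DB row per spot; each input row is (spot_id, v1, v2, ...)."""
    names = ["am_fk", "spot_id", *dimension]
    placeholders = ", ".join("?" for _ in names)
    sql = f"INSERT INTO {table_name} ({', '.join(names)}) VALUES ({placeholders})"
    conn.executemany(sql, [(am_fk, *row) for row in rows])


conn = sqlite3.connect(":memory:")
create_spots_table(conn, "am_spots_demo", QUANT_DIMENSION)
# Two spots from one (made-up) array measurement with id 1.
load_rows(conn, "am_spots_demo", QUANT_DIMENSION, 1,
          [(1, 120.5, 10.0, 0), (2, 98.25, 12.5, 1)])
row_count = conn.execute("SELECT COUNT(*) FROM am_spots_demo").fetchone()[0]
```

With a layout like this, a technology that produces 80 values per
spot would simply get 80 value columns in its own table, rather than
80 separate measurements sharing one generic table.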