|
From: Hilmar L. <la...@gn...> - 2001-11-28 20:13:52
|
Quoting "Jason E. Stewart" <ja...@op...>:
>
> In Genex-2, the DB is created from a DBMS-independent XML
> representation of the tables, so the XML files are always the most
> current representation of the schema. It also makes it pretty easy to
> plug in a new DBMS into GeneX by simply overriding the default XML ->
> SQL translator where it differs from Postgres.
>
Sounds cool. Do you use a specific XML editing tool/tree viewer?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: la...@gn...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
|
|
From: <ja...@op...> - 2001-11-28 17:30:14
|
"Hilmar Lapp" <la...@gn...> writes: > Quoting "Jason E. Stewart" <ja...@op...>: > > > The diagram that Todd posted is for Genex-1, and it is out of > > date with the latest changes that will be released in 1.0.5 > > That's what I realized. It's also dated with respect to 1.0.4, which > I realized after looking at the tables directly. It seems that a > number of things I was about to point out were fixed, whereas other > problems were introduced. Before going into any more potentially dated > details I really want to have the schema in front of me. Right. The scripts in DB/tdscripts/ are mostly up-to-date. Once upon a time they were always up to date, because they were used to generate new versions of the DB. The person who took over the role of DB maintainer after me noticed that Postgres's pg_dump utility would export the schema into the DB dumps if you asked it, and stopped maintaining the tdscripts. This was an unfortunate accident that has caused a number of problems. In Genex-2, the DB is created from a DBMS-independent XML representation of the tables, so the XML files are always the most current representation of the schema. It also makes it pretty easy to plug in a new DBMS into GeneX by simply overriding the default XML -> SQL translator where it differs from Postgres. But that doesn't help you at the moment. Until I can get funds to pay for the time it will take to produce an new ER diagram for Genex-1 and Genex-2, I will have to work on other things. I'll be talking to Bill Pearson at UVA later on, and perhaps he'll agree to my doing this. jas. |
|
From: Hilmar L. <la...@gn...> - 2001-11-28 17:07:36
|
Quoting "Jason E. Stewart" <ja...@op...>:
>
> Just to warn you, if by 'the present ERD' you mean the proposed
> changes to Genex-2 that I've been meandering about, no diagram
> exists.
I meant the diagram for the 1.0.x branch I checked out (I installed
1.0.4).
> The diagram that Todd posted is for Genex-1, and it is out of
> date with the latest changes that will be released in 1.0.5
That's what I realized. It's also dated with respect to 1.0.4, which
I realized after looking at the tables directly. It seems that a
number of things I was about to point out were fixed, whereas other
problems were introduced. Before going into any more potentially dated
details I really want to have the schema in front of me.
BTW as for ArgoUML, I don't mind installing another tool, as long as it's
free or we (and everyone else interested in a live version) have a license.
It also turns out that we have a license for Oracle designer, so that would
be fine with me too.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: la...@gn...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-28 16:19:31
|
Dear Matthew:
Thank you for your interest in the GeneX System and your feedback. Note
that I have cc'd this message to the Sourceforge developer list. More
experts monitor that list than the ge...@nc... list.
To get the Curation Tool to talk to your server you must start the
Curation Tool and under the file menu select properties. This will launch
a screen with various user-modifiable parameters including the location of
the server. Specifically, the Perl script URLs for control bundle
download, experiment set download and experiment set submit may be
specified. Then select 'SAVE CHANGES' and 'CLOSE'. These changes will be
reflected in the file curation_tool/data/curation_tool.prp.
We will study your comments on the installation and hopefully incorporate
them into the procedure or make changes necessary to avoid similar
problems.
Thanks,
Todd Peterson
NCGR
On Wed, 28 Nov 2001, Matthew Hobbs wrote:
> Hi,
>
> I am interested in deploying GeneX locally. Yesterday I managed to get
> the GeneX server going on my system (a PC running Debian Linux 2.4.14)
> and today I downloaded the Curation tool which seems to start OK
> although I haven't used it properly yet.
>
> As requested...
>
> > Suggestions
> > ===========
> >
> > Please keep track of things that you didn't understand or weren't
> > explained well and email them directly to us at:
> > ge...@nc...
> >
> ...here are a couple of points:
>
> i) I was unable to load the test data into a "skeleton" genex database
> until I changed this authentication configuration line in my
> pg_hba.conf file (untouched since my installation of postgres) from:
>
> local all peer sameuser
>
> to:
> local all trust
>
> This allowed user genex to connect without a password which is what
> genex-initdb-latest tries to do first up.
>
>
> ii) I encountered these lines at the end of the GeneX server
> installation process:
>
>
> NB: If you haven't done so in a previous installation, you still have to:
> 1) add the text from the file [cron.entries] to the appropriate crontab (and
> force a re-read of that crontab) to initiate the genex_reaper script that
> cleans up the temporary files. The important lines are:
> # run the reaper 13 minutes after the hour, every hour.
> 13 * * * * /usr/local/genex/bin/genex_reaper.pl
> # delete files older than 24 hours every 2hours
> 15 0-23/2 * * * find /usr/local/genex/rcluster/var/poqs/jobs -name job* -depth -mtime 1 -exec rm -rf {} \; 2> /dev/null
>
> 2) Add the line:
> /usr/local/genex/rcluster/bin/queue_master &
> to your local startup file (ie. /etc/rc.d/rc.local on RedHat)
>
>
> but this information really should be said in the installation
> documentation somewhere too I think.
>
>
> iii) (This is my main question!) The Curation tool runs, and I have an
> idea of how it should work from the tutorial documents, but I'm not at
> all sure how to use it in conjunction with my own GeneX server rather
> than the NCGR server. Can you please point me at some documentation or
> give me some advice on how to configure it? Obviously I want to be
> able to use it to load data into my own installation.
>
>
> Thanks,
>
>
> Matthew Hobbs
>
>
>
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-28 06:59:44
|
It's too bad I've been swamped with PathDB work. Had a pretty good
start on customization of supplied UML format to produce DTD or code.
Starting to get back to genex after some reorg'ing up there. Will be
spending more time with DoME to make it work well. Have modeled a good
portion of the MAGE UML model with it.
todd
On 27 Nov 2001, Jason E. Stewart wrote:
> "Todd F. Peterson" <tf...@nc...> writes:
>
> > That's why I still advocate DoME. Will take a look at the other tool
> > mentioned.
>
> I have time at this point, so I will take a look at DoME, thanks Todd.
>
> jas.
>
> _______________________________________________
> Genex-dev mailing list
> Gen...@li...
> https://lists.sourceforge.net/lists/listinfo/genex-dev
>
|
|
From: <ja...@op...> - 2001-11-28 05:20:25
|
"Jason E. Stewart" <ja...@op...> writes:
> I mangled some of the 1.0.5 fixes in CVS. Please run an update
> on the Rel-1_0_1-branch again to get the correct code:
>
> cvs -d :ext:jas...@cv...:/cvsroot/genex \
> update -r Rel-1_0_1-branch genex-server
>
> except all on one line (and without the '\')
Umm... Let's try that again.
If you haven't checked out the code yet, you can by using:
cvs -d :pserver:ano...@cv...:/cvsroot/genex \
co -r Rel-1_0_1-branch genex-server
If you *have* already checked out the code, you can update using:
cvs -d :pserver:ano...@cv...:/cvsroot/genex \
update
from within your checkout directory.
Sorry, got to learn to actually read the emails before I hit the send
key.
jas.
|
|
From: <ja...@op...> - 2001-11-28 05:13:13
|
Hey All,
I mangled some of the 1.0.5 fixes in CVS. Please run an update
on the Rel-1_0_1-branch again to get the correct code:
cvs -d :ext:jas...@cv...:/cvsroot/genex \
update -r Rel-1_0_1-branch genex-server
except all on one line (and without the '\')
Please look at the code in the affyloader/ directory that can be used
with the sample data files in the affyloader/samples/ directory, and
let me know what you think.
Sorry for the inconvenience.
jas.
|
|
From: <ja...@op...> - 2001-11-28 05:02:52
|
"Todd F. Peterson" <tf...@nc...> writes: > That's why I still advocate DoME. Will take a look at the other tool > mentioned. I have time at this point, so I will take a look at DoME, thanks Todd. jas. |
|
From: Todd F. P. <tf...@nc...> - 2001-11-28 04:18:46
|
That's why I still advocate DoME. Will take a look at the other tool
mentioned.
Todd
On 27 Nov 2001, Jason E. Stewart wrote:
> "Hilmar Lapp" <la...@gn...> writes:
>
> > Quoting "Todd F. Peterson" <tf...@nc...>:
> >
> > > I have placed the ERWin file on the genex website at:
> > > http://genebox.ncgr.org/download/DB/
> > >
> >
> > It seems that the present ERD has some considerable differences
> > from the one available on the website, and I'm having difficulties
> > getting hold of our person who's got an ERwin license. If someone
> > could post an image of that, it'd help.
>
> Just to warn you, if by 'the present ERD' you mean the proposed
> changes to Genex-2 that I've been meandering about, no diagram
> exists. The diagram that Todd posted is for Genex-1, and it is out of
> date with the latest changes that will be released in 1.0.5 (because I
> don't have an ERwin license either ;-)
>
> jas.
>
> _______________________________________________
> Genex-dev mailing list
> Gen...@li...
> https://lists.sourceforge.net/lists/listinfo/genex-dev
>
|
|
From: <ja...@op...> - 2001-11-28 03:41:14
|
"Hilmar Lapp" <la...@gn...> writes: > Quoting "Todd F. Peterson" <tf...@nc...>: > > > I have placed the ERWin file on the genex website at: > > http://genebox.ncgr.org/download/DB/ > > > > It seems that the present ERD has some considerable differences > from the one available on the website, and I'm having difficulties > getting hold of our person who's got an ERwin license. If someone > could post an image of that, it'd help. Just to warn you, if by 'the present ERD' you mean the proposed changes to Genex-2 that I've been meandering about, no diagram exists. The diagram that Todd posted is for Genex-1, and it is out of date with the latest changes that will be released in 1.0.5 (because I don't have an ERwin license either ;-) jas. |
|
From: Hilmar L. <la...@gn...> - 2001-11-28 03:16:04
|
Quoting "Todd F. Peterson" <tf...@nc...>:
> I have placed the ERWin file on the genex website at:
> http://genebox.ncgr.org/download/DB/
>
It seems that the present ERD has some considerable differences from
the one available on the website, and I'm having difficulties getting
hold of our person who's got an ERwin license. If someone could post
an image of that, it'd help.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: la...@gn...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
|
|
From: <ja...@op...> - 2001-11-28 02:52:40
|
"Hilmar Lapp" <la...@gn...> writes: > Quoting "Jason E. Stewart" <ja...@op...>: > > > > No, sorry. That is one of the priority one tasks that I indicated for > > the virginia consortium. > > I'm happy to see that you're going to use ArgoUML instead of ERwin. ;-) Actually, I'm most likely to use the community edition of Poseidon as long as the output is compatible with ArgoUML. I need to get gcj working to speed it up. Currently, Java's too slow on my linux laptop. jas. |
|
From: <ja...@op...> - 2001-11-28 02:43:21
|
Hey Hilmar,
Thanks for taking the time to voice your ideas/concerns.
"Hilmar Lapp" <la...@gn...> writes:
> Quoting "Jason E. Stewart" <ja...@op...>:
>
> So the Virginia collaborators agreed and committed to do all of that?
> Did you agree on a timeline?
They have not yet committed to the proposal. They are currently
discussing priorities and deciding what they really need and when. The
only work I've actually done is what I just talked about for the
Genex-1 branch.
> Does agenda point A) mean that you better not try to squeeze Affy data
> into GeneX right now?
Sorry, didn't mean to give you that impression. You most certainly
can; the real issue is that a decision needs to be made about how you
plan on representing replicate spots (i.e. same nucleic acid
spotted/synthesized in different locations on the array). We thought
we had made a clever decision to force users to break replicate spots
into separate data columns (i.e. stored as separate ArrayMeasurements
in the DB). This looked appealing, but it screws too many things
up. You'll notice that the AM_Spots table has a FK to the
UserSequenceTable but not to the AL_Spots table. This wants to be the
other way around.
> Well, I'll probably have to acquaint myself really well with what
> QuantitationDimension really is about before I continue to make dumb
> statements.
QuantitationDimension is an ordered list of QuantitationTypes, which
in turn specify the data type of a piece of data as well as its
semantic meaning (intensity, background, etc). Every data matrix has
two dimensions: the DesignElement dimension (the number of spots or
genes on the array) and the QuantitationDimension (the number of
columns and their data types).
> Anyway, in the first place my gut feeling from all the DB work I did
> before tells me that there is something wrong if you need to create
> tables on the fly for a particular new dataset coming in.
I agree.
> Again, without the ERD in front of me the following statements may be
> very dumb. But if you really are going to create a 7-column row
> for every feature on every chip, this will put you into big trouble
> for Affy chips, for which with the current technology you have 408k
> features on one single chip (and this is about to go up).
> I.e., every unused float adds 8x408k=3.2M to the storage for every chip
> (i.e., gigabytes for 1000s of chips),
Perhaps I wasn't clear. I don't want to have a single spot table that
forces 7 columns for every spot. I want to have different tables that
each have a different number of columns, so that your data can be
dovetailed to exactly the correct table for the data matrix. If the
feature extraction software you use produces 40 columns of output
(ScanAlyze) I don't want to force all arrays to have 40 floats.
I suspect that after a while, most arrays in the DB will be derived
calculations based on the raw data. They will likely only have a
single column of data (the value), or at most a few (average,
std. dev., variance).
In MAGE, each array has an ArrayDesign (ArrayLayout in GeneX), and a
QuantitationDimension. The QuantitationDimension will determine what
spot table is used for the data. I hope this is more clear.
In terms of gigabytes of data, that's the price. If you want to keep
all 40 values that scanalyze gives you, then you wind up with a DB
like SMD, with 300M spots running on a 64 processor E10k. I'm hoping
that the QuantDim solution will mean that most of the data can be kept
in a table with only a single data column. Much leaner.
The only other solution that was presented to me was a completely
generic solution that meant doing a three table join to get back all
the data from each spot. Looked horribly inefficient, and incredibly
obtuse. I figure I've already wasted 3 months of programming time
trying to explain all the bad complicated design decisions in genex-1;
I vote for simple.
If you have any better ideas *PLEASE* suggest them. The ideas I have
using the QuantDim approach are *NOT* set in stone, by any means. From
the sound of things you have a great deal more experience than I do in
building databases.
> unless Postgres is as smart as Oracle which doesn't physically store
> NULLs. But then you have the block length in Oracle, which this row
> is not going to exceed anyway (in fact, in Oracle you would use a
> CLUSTER for this).
I honestly don't know.
> Regarding AM_Spots, I'm also not sure whether you really need the PK
> there.
Nope. We don't. Bad design decision. Especially since Postgres gives
us oid's in every row anyway.
> You could merge in AM_SuspectSpots (0..n), and I'm not sure why
> the relationship to AL_Spots has to be n..n (I may easily be missing
> something). As mentioned before, tossing the PK saves you potentially
> GBs of storage, let alone the index-storage and it can save considerable
> time on import.
All correct. Actually the SpotLink table was not because we thought
the relationship was n..n, it was for efficiency. Because we thought
that every spot would be indexed versus the usf_fk, the only spots
that needed to know what AL_Spot they came from were those without
sequence features, i.e. blanks and controls. Since there are only a
few of those on every array, we didn't want to have an extra als_fk
column in the AM_Spots table. The same reasoning was used for
AM_SuspectSpots: only a few spots will be bad, so put them in a
separate table.
As I say, these turned out to be poor design choices.
> BTW storing the ratio seems to be redundant unless you have different
> methods to compute that, and in that case it would be a 0..n relationship.
> Same goes for background subtracted intensity.
Sometimes that is the data that we will get. The database has to be
able to handle it if users want to store it. I'm hoping the QuantDim
idea makes things both flexible and efficient.
Genex-1 was originally designed to only house an NCGR repository. It
was later changed to be more lab-centric. Its current goals are almost
completely lab-centric, because the biologists are the people who
really need this technology.
Cheers,
jas.
|
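One way to picture the QuantitationDimension described in this message is as an ordered tuple of typed, named columns that every row of a data matrix must match. The sketch below is only an illustration under that reading: the class layout, the `parse_row` helper, and the column names are hypothetical and are not the MAGE or Genex-2 definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuantitationType:
    name: str        # semantic meaning of the column, e.g. "ch1_intensity"
    data_type: type  # Python type used to parse the raw field


@dataclass(frozen=True)
class QuantitationDimension:
    types: Tuple[QuantitationType, ...]  # ordered: one entry per data column

    def parse_row(self, fields):
        """Parse one spot's raw fields into a dict keyed by column meaning."""
        if len(fields) != len(self.types):
            raise ValueError("row width does not match the dimension")
        return {qt.name: qt.data_type(f) for qt, f in zip(self.types, fields)}


# The seven two-channel values from the thread, as a single dimension
# (column names are hypothetical).
two_channel = QuantitationDimension(tuple(
    QuantitationType(name, float)
    for name in (
        "ch1_background", "ch1_intensity", "ch1_bg_subtracted",
        "ch2_background", "ch2_intensity", "ch2_bg_subtracted",
        "ch1_ch2_ratio",
    )
))

row = two_channel.parse_row(["120", "900", "780", "100", "490", "390", "2.0"])
print(row["ch1_ch2_ratio"])
```

The other dimension of the matrix, the DesignElement dimension, is simply the number of such rows per array.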
|
From: Hilmar L. <la...@gn...> - 2001-11-28 01:38:21
|
Quoting "Jason E. Stewart" <ja...@op...>:
> > Is there an ERD for the new schema already? (BTW which tool did you use
> > to create the current one which is available on the website?)
>
> No, sorry. That is one of the priority one tasks that I indicated for
> the virginia consortium.
I'm happy to see that you're going to use ArgoUML instead of ERwin.
>
> Ooops. I forgot to send the agenda in my last email. It has my
> proposal for Genex-2. I'm sending it with this mail.
>
So the Virginia collaborators agreed and committed to do all of that?
Did you agree on a timeline?
Does agenda point A) mean that you better not try to squeeze Affy data
into GeneX right now?
> > > QuantitationDimension. That way if your data generates an array of 80
> > > floats for each spot (or Feature in MAGE speak), all of those 80
> > > numbers will go into a single row in the AM_Spots table for that
> > > technology. Genex-1 would force you to create 80 ArrayMeasurements
> > > each with a single value/spot in the AM_Spots table (yuck!).
> >
> > So you're going to denormalize. Did you run into performance problems,
> > and if so, on which end, or in which situations? (Trying to learn from
> > your experience.)
>
> Sorry, not sure which case you mean when you say that we're going to
> denormalize, Genex-2 or Genex-1? If Genex-2 I'm not sure that it is
> really denormalizing, is it? Every array which produces output using a
> given QuantitationDimension will have a separate AM_Spots table.
Well, I'll probably have to acquaint myself really well with what
QuantitationDimension really is about before I continue to make
dumb statements. Anyway, in the first place my gut feeling from all
the DB work I did before tells me that there is something wrong if
you need to create tables on the fly for a particular new dataset
coming in.
> In Genex-1 we broke apart data that should never have been split in the
> first place, e.g. creating separate ArrayMeasurements for:
>
> * Channel 1 background
> * Channel 1 intensity
> * Channel 1 background subtracted intensity
> * Channel 2 background
> * Channel 2 intensity
> * Channel 2 background subtracted intensity
> * Channel 1/Channel 2 ratio
>
> when they all should have been a single ArrayMeasurement with 7
> columns in the AM_Spots table. Genex-2 will fix that.
Again, without the ERD in front of me the following statements may be
very dumb. But if you really are going to create a 7-column row
for every feature on every chip, this will put you into big trouble
for Affy chips, for which with the current technology you have 408k
features on one single chip (and this is about to go up).
I.e., every unused float adds 8x408k=3.2M to the storage for every chip
(i.e., gigabytes for 1000s of chips), unless Postgres is as smart as Oracle
which doesn't physically store NULLs. But then you have the block length in
Oracle, which this row is not going to exceed anyway (in fact, in Oracle you
would use a CLUSTER for this).
Regarding AM_Spots, I'm also not sure whether you really need the PK
there. You could merge in AM_SuspectSpots (0..n), and I'm not sure why
the relationship to AL_Spots has to be n..n (I may easily be missing
something). As mentioned before, tossing the PK saves you potentially
GBs of storage, let alone the index-storage and it can save considerable
time on import.
BTW storing the ratio seems to be redundant unless you have different
methods to compute that, and in that case it would be a 0..n relationship.
Same goes for background subtracted intensity.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: la...@gn...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
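The storage estimate in this message (8 bytes x 408k features per unused float column) can be worked through as below. This is only the arithmetic from the thread; it assumes unused values are physically stored as 8-byte floats with no NULL suppression, and the figures of 6 unused columns and 2000 chips are hypothetical inputs chosen for illustration.

```python
BYTES_PER_FLOAT = 8          # size of a double-precision float
FEATURES_PER_CHIP = 408_000  # feature count quoted in the thread


def wasted_bytes(unused_columns, chips):
    """Bytes spent on unused float columns, assuming NULLs are stored."""
    return BYTES_PER_FLOAT * FEATURES_PER_CHIP * unused_columns * chips


per_chip = wasted_bytes(unused_columns=1, chips=1)
print(per_chip)  # 3264000 bytes, i.e. roughly the 3.2M quoted above

# Hypothetical scale-up: 6 of 7 columns unused across 2000 chips.
total = wasted_bytes(unused_columns=6, chips=2000)
print(f"{total / 1e9:.1f} GB")
```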
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-27 19:57:12
|
I have placed the ERWin file on the genex website at:
http://genebox.ncgr.org/download/DB/
Todd Peterson
NCGR
|
|
From: Hilmar L. <la...@gn...> - 2001-11-27 19:46:10
|
Quoting "Jason E. Stewart" <ja...@op...>:
>
> Genex-2 enables you to protect *all* data: protocols, samples,
> contacts, etc. It also introduces audit information so you can track
> what was changed and by whom. And it introduces a generic
> authentication mechanism used by all CGI scripts -- so you have to
> login to the system before viewing data, making queries, manipulating
> data.
>
Is there an ERD for the new schema already? (BTW which tool did you use
to create the current one which is available on the website?)
> Because MAGE is now (mostly) finalized, a good deal of the plans for
> Genex-2 will be the renaming of objects/tables to fit with MAGE
> nomenclature plus the addition of a number of additional
> tables/objects specified by MAGE. Genex-2 will *not* be fully MAGE
> compliant, but it will have major pieces.
>
Do you already know which parts of MAGE will not be implemented
and/or covered by Genex-2?
> QuantitationDimension. That way if your data generates an array of 80
> floats for each spot (or Feature in MAGE speak), all of those 80
> numbers will go into a single row in the AM_Spots table for that
> technology. Genex-1 would force you to create 80 ArrayMeasurements
> each with a single value/spot in the AM_Spots table (yuck!).
So you're going to denormalize. Did you run into performance problems,
and if so, on which end, or in which situations? (Trying to learn from
your experience.)
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: la...@gn...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
|
|
From: <ja...@op...> - 2001-11-27 05:58:24
|
"Jason E. Stewart" <ja...@op...> writes: > As it stands, I run an independent contracting company, Open > Informatics, which does contract development of GeneX. Todd Peterson > at NCGR still works on GeneX as well. Harry Mangalam of tacg > Informatics is also still (independently) involved with the project, > but to a lesser degree. Jennifer Weller and Karen Slauch of the > Virginia Bioinformatics Institute have also rejoined the project to a > limited degree. I guess I should mention that back in september we set up a policy wereby there would be an oversight group (currently Todd Peterson, Harry Mangalam, and myself) that would oversee the OpenSource GeneX project. New people could be elected into the group and the group would decide on the addition of new developers as well as the overall direction of the project. I've been speaking without any agreement from either Harry or Todd, but I've been the person most active in GeneX development lately, so I sometimes get carried away. jas. PS. You probably want to sign up for the genex-dev list at: http://lists.sourceforge.net/lists/listinfo/genex-dev |
|
From: <ja...@op...> - 2001-11-27 05:47:33
|
Hey Hilmar,
"Todd F. Peterson" <tf...@nc...> writes:
> > I'm evaluating using GeneX as our primary RNA profiling database,
> > which would mean a volume of several thousand Affy chips to be
> > served. I managed to install the system, and looked at the schema
> > some more. I've got several questions / things you may be able to
> > comment on.
Excellent!
> > 1) What is the largest (volume-wise) GeneX installation you are
> > aware of, and what is the performance rating (in the admitted absence
> > of a benchmark) for that? Do you have an idea at which volume
> > GeneX performance appears to degrade significantly?
Honestly, we've never had an opportunity to stress test the system. It
wouldn't be that hard to do, but the bottleneck was an automatable
data loader. Now that we have that, you could begin tossing random
data into the tables and see where it breaks.
We spent a fair amount of time considering how to keep the performance
of GeneX high, but as friends at GeneLogic characterised it, 'GeneX is
just another empty database' (compared to GeneLogic's millions of
arrays).
> > 2) As for the open source model, what is your stance on prospectively
> > joining the development?
Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes!
Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes! Yes!
There. Was that positive enough for you?
As it stands, I run an independent contracting company, Open
Informatics, which does contract development of GeneX. Todd Peterson
at NCGR still works on GeneX as well. Harry Mangalam of tacg
Informatics is also still (independently) involved with the project,
but to a lesser degree. Jennifer Weller and Karen Slauch of the
Virginia Bioinformatics Institute have also rejoined the project to a
limited degree.
In September, I migrated the CVS repositories over to SourceForge, and
active development has been taking place there (except for the
curation tool, which is still in CVS at NCGR) in order to support this
very situation. We *really* want other developers to come help the
project.
> > (e.g., which of the alternatives a) active
> > co-development of the HEAD, b) create your own branch and do whatever
> > you like there, c) do whatever you like but you don't commit anything
> > to the repository, comes closest?) I've already seen that someone from
> > UCSD opened his own branch; do you remember what were the main reasons
> > and what are the changes you are going to merge into the next release?
> > How much is the schema written in stone?
We opened up a branch for Michael Pear so that he could add his array
manipulation architecture into the codebase. He wrote a very simple,
very nice system using Embperl that enables authentication of users
and session management, as well as a data normalization
framework. Since Michael has stopped working on GeneX, it is still in
the branch and not merged into the HEAD.
If you give me your SF username I will add you as a developer to the
project. At that point you may open up your own branch and hack away
without talking to us at all.
I'd be totally thrilled if you wanted to play an active role in the
development of the HEAD. In that case, we should really discuss where
things are moving.
The biggest priority is to get genex-server-1.0.5 tested and
released. That will include the beta of Michael's data loader code.
After that, the big priority is to get a beta of Genex-2 out the
door. This has a lot of schema changes from the Genex-1 series, and I
want people to look it over and comment on it. I'll include a
spreadsheet of what has been proposed for Genex-2 to the virginia
consortium. Please take a look at it and voice any questions,
concerns, priority shifts you'd like to make.
BTW, Genex-2 is the HEAD, so if you run:
cvs co genex-server
off SourceForge's repository, you can take a look at the current state
of the onion.
> > 3) You require in the install tool some tools to be present and found
> > which are not accessible whithout paying license fees for non-academic
> > entities (like ours). That somewhat undermines the open-source nature
> > of the whole deal.
Sorry. I agree that the wording is very misleading. None of those are
*required*; they are merely suggested. That needs to be made much
clearer in the INSTALL document and in the output of
install-all.pl. Genex will run perfectly fine without them (i.e. I
have none of them installed).
> > 4) There is no API in the database yet (views for retrieval, procedures
> > and/or triggers for upload). Are you working on one, agnostic to having
> > one, or reluctant to having one?
Genex-2 has the beginnings of that. I'm happy to get ideas/assistance.
> > I'll post my questions regarding the schema itself separately. If you
> > could point me to a document/page that describes the semantics behind
> > the ERD in more detail, that would help.
If you look in the DB/tdscripts/ directory, you'll see all of the
(original) table definition scripts. They have a metric buttload of
comments, which I have attempted to keep up to date even though the
Genex-1 DB is no longer initialized using them (but that's another
story)...
There is a PDF document available on the WWW, but it is somewhat out
of date.
jas.
|
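The stress test suggested in this message (tossing random data into the tables and watching where it breaks) might be sketched as follows. This is a stand-in only: it uses an in-memory SQLite database rather than the Postgres backend GeneX actually uses, and the table name, column names, and the `load_random_spots` helper are hypothetical rather than the real GeneX schema.

```python
import random
import sqlite3
import time


def load_random_spots(conn, n_arrays, spots_per_array, seed=0):
    """Insert random spot values array by array; return seconds per array."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS am_spots "
        "(am_fk INTEGER, spot INTEGER, value REAL)"
    )
    timings = []
    for am in range(n_arrays):
        rows = [(am, s, rng.random()) for s in range(spots_per_array)]
        start = time.perf_counter()
        conn.executemany("INSERT INTO am_spots VALUES (?, ?, ?)", rows)
        conn.commit()
        timings.append(time.perf_counter() - start)
    return timings


conn = sqlite3.connect(":memory:")
timings = load_random_spots(conn, n_arrays=5, spots_per_array=1000)
count = conn.execute("SELECT COUNT(*) FROM am_spots").fetchone()[0]
print(count, [round(t, 4) for t in timings])
```

Plotting the per-array timings against the number of arrays already loaded would show the volume at which load time starts to degrade; the same loop with a query in place of the insert would do the same for retrieval.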
|
From: <ja...@op...> - 2001-11-27 05:07:49
|
Hey Hilmar!

Interesting to see that Novartis has taken an interest.

"Hilmar Lapp" <la...@gn...> writes:

> Quoting "Jason E. Stewart" <ja...@op...>:
>
> > There needs to be some significant
> > database changes to make it work properly, and this will be released
> > in the Genex-2 branch under development.
>
> May I ask how this will in general work, and what the required DB
> changes were about?

Mind you, this is for the upcoming Genex-2 version, not the code that
is already available. Genex-2 was underway before the MAGE model was
finalized.

The primary change between Genex-1 and Genex-2 was a far more useful
security model. In Genex-1 you can only protect ExperimentSets,
ArrayMeasurements, and AM_Spots. All the rest is world viewable.
Genex-2 enables you to protect *all* data: protocols, samples,
contacts, etc. It also introduces audit information so you can track
what was changed and by whom. And it introduces a generic
authentication mechanism used by all CGI scripts -- so you have to
log in to the system before viewing data, making queries, or
manipulating data.

Because MAGE is now (mostly) finalized, a good deal of the plan for
Genex-2 is to rename objects/tables to fit MAGE nomenclature and to
add a number of tables/objects specified by MAGE. Genex-2 will *not*
be fully MAGE compliant, but it will have major pieces.

> Some background as to why I'm interested in the details: With my
> previous employer we actually together with a consultant developed a
> high-throughput general database loader, which would take any
> record-oriented input file and load it to any relational
> database. The limitation is obviously SQL on the DB end; i.e.,
> anything you cannot load through SQL cannot be loaded with that
> tool. That limits you to a) insert into 1 table at a time, or b)
> insert into 1 view at a time, provided you can attach insert
> triggers to the view (which you can in Oracle), or c) call a stored
> procedure. We used b) and c), with all the relational logic
> (LU,PK,FK etc) staying within the DB. I'm actually trying to get
> them to release the code (Java), not sure how successful this is
> going to be.

Sounds pretty cool. If you wanted to, that could easily be hosted at
the MAGEstk site (mged.sf.net), the GeneX site, or at OpenInformatics
(www.openinformatics.org).

The Genex-2 data loader will *not* be a general purpose solution; it
will strictly handle microarray data. You will need to specify two
templates in order to use the loader:

1) the ArrayLayout (or ArrayDesign if you speak MAGE)

2) the QuantitationDimension (from MAGE) that is defined by the
   combination of array technology and feature extraction software
   you used. This is a mapping that describes how many columns are in
   the output file, what their data type is, and what the semantic
   meaning of each column is.

Once they are specified it is just a matter of slurping in rows of
data from the array files and entering them into the appropriate
table in the DB.

A major change in Genex-2 will be how the AM_Spots table is handled.
In Genex-1 there is a single table into which all data is smashed.
This works, and it is very general, but it creates too many problems.
The solution that we've decided to pursue in Genex-2 is to use a
different AM_Spots table for each new QuantitationDimension. That way
if your data generates an array of 80 floats for each spot (or
Feature in MAGE speak), all of those 80 numbers will go into a single
row in the AM_Spots table for that technology. Genex-1 would force
you to create 80 ArrayMeasurements each with a single value/spot in
the AM_Spots table (yuck!).

> > In the mean time, I took code
> > that was graciously donated by Michael Pear, and got a data loader
> > working for Genex-1.
> >
> > In the meantime, if you want to help pre-test the code, let me know.
>
> Sure. Especially if it helps me migrate a couple of thousand chip data
> to our local GeneX in order to test its performance.

You can check the code out from CVS. You'll want to use the
'Rel-1_0_1-branch' branch. Info on how to get the code from CVS is
at:

  https://sourceforge.net/cvs/?group_id=16453

Once you've logged in you'll want to do the following:

  cvs -d:pserver:ano...@cv...:/cvsroot/genex \
      co -r Rel-1_0_1-branch genex-server

except of course you want it all on one line without the backslash...

That will give you a working copy of GeneX-Server-1.0.5. The
dataloader is in the affyloader/ directory.

!!! WARNING !!!

There isn't a huge amount of documentation available on the code.
I've added a USAGE to each and a --help flag that *should* print out
useful info, but YMMV. Please write to the list if you need help.

You'll want to run a complete install even if you already have a
working GeneX installation. There were two changes to the DB: one
fixes a bug in AL_Spots (the primary key was not being
auto-generated), and the other adds a view on the AM_Spots table. So
you want to make sure that the DB installer runs and downloads the
new DB init file (1.0.5) from the internet.

BTW, GeneX has a nice feature for updating an existing installation.
Check out the section on 'Updating an installation' in the INSTALL
file.

jas.
|
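The per-QuantitationDimension layout described in the message above can be illustrated with a small sketch. It is not the Genex-2 loader or its schema: the table naming (`am_spots_<dimension>`), the `spot_id` column, the function names, and the use of SQLite instead of Postgres are all assumptions made for illustration. The template is modelled as a list of (column name, SQL type) pairs, and each input row becomes exactly one row in the table for that technology.

```python
import sqlite3


def create_spots_table(conn, dimension, columns):
    """Create one spots table for a quantitation dimension.

    `columns` is a list of (column_name, sql_type) pairs taken from the
    template. Names are interpolated into the SQL, so this sketch assumes
    they come from a trusted template file, not from user input.
    """
    table = f"am_spots_{dimension}"  # hypothetical naming scheme
    col_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    conn.execute(
        f"CREATE TABLE {table} (spot_id INTEGER PRIMARY KEY, {col_defs})"
    )
    return table


def load_spots(conn, table, columns, rows):
    """Insert one table row per spot and return the resulting row count."""
    names = ", ".join(name for name, _ in columns)
    marks = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} ({names}) VALUES ({marks})", rows)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Two measured values per spot; a real template might list 80.
    columns = [("signal", "REAL"), ("background", "REAL")]
    table = create_spots_table(conn, "demo", columns)
    count = load_spots(conn, table, columns, [(1.5, 0.1), (2.5, 0.2)])
    print(table, count)
```

With this shape, a technology that produces 80 values per spot needs an 80-column template but still stores one row per spot, whereas the single-table design described for Genex-1 would need one measurement per value.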
|
From: Todd F. P. <tf...@nc...> - 2001-11-27 04:33:03
|
Dear Hilmar:

Thank you for your interest in GeneX. The project has been
semi-dormant for a while, but is now active again. I will respond to
your questions after a bit of research. I will probably have answers
tomorrow.

Thank you for your patience.

Todd Peterson
NCGR

On Mon, 26 Nov 2001, Hilmar Lapp wrote:

>
> Dear GeneX team,
>
> I'm evaluating using GeneX as our primary RNA profiling database,
> which would mean a volume of several thousand Affy chips to be
> served. I managed to install the system, and looked at the schema
> some more. I've got several questions / things you may be able to
> comment on.
>
> 1) What is the largest (volume-wise) GeneX installation you are
> aware of, and what is the performance rating (in the admitted absence
> of a benchmark) for that? Do you have an idea at which volume
> GeneX performance appears to degrade significantly?
>
> 2) As for the open source model, what is your stance on prospectively
> joining the development? (e.g., which of the alternatives a) active
> co-development of the HEAD, b) create your own branch and do whatever
> you like there, c) do whatever you like but you don't commit anything
> to the repository, comes closest?) I've already seen that someone from
> UCSD opened his own branch; do you remember what were the main reasons
> and what are the changes you are going to merge into the next release?
> How much is the schema written in stone?
>
> 3) You require in the install tool some tools to be present and found
> which are not accessible without paying license fees for non-academic
> entities (like ours). That somewhat undermines the open-source nature
> of the whole deal.
>
> 4) There is no API in the database yet (views for retrieval, procedures
> and/or triggers for upload). Are you working on one, agnostic to having
> one, or reluctant to having one?
>
> I'll post my questions regarding the schema itself separately. If you
> could point me to a document/page that describes the semantics behind
> the ERD in more detail, that would help.
>
> Cheers,
>
> -hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: la...@gn...
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-27 04:27:54
|
We are going to work on this next week and create a release of a very
simple loader.

Todd Peterson
NCGR

On Mon, 26 Nov 2001, Hilmar Lapp wrote:

> Hi all,
>
> I tried to find the DataLoader in the distribution, but was unable
> to locate it. Quoting from the homepage:
>
> Mar 3, 2001 - DataLoader project started to support large-scale, server-side
> data entry for GeneX. In Perl. Fewer annotation requirements.
>
> Did I overlook something, or has this been abandoned, or has this
> finally turned into the XML loader?
>
>
> -hilmar
> --
> -------------------------------------------------------------
> Hilmar Lapp email: hil...@ya...
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> Genex-dev mailing list
> Gen...@li...
> https://lists.sourceforge.net/lists/listinfo/genex-dev
>
|
|
From: Hilmar L. <la...@gn...> - 2001-11-27 01:35:53
|
Hi all,
I tried to find the DataLoader in the distribution, but was unable
to locate it. Quoting from the homepage:
Mar 3, 2001 - DataLoader project started to support large-scale, server-side
data entry for GeneX. In Perl. Fewer annotation requirements.
Did I overlook something, or has this been abandoned, or has this
finally turned into the XML loader?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: hil...@ya...
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-23 03:02:23
|
fyi...replied with 'required software' list from installation notes.
please give your opinions.

todd

---------- Forwarded message ----------
Date: Wed, 21 Nov 2001 09:05:47 -0600
From: "Spollen, William G." <spo...@he...>
To: "'ge...@nc...'" <ge...@nc...>
Subject: GeneX compatibility with our server

Dear Sir or Madam,

We are interested in using the GeneX server on our Compaq ES40 alpha
server that runs Tru64 Unix vs. 5.1. It has terabyte storage. We want
to use GeneX as the common tool for all the microarray experiments on
this campus (we are expecting thousands of chips to be processed in
the next 2 years). Almost every biologist linked to this server will
be running win98 or win2K on their desktops. Please tell me if you
know of any unusual problems we might have with GeneX given this
distributed setup and server, and if you know of any other
institutions that have used your system and that might have hardware
similar to ours.

Thank you,

William G. Spollen
Postdoctoral Research Fellow
Dept. Health Management and Informatics
University of Missouri
Columbia MO 65211
|
|
From: Todd F. P. <tf...@nc...> - 2001-11-23 02:55:00
|
Dear William:

The GeneX Server should port to most Unix flavors with little or no
porting effort. I'm not completely sure about all of the supporting
software required for our server (Apache, Postgres, etc.); see the
README file at

  http://genex.ncgr.org/genex/download/genex-server/00_README

Here is an excerpt:

Software
--------

- Operating System: Unix. Preferably Linux. We have successfully
  installed GeneX on Intel and PowerPC architectures, with RedHat 6.x
  and Debian 2.2 systems. It has also been successfully installed on
  Solaris. Other flavors of Unix should work out of the box with the
  following additions.

- Utility software: All recent, full Linux distros include (and GeneX
  depends on):

  * gnu tar,
  * gnu text & file utils >= 2 (gnu sort < 2 doesn't sort exponents)
  * sendmail (or a sendmail replacement such as exim)
  * Perl >=5.005
  * apache web server, configured to support Server-Side-Includes
    (see the apache documentation about this:
    <URL http://httpd.apache.org/docs/mod/mod_include.html>,
    esp. the tutorial:
    <URL http://httpd.apache.org/docs/howto/ssi.html>)

- Other Packages: Some distros will include these apps which are also
  needed for full functionality. If you cannot find them on your
  system already with 'which', 'whereis' or 'locate', you'll have to
  add them yourself via RPM or dpkg. Most easily found at rpmfind.net
  or deb or tarball from the URLs below.

  * Postgres (7.x)  => http://www.postgresql.org
  * R (>=1.1.1)     => http://cran.r-project.org
  * ghostscript     => http://www.cs.wisc.edu/~ghost
  * xgobi/xgobi     => http://www.research.att.com/areas/stat/xgobi
  * vncserver       => http://www.uk.research.att.com/vnc
  * mpage           => http://rpmfind.net/linux/RPM/mpage.html
  * libexpat        => http://sourceforge.net/projects/expat/

  [Optional - only needed to recompile the jpython code]
  * jpythonc        => http://www.jpython.org [optional]

In order to produce the html versions of the DTD's you will also need
a modified version of Earl Hood's excellent perlSGML utilities that
we distribute from:

  http://genex.ncgr.org/genex/download/genex-server/perlSGML.2001Jan23.tar.gz

!!These should be installed BEFORE trying to install the NCGR components!!

Todd Peterson
Software Developer
NCGR

On Tue, 20 Nov 2001, Spollen, William G. wrote:

> Dear sir,
> We are interested in using GeneX on our Compaq ES40 alpha server
> that runs Tru64 Unix vs 5.1. We want to use it as the common tool for all
> the microarray experiments on this campus. If GeneX is not compatible with
> our system would you recommend another software package that is?
>
> Thank you,
>
> William G. Spollen
> Postdoctoral Research Fellow
> Dept. Health Management and Informatics
> University of Missouri
> Columbia
> MO 65211
>
>
>
|
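The README excerpt above suggests looking for the required programs with 'which', 'whereis' or 'locate'. A minimal sketch of the same check, using Python's standard `shutil.which`, is shown here. The executable names in `REQUIRED_TOOLS` are assumptions (for example `psql` for Postgres and `gs` for ghostscript); the README lists packages, not binaries, so adjust the names for your system.

```python
import shutil

# Executable names are guesses at what each listed package installs;
# they are not taken from the GeneX README.
REQUIRED_TOOLS = [
    "tar", "sort", "perl", "psql", "R", "gs", "xgobi", "vncserver", "mpage",
]


def missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as missing may still be installed outside the PATH, which is the case 'locate' would catch and this sketch would not.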
|
From: Todd F. P. <tf...@nc...> - 2001-11-23 02:33:15
|
Dear Robert:

This could be due to an empty <external_file_list/> entry. Search the
xml file for this and delete it if it exists. Let us know if you
still have problems.

See http://www.geocrawler.com/archives/3/8983/2001/6/0/6064119/ and
other archives at this location. Seems like the search function
doesn't work on that site...so you have to manually look for messages
with interesting subjects.

Todd

On Thu, 22 Nov 2001 rb...@mw... wrote:

>
> Dear genex team,
>
> I still have problems loading data into genex with xml2db.pl. It loads for
> quite a while and then ends with the last messages being:
>
> parsing experiment_set elements...
> No array data found!
>
> I created the data set with the curation tool V01.50 on windows using the
> testdata (whatfield_ecoli_ihf) that was included with the installation.
>
>
> I have another question: Is the data within the controlfiles that are
> downloaded by the curation tool created by the genex server on the fly
> from the database, or does one have to create that data in advance? If I
> change data on the database, are those alterations automatically seen in
> the controlfiles?
>
> By the way, thank you very much for the quick and effective help on my
> previous problems.
>
> With best regards Robert
>
>
> Robert Bell
> MWG Biotech AG
> Anzinger Strasse 7
> D-85560 Ebersberg; Germany
> email : rb...@mw...
> voice : +49-8092-8289 356
> fax : +49-8092-8289 310
>
|
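The manual search-and-delete suggested above could be scripted roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`, not part of GeneX or xml2db.pl. The function name is hypothetical, and "empty" is assumed to mean an element with no child elements and no text. Note that re-serializing with ElementTree drops the DOCTYPE declaration and comments, so for a real GeneXML file it may be safer to use the count only to confirm the problem and then delete the entry by hand.

```python
import xml.etree.ElementTree as ET


def remove_empty_elements(xml_text, tag="external_file_list"):
    """Remove every <tag> element that has no children and no text.

    Returns (cleaned_xml, number_removed). The root element itself is
    never removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == tag and is_empty:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    sample = (
        "<doc><set><external_file_list/></set>"
        "<external_file_list><file/></external_file_list></doc>"
    )
    cleaned, removed = remove_empty_elements(sample)
    print(removed)
    print(cleaned)
```

Only the childless entry is removed; an external_file_list that actually lists files is left alone.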