|
From: Greg D. C. <gd...@nc...> - 2001-03-09 18:59:16
|
>Before anyone can add an array-layout into the DB the species for all
>their USF must first exist in the DB.
>
>I have no problem removing species without SF, and especially those
>that we never will, like Red Kangaroo. They were put there to help the
>CT folks test out the organism table.

Hmmm. I thought the point of the species table being rather thorough was to avoid having to add species all the time as new datasets appeared. It also looks more professional. I thought Carol vetted the list to be those species most likely to succeed?

And, while we're talking about species, let's go in the other direction, towards more species. These species, which are not in our species table, have spots on the Stanford microarrays (which also have Homo sapiens, Escherichia coli, and of course Arabidopsis thaliana):

Streptomyces hygroscopicus
Bacillus thuringiensis
Aequorea victoria
Photinus pyralis

We can't load the data properly until these are in the species table and appear in the Control Bundle. In the short run I was thinking of taking an old Control Bundle and patching it by hand to include these new species, but I gotta fake the species id. Do we need to talk, Jason?

Greg
|
|
From: Lonny M. <lx...@nc...> - 2001-03-09 18:57:32
|
It sounds reasonable to me. We can always add them as we get experiment sets.

Lonny

----- Original Message -----
From: Harry J. Mangalam <hj...@nc...>
To: <gen...@li...>
Sent: Friday, March 09, 2001 12:42 PM
Subject: Re: [GeneX-dev] Extra Species in the Database

> On 9 Mar 2001, Jason E. Stewart wrote:
>
> > I'm not sure what you mean by 'that they don't work'. The annoyance
> > that I'm aware of just has to do with the list being very long. So
> > when there's a select list, you have to scroll forever.
>
> I meant that there are no experiments that match them, so a user could be
> clicking and getting the 'Sorry, no experiments matched your query' response
> for a long time. We don't have to remove all of them, but we should remove
> those that we'll not support for a long time (like purple wombat, red kanga,
> blue shark, variegated plover, etc).
>
> > > What's the effort in removing the unsupported Species names in the
> > > Db for now and adding them back only when there's SF support for
> > > them? Or (he said, demonstrating his lack of attention to the SF
> > > area) are they required for client-side-loading of SF data?
> >
> > Before anyone can add an array-layout into the DB the species for all
> > their USF must first exist in the DB.
>
> OK - so we should keep things like Arab. thaliana, mouse, but remove some of
> the more exotic species.
>
> What say ye, others?
>
> hjm
>
> > I have no problem removing species without SF, and especially those
> > that we never will, like Red Kangaroo. They were put there to help the
> > CT folks test out the organism table.
> >
> > jas.
> >
> > _______________________________________________
> > Genex-dev mailing list
> > Gen...@li...
> > http://lists.sourceforge.net/lists/listinfo/genex-dev
|
|
From: Harry J. M. <hj...@nc...> - 2001-03-09 18:46:00
|
Sweet. What a relief. No more embarrassing 'cvs co analysis_scripts_install'.
I can hold my head up again at parties.

The checkout went fine.

hjm

On 9 Mar 2001, Jason E. Stewart wrote:

> Hey all,
>
> This probably only applies to Harry, Michael, and myself, but I'm
> including the rest of you.
>
> The top level module that used to have the name
> 'analysis_scripts_install' has been renamed to 'genex-server'.
>
> If you have that module checked out in either the main branch or the
> genex2 branch you have two choices:
> 1) throw away your existing local versions and do a cvs co
>    genex-server
> 2) if you have work that you don't want to lose (or you just don't
>    want to wait for the lag-time on harwin) you can run the perl
>    script I'm including in this message from within the top level of
>    your local tree.
>
> The cvs-fix.pl script recursively descends from the CWD looking for
> files named 'Repository' and when it finds one, it substitutes
> 'analysis_scripts_install' in that file for 'genex-server'. After
> running it, everything will now point to the correct location.
>
> Sorry for any inconvenience,
> jas.
|
|
From: <ja...@op...> - 2001-03-09 18:43:07
|
"Michael Pear" <mic...@ho...> writes:

> I checked in the table changes we discussed to the genex2 branch.
> --------------------------
>
> * SequenceFeature.tablepg
> Added plate_identifier,plate_row, plate_col

There are two possible locations for this information. One is in the SF table, as you have put them; the other is in the AL_Spots table. The decision to have them in one place or the other depends on how we wish to use the SF table.

Here was my original desire:

* The info in AL_Spots describes *exactly* what is on the array: cDNA, oligo, PCR product, etc.
* The SF table would represent the higher level entity that the spot is attempting to measure, which corresponds to the 'Reporter' notion or our old CSF notion.

This way entries in the AL_Spots table are specific to each layout (and thus would want plate, row, and column), and the SF table could be shared among different layouts.

On the other hand, we could also put the information about what is in each spot in the SF table. The only drawback is that it would get pretty redundant, with multiple layouts potentially having to re-enter the same information multiple times.

That said, I don't see any other drawbacks with either approach. Can you see any drawbacks with re-utilizing the SF entries with multiple layouts? If you can, we should use the approach you checked in. If you don't foresee any re-use issues, I'd like to move those columns into AL_Spots.

jas.
|
|
From: Harry J. M. <hj...@nc...> - 2001-03-09 18:41:59
|
On 9 Mar 2001, Jason E. Stewart wrote:

> I'm not sure what you mean by 'that they don't work'. The annoyance
> that I'm aware of just has to do with the list being very long. So
> when there's a select list, you have to scroll forever.

I meant that there are no experiments that match them, so a user could be clicking and getting the 'Sorry, no experiments matched your query' response for a long time. We don't have to remove all of them, but we should remove those that we'll not support for a long time (like purple wombat, red kanga, blue shark, variegated plover, etc).

> > What's the effort in removing the unsupported Species names in the
> > Db for now and adding them back only when there's SF support for
> > them? Or (he said, demonstrating his lack of attention to the SF
> > area) are they required for client-side-loading of SF data?
>
> Before anyone can add an array-layout into the DB the species for all
> their USF must first exist in the DB.

OK - so we should keep things like Arab. thaliana, mouse, but remove some of the more exotic species.

What say ye, others?

hjm

> I have no problem removing species without SF, and especially those
> that we never will, like Red Kangaroo. They were put there to help the
> CT folks test out the organism table.
>
> jas.
|
|
From: <ja...@op...> - 2001-03-09 18:31:11
|
"Harry Mangalam" <man...@ho...> writes:

> I can see that once people start to use the example DB, they'll be
> clicking on all the extra species to try them out, which will lead
> to all kinds of annoyance that they don't work.

I'm not sure what you mean by 'that they don't work'. The annoyance that I'm aware of just has to do with the list being very long. So when there's a select list, you have to scroll forever.

> What's the effort in removing the unsupported Species names in the
> Db for now and adding them back only when there's SF support for
> them? Or (he said, demonstrating his lack of attention to the SF
> area) are they required for client-side-loading of SF data?

Before anyone can add an array-layout into the DB the species for all their USF must first exist in the DB.

I have no problem removing species without SF, and especially those that we never will, like Red Kangaroo. They were put there to help the CT folks test out the organism table.

jas.
|
|
From: <ja...@op...> - 2001-03-09 18:21:08
|
Hey all,

This probably only applies to Harry, Michael, and myself, but I'm including the rest of you.

The top level module that used to have the name 'analysis_scripts_install' has been renamed to 'genex-server'.

If you have that module checked out in either the main branch or the genex2 branch you have two choices:

1) throw away your existing local versions and do a cvs co genex-server
2) if you have work that you don't want to lose (or you just don't want to wait for the lag-time on harwin) you can run the perl script I'm including in this message from within the top level of your local tree.

The cvs-fix.pl script recursively descends from the CWD looking for files named 'Repository' and when it finds one, it substitutes 'analysis_scripts_install' in that file for 'genex-server'. After running it, everything will now point to the correct location.

Sorry for any inconvenience,
jas.
|
|
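The repair that the message above attributes to cvs-fix.pl can be sketched as follows. This is not the Perl script that was attached to the original message (it is not reproduced in this archive); it is a minimal Python illustration of the same idea, and the function name `fix_repository_files` is hypothetical.

```python
import os

OLD_MODULE = "analysis_scripts_install"
NEW_MODULE = "genex-server"


def fix_repository_files(top, old=OLD_MODULE, new=NEW_MODULE):
    """Rewrite every file named 'Repository' under `top`.

    Returns the sorted list of paths that were actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(top):
        if "Repository" not in filenames:
            continue
        path = os.path.join(dirpath, "Repository")
        with open(path) as handle:
            text = handle.read()
        if old in text:
            # Replace the old module name with the new one, in place.
            with open(path, "w") as handle:
                handle.write(text.replace(old, new))
            changed.append(path)
    return sorted(changed)
```

Calling `fix_repository_files(os.getcwd())` from the top level of a checked-out tree would mirror the usage described in the message. Running it a second time changes nothing, since the old name no longer appears.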
From: Harry M. <man...@ho...> - 2001-03-09 18:20:34
|
Hi All,

I can see that once people start to use the example DB, they'll be clicking on all the extra species to try them out, which will lead to all kinds of annoyance that they don't work.

What's the effort in removing the unsupported Species names in the Db for now and adding them back only when there's SF support for them? Or (he said, demonstrating his lack of attention to the SF area) are they required for client-side-loading of SF data?

--
Cheers,
Harry
Harry J Mangalam -- (949) 856 2847 (v&f) -- hj...@nc... || man...@ho...
|
|
From: Harry M. <man...@ho...> - 2001-03-09 16:22:22
|
OK - FINALLY, Harry is going to change the name of the Server branch to genex-server (from the initial analysis_scripts_install).

Please check in your code if needed. I plan on doing this at noon Pacific time.

This only affects the harwin code currently. When genebox -> genex, I'll refresh harwin to the genex CVS as 'genex-server' as well.

--
Cheers,
Harry
Harry J Mangalam -- (949) 856 2847 (v&f) -- hj...@nc... || man...@ho...
|
|
From: Harry J. M. <hj...@nc...> - 2001-03-09 15:48:53
|
Yahooo! 1st code checkin of non-NCGR code! Most Excellent!

hjm

On Fri, 9 Mar 2001, Michael Pear wrote:

> Jason,
> I checked in the table changes we discussed to the genex2 branch.
> --------------------------
>
> * SequenceFeature.tablepg
> Added plate_identifier,plate_row, plate_col
>
> * AL_Spots.tablepg
> Added print_sequence,source_visit, and pin_identifier
>
> * Sample.tablepg
> Added missing comma
>
> Regards,
>
> Michael Pear
|
|
From: Michael P. <mic...@ho...> - 2001-03-09 15:43:06
|
Jason,
I checked in the table changes we discussed to the genex2 branch.
--------------------------
* SequenceFeature.tablepg
Added plate_identifier, plate_row, plate_col
* AL_Spots.tablepg
Added print_sequence, source_visit, and pin_identifier
* Sample.tablepg
Added missing comma
Regards,
Michael Pear
|
|
From: <ja...@op...> - 2001-03-09 04:18:23
|
Hey All,

This is the report from the OMG meeting that just happened in Irvine, CA. It was a surprisingly productive meeting. I still feel that the OpenSource process (release early, release often) is a better approach than the OMG process. But since many corporate players feel very uncomfortable with that approach, I guess the OMG process is as good as you get.

Executive Summary
=================

* We were extremely productive, and were able to agree on big pieces of the data format. These pieces had to do with sample tracking and higher level annotations.

* We spent most of the first 1/3 of the time discussing the glossary and agreeing on nomenclature. This was a GoodThing (TM). We now have a fairly broad group of academics (MGED) and three major corporate players (Rosetta, NetGenix, and Agilent) agreeing on terminology. It is my assigned duty to write up a new glossary, and when I do I will distribute it to this list.

* The big discussion came to the actual data encoding for the spot values. The two major options were Rosetta's very verbose but explicit row-by-row XML encoding (much like GeneXML) or MAML's very loose, flexible and un-tested matrix approach. Paul and I agreed to implement some examples of the matrix approach before we would decide anything, but it seemed likely that support for the row-based approach would be mandatory, and the matrix approach optional.

* We decided that a description of spots on an ArrayLayout (which we agreed to call an ArrayPattern) needed to include at least three levels:
  1) the SequenceFeature -- the nucleic acid (or other stuff) bound to the array;
  2) the Reporter -- the higher level entity that the SF was to represent (we used to call this the *canonical* sequence feature);
  3) the BiosequenceCluster -- for example a UniGene set.

* We uncovered that Rosetta's and NetGenix's submissions had missed the concept of between-array comparisons (i.e. ratios of two different time points). Michael Miller was going to address this.

* Rosetta had an elegant way to encode spot position which did not require massive numbers of attributes.

* There was an enormous amount of technical discussion about what would be the *normative* output of this group's effort, i.e. what is the actual document that will hold the specification. Should it be UML, a CORBA IDL, or XML? We decided to absolutely produce UML, and that the UML can be used to produce an IDL and an XML specification, but that we are likely to produce both the IDL and XML specification as well.

So, all in all, we learned a lot, and it was worth the trip.

jas.
|
|
From: <ja...@op...> - 2001-03-08 03:40:21
|
"Greg D. Colello" <gd...@nc...> writes:

> The Curation Tool cannot do the first step, because we all agreed many moons ago
> not to send the existing USF info over as part of the Control Bundle, due to the
> fear of update implications. The problem is, what if the USF already exists, but
> some of the other USF related information has changed? That requires a
> reconciliation and possibly an update process.

Upon initiating the load process, the user indicates either to re-use existing USF information or to load it new from the layout. If that loads redundant SF info, then so be it.

> >* a unique identifier for each SF (and we need to add
> > unique_identifier to the SF_Type controlled vocab).
>
> You mean like a USF_ID number? Analogous to the SPOT_ID number concept?
>
> If so, is the point of this to allow referencing of USF's in the Array Layout by
> USF_ID number?

It is just an additional term for the sf_type controlled vocab. If the sf_name has no biological relevance and is just an assigned name to make it unique on the array, we call it a unique_identifier rather than a gene_name or serial_orf_name, which both mean something different.

jas.
|
|
From: Greg D. C. <gd...@nc...> - 2001-03-08 00:46:24
|
>Delivered-To: fix...@li...@fixme
>To: gen...@li...
>Mime-Version: 1.0 (generated by tm-edit 1.5)
>From: ja...@op... (Jason E. Stewart)
>Subject: [GeneX-dev] Conf call summary
>Date: 06 Mar 2001 23:11:13 -0700
>
>Michael, Harry, and I had a call today about the status of the data
>loader.
>
>
>Loader
>======
>Next, we identified the pieces which are critical for the entry of
>ArrayLayouts:
>
>* Assuming that the SequenceFeature information is not present already
>  in the DB, the SF info will be taken from the layout file.
>* the spot type for each layout spot (blank, sequence_feature, control)
>* the species to which all the SF belong, or a listing of species on
>  a per-SF basis.

Interesting. This is all in agreement with the specs I was going to detail in my Word document.

The Curation Tool cannot do the first step, because we all agreed many moons ago not to send the existing USF info over as part of the Control Bundle, due to the fear of update implications. The problem is, what if the USF already exists, but some of the other USF related information has changed? That requires a reconciliation and possibly an update process.

>* a unique identifier for each SF (and we need to add
>  unique_identifier to the SF_Type controlled vocab).

You mean like a USF_ID number? Analogous to the SPOT_ID number concept?

If so, is the point of this to allow referencing of USF's in the Array Layout by USF_ID number?

>* a unique identifier for each layout spot

Yep. A real good idea. Makes a great reference ID for the Array Measurement files.

Greg
|
|
From: <ja...@op...> - 2001-03-07 18:15:16
|
Hey,

So the good news is, you should have a completely working installation, but for some reason dtd2html is crapping out. All this means is that you might not be able to view the html-ized dtd files. The rest should all work properly.

> dtd2html
> Scalar found where operator expected at /usr/local/lib/perl/dtd.pl line 1475, within pattern
> (Missing operator before ?)
> Scalar found where operator expected at /usr/local/lib/perl/dtd.pl line 1475, within pattern
> (Missing operator before ?)
> syntax error at /usr/local/lib/perl/dtd.pl line 1475, near "$opt$plus"

It seems you have a different version of dtd.pl ... In my version there is nothing at line 1475, and the only occurrence of "$opt$plus" is in extract_elem_names() at line 1522. Did you install the version from the tarball on sourceforge, i.e. perlSGML.2001Jan23.tar.gz? If so, what version of perl are you using? (Silly me for not storing that info in Options.reminders...)

> > Did the dir: '/usr/local/genex/lib/dtd/' get made OK and is set to be writable?
>
> It seems that it made ok & it's writable only to root though. Do I need to
> change it to be group writeable?
>
> hammerhead@hammerhead:/$ ls -ld /usr/local/genex/lib/dtd
> drwxr-sr-x 3 root staff 4096 Feb 26 17:30 /usr/local/genex/lib/dtd
> hammerhead@hammerhead:/$ ls -l /usr/local/genex/lib/dtd
> total 56
> -rw-r--r-- 1 root staff 1772 Feb 26 17:30 als.dtd
> -rw-r--r-- 1 root staff 2024 Feb 26 17:30 ams.dtd
> -rw-r--r-- 1 root staff 1633 Feb 26 17:30 csf.dtd
> lrwxrwxrwx 1 root staff 24 Feb 26 17:30 dtd -> /usr/local/genex/lib/dtd
> drwxr-sr-x 2 root staff 4096 Feb 26 16:21 genexml-html
> -rw-r--r-- 1 root staff 34788 Feb 26 17:30 genexml.dtd
> -rw-r--r-- 1 root staff 3199 Feb 26 17:30 usf.dtd

Check inside the genexml-html directory for a lot of html files. If they exist, then perhaps it really did work after all.

jas.
|
|
From: <ja...@op...> - 2001-03-07 06:08:28
|
Michael, Harry, and I had a call today about the status of the data loader.

Security
========
Michael had a good suggestion as to how to fix the three way circle of fkeys between UserSec --> Contact --> Security --> UserSec ... I'm including the jpg of his ER diagram. The solution involves normalizing the Contact table to remove the redundant information into a separate table, Source (perhaps a better name??), which holds the security info, and creating a view that unifies Source and Contact. That way there is no circle of fkeys and no redundant user info in Contact (previously, a user could have all their contact info repeated for each entry -- login, experiment_provider, sample_provider, etc.). Now they just have multiple Source entries, with one set of Contact info.

We decided not to go with using the Postgres user-level security for now, as it would involve big changes too late in the game.

Thank you Michael, thank you OpenSource ...

Loader
======
Next, we identified the pieces which are critical for the entry of ArrayLayouts:

* Assuming that the SequenceFeature information is not present already in the DB, the SF info will be taken from the layout file.
* the spot type for each layout spot (blank, sequence_feature, control)
* the species to which all the SF belong, or a listing of species on a per-SF basis.
* a unique identifier for each SF (and we need to add unique_identifier to the SF_Type controlled vocab).
* a unique identifier for each layout spot

Also, new (optional) additions for sample and quality tracking:

- The plate where the spotted stuff came from:
  * plate identifier (a barcode)
  * plate row
  * plate column
- The order in which the spotting mechanism laid down the spots:
  * source visit
  * print sequence
  * pin

Assignments
===========
Jason:
- list all the available information slots for layout/layout-spots. This involves a combination of SequenceFeature, ArrayLayout, Sample, and AL_Spots.
- load new tdscripts into genex2 DB
- re-write the scripts to load Contact and UserSec info
- load test Contact, UserSec, GroupSec, GroupLink, and Security into genex2 DB
- load Species, Spotter, Scanner, and Software info into genex2 DB
- load CV into genex2 DB
- write first loader for simple yeast file

Michael:
- Modify embperl script to use new Security table to retrieve all ExperimentSets visible by a given user
- get CVS working

Harry:
- Modify existing scripts to use genex2 DB

Harry and Michael, please fill in what I've forgotten.

jas.
|
|
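The loader requirements listed in the call summary above can be sketched as a small record type plus a check. This is only an illustration and not GeneX loader code: the names `LayoutSpot`, `validate_layout`, `spot_id`, `sf_identifier`, and `species` are hypothetical, and the rule that only sequence_feature spots need an SF identifier and a species is an assumption. The plate and print-order field names follow the columns named elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Spot types named in the call summary.
SPOT_TYPES = {"blank", "sequence_feature", "control"}


@dataclass
class LayoutSpot:
    spot_id: str                           # unique identifier for the layout spot
    spot_type: str                         # blank, sequence_feature, or control
    sf_identifier: Optional[str] = None    # unique identifier for the SF
    species: Optional[str] = None          # per-SF species; None means use the layout default
    # Optional sample and quality tracking fields.
    plate_identifier: Optional[str] = None
    plate_row: Optional[str] = None
    plate_col: Optional[str] = None
    source_visit: Optional[int] = None
    print_sequence: Optional[int] = None
    pin_identifier: Optional[str] = None


def validate_layout(spots, default_species=None):
    """Return a list of problems; an empty list means the layout passed."""
    problems = []
    seen = set()
    for spot in spots:
        if spot.spot_id in seen:
            problems.append(f"{spot.spot_id}: duplicate spot identifier")
        seen.add(spot.spot_id)
        if spot.spot_type not in SPOT_TYPES:
            problems.append(f"{spot.spot_id}: unknown spot type {spot.spot_type!r}")
            continue
        if spot.spot_type == "sequence_feature":
            # Assumption: only sequence_feature spots need an SF and a species.
            if not spot.sf_identifier:
                problems.append(f"{spot.spot_id}: sequence_feature spot has no SF identifier")
            if not (spot.species or default_species):
                problems.append(f"{spot.spot_id}: no species given and no layout default")
    return problems
```

Returning a list of problems rather than raising on the first one lets a loader report everything wrong with a layout file in a single pass.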
From: Greg D. C. <gd...@nc...> - 2001-03-06 18:13:20
|
To all:

Harry suggested I capture my thoughts on this subject in an email and distribute it for comment. This is only installment 1 on this subject. It discusses the problems. Installment 2 will be my proposed solution, a simple set of GeneX file format standards as a Word document. This will be a draft document for comment. It will take me a few days to produce it.

--------------------

I am currently working on the Curation Tool to guarantee support for replicate spots. As an example set of import files with replicate spots, I have been looking at Shauna Sommerville's Arabidopsis dataset that I just received. In the process I am encountering all sorts of other problems with their files. Although the Curation Tool was designed to be flexible enough to accept any tab-delimited column format, supporting the kind of illogical formats I'm seeing is way too much programming effort within the Curation Tool. I am now inclined towards setting some GeneX file format standards that may require the end users to reformat their files somewhat. To understand what I have in mind requires a little background discussion:

The GeneX database has a logical design which implies the following three import file types:

1. User Sequence Features (USF file) - This should be like a library of all the unique sequence features (and miscellaneous info about them) that could be spotted on a lab's arrays or more typically on just one array. Thus this file should have just as many rows as there are unique sequence features. A unique sequence feature is currently defined as a unique combination of a sequence feature name; a type (like gene, orf, clone, est); and a species. I will call this combination USFN:TYPE:SPECIES for short. Note that a USFN column can have a default TYPE, and the entire file can have a default SPECIES.

2. Array Layout Spots (ALS file) - This should be a list of all the spots that appear on a given array.
If a spot contains a sequence feature that can be found in the USF File, it MUST be resolvable to a USFN:TYPE:SPECIES key; where, like above, the TYPE and SPECIES can be defaulted. Each spot MUST have a SPOTTYPE (blank, control, or USFN). Note that this SPOTTYPE is currently deficient in that there is no way for the user to indicate the intention of a control. Each spot can have optional info like a SPOTID, which is a highly desirable unique key to a given spot. Other info like a spot description (for control spots) and spot coordinates may be specified.

3. Array Measurement Spots (AMS File) - This should be a file or series of files containing all the actual raw spot values as evaluated by some array image analysis routine, possibly accompanied by some other values derived from the raw values. Often there will be statistics columns summarizing the pixel distributions for each spot. Each row of this file MUST be keyed in one of three ways to the ALS File: (1) a USFN:TYPE:SPECIES key, (2) a SPOTID key, (3) an identical row order to the ALS File.

This is the way we imagined things to work in a perfect world. Here are the problems:

1. USF File - It seems like labs find it very difficult to produce a library of unique names. The files they send us contain imperfect naming columns. There is always some slop they "don't worry about". It would be a simpler world if all spots had corresponding names in the USF File. But blanks are problematic. They aren't sequences, and they don't have a TYPE or a SPECIES. Blanks are often caused because there was nothing in a well in a microtiter plate, and it was easier to let the robot spot as if there was something in the well. Control spots are also problematic. Are these sequence features? Well, usually, but what if they are random sequences? Then a lab would end up with potentially thousands of control sequences listed in the USF File. Sometimes controls are not sequences. They are, for example, fluor or media only.
That's why blanks and control spots are specified in the ALS File.

In Shauna's lab the naming problems are significant. They do have clone names, but not all spots are clones. Some are control DNA. Some of their clones and controls come from species other than Arabidopsis. There is no file column indicating the species of each spot. Some of the spots have gene names also. These are in a different column. Sometimes there is more than one gene name for a spot. Some genes have no names, just a description, as if that were a good substitute. Each of these anomalies causes significant problems for the Curation Tool.

2. ALS File - Almost no lab has files which clearly spell out the distinction between blanks, controls, and USF's. Often SPOTID's exist, but they don't necessarily appear in the AMS Files. Finally, array spot coordinates are specified in a myriad of ways which don't always map well to our database slots for them.

In Shauna's lab a spot is either a clone, a blank, or a control sequence (which is "known" DNA of some sort, whose purpose is often lost on anyone except the array creator, who often isn't available for comment). If it's a clone or a control there may or may not be any other information about the sequence in the USF File. Usually not. Just some clone name and a plate-well coordinate. The blank and control names are in different file columns than the clone names. There is no column indicating the type of the spot.

3. AMS File - If spot replicates, controls, or blanks exist, then the USFN:TYPE:SPECIES key system is useless for clearly referencing a given AMS back to a unique ALS. I leave this as an exercise for the reader, or trust me. If the user attempts to use the USFN:TYPE:SPECIES key system in the AMS File under any of these conditions, it must be detected and declared an error. Usage of SPOTID's is ideal for clearly referencing a given AMS back to a unique ALS, but almost nobody does this.
Actually, more often than not, the lab user's key of choice appears to be an identical row order to the ALS File. In my opinion this is an extremely dangerous practice that should be discouraged, especially if the lab intends to load their data into a database. Finally, these files often contain image analysis statistics columns that the lab users might want to retain and use as part of their analyses. The GeneX team opted to handle these columns as the single measurement type called "other_derived". I doubt this is sufficiently distinct to satisfy the Stanford folks, who seem determined to use this information (as evidence, I take note of the draft gene expression database tables from Carnegie for the TAIR project, where every ScanAlyze or GenePix statistic has a storage slot).

As a result of the above problems, the Curation Tool often finds itself confronted with a set of files containing some kind of hidden illogical issue. It is nearly impossible to cheaply program a generic tool around these kinds of issues. I suspect the DataLoader will encounter similar issues.

Ok. So what are my recommendations?

Well, first of all, I thunk we done good in breaking things apart into three logical divisions (USF, ALS, AMS). I think the real solution for us is to force GeneX users to follow simple file standards for these three file types. If they don't, then we don't support them.

I will describe a draft standard in a subsequent email using a Word file attachment. In this way I will be producing documentation at the same time as we discuss this issue. Since the result of such a standard will be to put requirements on the users to possibly restructure their file formats, I think this is a group decision (not mine).

Greg Colello
|
|
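The keying rules in the message above can be sketched in a few lines. This is a hedged illustration and not Curation Tool code: the row dictionaries, their column names (`usfn`, `type`, `species`, `spottype`), and both function names are hypothetical. It builds the USFN:TYPE:SPECIES key with the TYPE and SPECIES defaults described above, and reports the conditions (blanks, controls, replicates) under which an AMS file could not be keyed back to the layout by that key.

```python
from collections import Counter


def usf_key(row, default_type=None, default_species=None):
    """Build the USFN:TYPE:SPECIES key for one row, applying file-level defaults."""
    name = row.get("usfn")
    sf_type = row.get("type") or default_type
    species = row.get("species") or default_species
    if not (name and sf_type and species):
        raise ValueError(f"cannot resolve a full key from {row!r}")
    return f"{name}:{sf_type}:{species}"


def usf_keying_problems(als_rows, default_type=None, default_species=None):
    """List the reasons an AMS file could not key to this layout by USFN:TYPE:SPECIES."""
    problems = []
    keys = Counter()
    for row in als_rows:
        if row["spottype"] != "USFN":
            # Blanks and controls have no sequence-feature key at all.
            problems.append(f"{row['spottype']} spot has no sequence-feature key")
        else:
            keys[usf_key(row, default_type, default_species)] += 1
    for key, count in sorted(keys.items()):
        if count > 1:
            # Replicate spots share one key, so the key cannot pick a single spot.
            problems.append(f"{key} appears on {count} spots (replicates)")
    return problems
```

A layout for which this returns any problems would need SPOTID keys in its measurement files instead.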
From: Michael P. <mic...@ho...> - 2001-03-06 15:41:24
|
Jason,

I have attached an archive with updated tables and a schema diagram to
resolve the circular reference problem.

The archive contains updated table definitions for Contact and UserSec and
a new table, Source (I'm open to a name change!). The US_Contactlink table
can be removed. ContactShema.jpg is an ER diagram that shows the
relationships among Contact, UserSec, Security, and Source. Do you think it
more appropriate to use "Supplier" or "Provider" instead of "Source"?

The assumption here is that a user with a login automatically has the right
to update their own contact information. If a contact is listed as a source
for something but has no login, then rights to change the contact
information are bestowed through that table.

Note that this is designed to correct what I saw as another issue: that a
contact's info needed to be repeated for each type of source (e.g.,
technology_provider, data_provider, etc.). This scheme allows sharing of the
contact info through a many-to-one relationship.

This will have some ripple effects, but we can minimize them by creating a
view that joins the Source and Contact tables to allow lookup in the same
way as before. Updating the info, though, will require dealing with the two
tables. I've included a file, "SourceContact.viewpg", which defines such a
view.

Regards,

Michael Pear

----- Original Message -----
From: "Michael Pear" <mic...@ho...>
To: <gen...@li...>
Sent: Monday, March 05, 2001 4:53 PM
Subject: Re: [GeneX-dev] DB Security (was Re: Progress on Loader)

> Hi Jason,
>
> We probably should briefly summarize that this discussion is to resolve a
> circular reference which is introduced by simply adding the new Security
> table, so that Contact -> Security -> UserSec -> Contact, etc.
>
> After wrestling with this this afternoon, I do think that approach #2 is
> the best direction right now. In looking at the data model, it really
> appears to me that the proper way to address the circular nature is the
> subtyping approach.
>
> If you really look at the intention behind the data model, there are two
> types of "contacts" in the contact table. One is a "source" contact, which
> includes vendors, people providing data, organisms, etc. The other is an
> "owner" contact, who can actually own rights to data in a table and who can
> log into the system via info in UserSec. "Source" contacts could be further
> subtyped, but we can stay away from that for now.
>
> The security policy would be that "source" contacts have a security entry
> foreign key, while "owner" contacts don't, but rather can own a security
> entry. This will resolve the circular reference.
>
> It can be implemented with reference constraints and one additional table.
> Let me offer to work up mods to the tables involved and send them to you to
> take a look at. I'll plan on doing it without the "inherits" ability that
> is a PostgreSQL extension. So give me until early tomorrow, and think about
> something else, ok?
>
> I do suggest staying away from Approach #3 at this time. It is appealing on
> one level, but it represents a major change in the security design, and I
> fear it will have far-reaching consequences that will derail us from
> getting the data loading working.
>
> Regards,
>
> Michael Pear
>
> > > > Approach #1: Use the horrible linking table US_ContactLink
> > > > ---
> > > > you can tell I don't like this one
> > > >
> > > > Approach #2: Use subtyping on the Contact table
> > > > ---
> > > > I appreciate the link to the SQL article about circular references,
> > > > but I found the information to be on the brief side, I need more
> > > > details.
> > > >
> > > > My guess is there would be 3 contact tables:
> > > >
> > > > SuperTable:
> > > > table contact (
> > > >     // all the non-fkey attributes
> > > > );
> > > >
> > > > This will not actually have any data, the data will go into the two
> > > > subtype tables:
> > > >
> > > > table contact_login (
> > > >     us_fk int4 references usersec(us_pk)
> > > > ) inherits(contact);
> > > >
> > > > table contact_general (
> > > >     sec_fk int4 references security(sec_pk)
> > > > ) inherits(contact);
> > > >
> > > > Approach #3: get rid of UserSec entirely
> > > > ---
> > > > I stumbled upon this thought this morning. The idea would be to use
> > > > the pg_shadow table, and give every user a 'real' Pg account, and use
> > > > all the existing login code built into Pg.
> > > >
> > > > I think this approach works best of all. It just means maintaining a
> > > > lot of accounts as opposed to a bunch of entries in the UserSec table.
> > > >
> > > > What do you think?
> > > > jas.
>
> _______________________________________________
> Genex-dev mailing list
> Gen...@li...
> http://lists.sourceforge.net/lists/listinfo/genex-dev |
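For readers following the thread, here is a minimal sketch of the Contact/Source split and the lookup view described in the message above. It is an illustration only: it uses Python's built-in sqlite3 rather than PostgreSQL so that it runs without a server, and every column name (`con_pk`, `con_fk`, `owner_us_fk`, `source_type`, and so on), the sample rows, and the helper names are assumptions, since the attached table definitions are not reproduced in the archive. The point it tries to show is that Contact no longer references Security, so the Contact -> Security -> UserSec -> Contact cycle disappears, while several Source rows can share one Contact row.

```python
import sqlite3

# Hypothetical, simplified columns. The real definitions were in the
# attached archive and are not shown in this thread.
SCHEMA = """
CREATE TABLE Contact  (con_pk INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE UserSec  (us_pk  INTEGER PRIMARY KEY, login TEXT NOT NULL,
                       con_fk INTEGER NOT NULL REFERENCES Contact(con_pk));
CREATE TABLE Security (sec_pk INTEGER PRIMARY KEY,
                       owner_us_fk INTEGER NOT NULL REFERENCES UserSec(us_pk));
CREATE TABLE Source   (src_pk INTEGER PRIMARY KEY, source_type TEXT NOT NULL,
                       con_fk INTEGER NOT NULL REFERENCES Contact(con_pk),
                       sec_fk INTEGER NOT NULL REFERENCES Security(sec_pk));
-- Assumed shape of the lookup view: one row per source, with the shared
-- contact info joined in.
CREATE VIEW SourceContact AS
    SELECT s.src_pk, s.source_type, s.sec_fk, c.con_pk, c.name
    FROM Source AS s JOIN Contact AS c ON c.con_pk = s.con_fk;
"""


def build_demo():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO Contact  VALUES (1, 'Example Array Vendor');
        INSERT INTO Contact  VALUES (2, 'Example Curator');
        INSERT INTO UserSec  VALUES (1, 'curator', 2);
        INSERT INTO Security VALUES (1, 1);
        INSERT INTO Source   VALUES (1, 'technology_provider', 1, 1);
        INSERT INTO Source   VALUES (2, 'data_provider', 1, 1);
    """)
    return conn


def sources_for(conn, name):
    """Return the source types recorded for one contact, via the view."""
    rows = conn.execute(
        "SELECT source_type FROM SourceContact WHERE name = ? ORDER BY src_pk",
        (name,),
    )
    return [row[0] for row in rows]
```

Reads go through the view much as they went through the old Contact table; writes still have to touch Source and Contact separately, which matches the caveat in the message.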
|
From: Michael P. <mic...@ho...> - 2001-03-06 00:53:15
|
Hi Jason,

We probably should briefly summarize that this discussion is to resolve a
circular reference which is introduced by simply adding the new Security
table, so that Contact -> Security -> UserSec -> Contact, etc.

After wrestling with this this afternoon, I do think that approach #2 is the
best direction right now. In looking at the data model, it really appears to
me that the proper way to address the circular nature is the subtyping
approach.

If you really look at the intention behind the data model, there are two
types of "contacts" in the contact table. One is a "source" contact, which
includes vendors, people providing data, organisms, etc. The other is an
"owner" contact, who can actually own rights to data in a table and who can
log into the system via info in UserSec. "Source" contacts could be further
subtyped, but we can stay away from that for now.

The security policy would be that "source" contacts have a security entry
foreign key, while "owner" contacts don't, but rather can own a security
entry. This will resolve the circular reference.

It can be implemented with reference constraints and one additional table.
Let me offer to work up mods to the tables involved and send them to you to
take a look at. I'll plan on doing it without the "inherits" ability that is
a PostgreSQL extension. So give me until early tomorrow, and think about
something else, ok?

I do suggest staying away from Approach #3 at this time. It is appealing on
one level, but it represents a major change in the security design, and I
fear it will have far-reaching consequences that will derail us from getting
the data loading working.

Regards,

Michael Pear

> > > Approach #1: Use the horrible linking table US_ContactLink
> > > ---
> > > you can tell I don't like this one
> > >
> > > Approach #2: Use subtyping on the Contact table
> > > ---
> > > I appreciate the link to the SQL article about circular references,
> > > but I found the information to be on the brief side, I need more
> > > details.
> > >
> > > My guess is there would be 3 contact tables:
> > >
> > > SuperTable:
> > > table contact (
> > >     // all the non-fkey attributes
> > > );
> > >
> > > This will not actually have any data, the data will go into the two
> > > subtype tables:
> > >
> > > table contact_login (
> > >     us_fk int4 references usersec(us_pk)
> > > ) inherits(contact);
> > >
> > > table contact_general (
> > >     sec_fk int4 references security(sec_pk)
> > > ) inherits(contact);
> > >
> > > Approach #3: get rid of UserSec entirely
> > > ---
> > > I stumbled upon this thought this morning. The idea would be to use
> > > the pg_shadow table, and give every user a 'real' Pg account, and use
> > > all the existing login code built into Pg.
> > >
> > > I think this approach works best of all. It just means maintaining a
> > > lot of accounts as opposed to a bunch of entries in the UserSec table.
> > >
> > > What do you think?
> > > jas.

> _______________________________________________
> Genex-dev mailing list
> Gen...@li...
> http://lists.sourceforge.net/lists/listinfo/genex-dev |
|
From: <ja...@op...> - 2001-03-05 23:58:03
|
Jason's dumb. |
|
From: <ja...@op...> - 2001-03-05 23:57:07
|
Uh, how about that attachment? jas. |
|
From: <ja...@op...> - 2001-03-05 23:56:46
|
"Michael Pear" <mic...@ho...> writes: > Can you send me the table definition for your security info, > contact, and > usersec tables > as they stand in your new schema. They may be checked in, but I don't yet > know where/how > to get to the cvs tree. I've found some more in the data modeling book I > mentioned > (Data Modeling Handbook by Reingruber and Gregory) regarding "triads" which > is the > initial problem, so I might be able to suggest another alternative. Here is the current state of all the DB changes for the devo line. It is a tarball of Postgres table definition scripts that extracts into it's own tdscripts/ directory. jas. |
|
From: <ja...@op...> - 2001-03-05 21:43:14
|
Sorry to drop people into the middle of a discussion, but Harry pointed out
that I hadn't included the genex-dev list.

<caveat>
The work in progress is on the devo branch of the Server in CVS. None of the
issues I'm discussing affect the 1.0 release of the DB, Server, or Curation
Tool.
</caveat>

With that said, let's proceed with the technical details ...

"Harry J. Mangalam" <hj...@nc...> writes:

> 1) Your #3 would be the best approach for the current time, but it locks
> the schema to Postgres, doesn't it? Maintaining a separate UserSec table
> keeps the Data Model independent of the underlying DB, no? However, even if
> so, do we care at this point? Well, probably not AT THIS POINT, but what
> about later when there's someone who ABSOLUTELY REQUIRES it on Oracle?

Maybe it locks it to Postgres, but probably not. Every DBMS has to have a
user table that supports passwords, so all that will change is the name of
the table and the names of the columns. That's likely to be pretty minor.

One advantage is that a lot of the security will be handled directly by DBI.
When the user logs in, DBI will use their username/password directly in the
DBI->connect() call.

The issue I haven't worked out is how much of a slowdown the row-level
security will cause on tables like SequenceFeature.

> 2) These threads should be posted to the SF dev list, shouldn't they?

Ok. You asked for it ...

> On 5 Mar 2001, Jason E. Stewart wrote:
>
> > "Michael Pear" <mic...@ho...> writes:
> >
> > > Tomorrow would be fine. Other than preliminary
> > > test of ParseExcel and WriteExcel, and assembling
> > > data files, I've not done anything either. Jason,
> > > I'm looking for you to suggest how I can best help
> > > you.
> >
> > Hi Michael,
> >
> > I'm still wrestling with the schema a bit and could use your help. The
> > issue is the security problem. What to do with UserSec?
> >
> > Approach #1: Use the horrible linking table US_ContactLink
> > ---
> > you can tell I don't like this one
> >
> > Approach #2: Use subtyping on the Contact table
> > ---
> > I appreciate the link to the SQL article about circular references,
> > but I found the information to be on the brief side, I need more
> > details.
> >
> > My guess is there would be 3 contact tables:
> >
> > SuperTable:
> > table contact (
> >     // all the non-fkey attributes
> > );
> >
> > This will not actually have any data, the data will go into the two
> > subtype tables:
> >
> > table contact_login (
> >     us_fk int4 references usersec(us_pk)
> > ) inherits(contact);
> >
> > table contact_general (
> >     sec_fk int4 references security(sec_pk)
> > ) inherits(contact);
> >
> > Approach #3: get rid of UserSec entirely
> > ---
> > I stumbled upon this thought this morning. The idea would be to use
> > the pg_shadow table, and give every user a 'real' Pg account, and use
> > all the existing login code built into Pg.
> >
> > I think this approach works best of all. It just means maintaining a
> > lot of accounts as opposed to a bunch of entries in the UserSec table.
> >
> > What do you think?
> > jas. |
|
From: <ja...@op...> - 2001-03-05 20:21:19
|
"Todd Peterson" <tf...@nc...> writes: > Having a separate table for each control vocabulary is extremely > tedious. I propose the following table structure: > > table: > ControlVocabulary > attributes: > vocabulary_name varchar(80) NOT NULL > term_string varchar(48) NOT NULL > description text > > Advantages: less tables which all have the same structure. less > maintenance. generic. code simplification. > > Disadvantages: code changes in DB2XML, XML2DB, control bundle > generator. GeneXML should NOT change. Here's what I'm proposing: CREATE TABLE ControlledVocab ( cv_pk serial PRIMARY KEY, last_updated datetime NOT NULL, --when the row was last modified last_updated_user name NOT NULL, --who last modified the row sec_fk int4 NOT NULL REFERENCES Security(sec_pk), --specifies all access and update permissions for the data vocab_name varchar(128), --the combination of table_name and column_name used to --describe the vocab. Explicitly stating this helps map --to the XML definition file, as well as identifying --all terms from from the same vocab table_name varchar(128), --the DB table this term describes column_name varchar(128), --the exact column DB table this term describes term_name varchar(128), description varchar(128) ) The last_updated, last_updated_user, sec_fk are things that almost all tables have gotten. I'm not sure about using a primary key, so I'd like feedback. Having vocab_name is important, having table_name and column_name are not strictly necessary, but I think its nice. jas. |
|
From: <ja...@op...> - 2001-03-05 19:41:42
|
Hey All,
Since I've included a last_updated and last_updated_user column in
most tables in the DB, I've investigated the use of triggers to
automatically set those columns on INSERTs and UPDATEs.
Turns out there is an example of how to do this in the Pg User Guide,
and it works quite nicely:
CREATE FUNCTION stamp () RETURNS OPAQUE AS '
    BEGIN
        -- record when the row was last modified
        NEW.last_updated := ''now'';
        -- record which database user modified it
        NEW.last_updated_user := getpgusername();
        RETURN NEW;
    END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER tst BEFORE INSERT OR UPDATE ON tst
    FOR EACH ROW EXECUTE PROCEDURE stamp();
jas.
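
A rough, runnable sketch of the same stamping idea, for anyone who wants to try it without a Postgres server. It uses Python's built-in sqlite3, which differs from the PL/pgSQL version in two ways that are assumptions of this sketch: SQLite triggers cannot assign to NEW, so AFTER triggers update the row instead, and SQLite has no login user, so a hypothetical one-row `app_session` table stands in for getpgusername(). The `val` column and the helper names are also made up.

```python
import sqlite3

SCHEMA = """
-- Stand-in for the database login: holds the current user name (assumption).
CREATE TABLE app_session (user_name TEXT NOT NULL);

CREATE TABLE tst (
    id                INTEGER PRIMARY KEY,
    val               TEXT,
    last_updated      TEXT,
    last_updated_user TEXT
);

-- Stamp the row right after it is inserted.
CREATE TRIGGER tst_stamp_insert AFTER INSERT ON tst
BEGIN
    UPDATE tst
       SET last_updated = datetime('now'),
           last_updated_user = (SELECT user_name FROM app_session)
     WHERE id = NEW.id;
END;

-- Stamp the row again whenever its payload column changes.
CREATE TRIGGER tst_stamp_update AFTER UPDATE OF val ON tst
BEGIN
    UPDATE tst
       SET last_updated = datetime('now'),
           last_updated_user = (SELECT user_name FROM app_session)
     WHERE id = NEW.id;
END;
"""


def set_user(conn, name):
    """Record who the 'current user' is for subsequent statements."""
    conn.execute("DELETE FROM app_session")
    conn.execute("INSERT INTO app_session VALUES (?)", (name,))


def demo():
    """Insert and then update one row under two different users."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)

    set_user(conn, "alice")
    conn.execute("INSERT INTO tst (id, val) VALUES (1, 'first')")
    after_insert = conn.execute(
        "SELECT last_updated_user, last_updated FROM tst WHERE id = 1"
    ).fetchone()

    set_user(conn, "bob")
    conn.execute("UPDATE tst SET val = 'second' WHERE id = 1")
    after_update = conn.execute(
        "SELECT last_updated_user, last_updated FROM tst WHERE id = 1"
    ).fetchone()
    return after_insert, after_update
```

Calling `demo()` returns the stamp columns after the insert and after the update, so the application code never sets them itself.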
|