[Gdpdm-devel] Re: GDPDM .sql corrections/updates
Brought to you by:
tcasstevens
From: Ken Youens-C. <ky...@gm...> - 2006-02-06 20:37:40
|
On Feb 3, 2006, at 4:43 AM, Terry Casstevens wrote: > I have the latest version 2.0 DDL and documentation > on the GDPDM web site (NOTE: this does NOT have > the addition tables recently discussed). Please take > a look to make sure everything looks good to you. > CVS on sourceforge is having problems, so I'll > check this in later. Let me know if you have > any suggestions? > > http://www.maizegenetics.net/gdpdm/documentation.html Hmm, this is odd. The first thing I tried was to graph this using SQL::Translator and Graphviz, and SQLT choked on this: CREATE TABLE `div_passport` ( ... `accename` varchar(255) unique NOT NULL That "unique" in there is odd! MySQL dumped this? I'm only used to see UNIQUE constraints declared like: UNIQUE(`accename`) I also had to remove the data insertion statements (SQLT doesn't know about that syntax), but you don't really want to be distributing a schema with default info, right? Make sure to run "mysqldump --no- data", I think. All the relationships parse and draw well, so I think the FKs are good. I'm a little bothered by some inconsistencies, like "div_passport" has an "accename" accession field, but "div_allele" has one called "accession." I wish this idea was consistent across all tables. Also, "div_accession_collection" uses underscores and the abbr. "col" for "col_date" but not for "collnumb", "collsrc" and "collcode". I see other places underscores aren't used to separate words, like "div_allele.referencedb", "div_locality.origcty", "div_passport.accenumb", and "div_passport.sampstat". Should "div_passport.germplasm_type" be a separate lookup table to a "germplasm_type" table exporting a "germplasm_type_id"? It bothers me that the table is called "passport" but that it has a "germplasm_type" -- it seems like either the table should be called "germplasm" or this field should be called "passport_type." Either way, is this relationship denormalized? I think "div_stock_parent.recurrent" is ambiguous and should be renamed "is_recurrent." Perhaps it should also have a default value of either "0" or "1" -- is there some biological significance to declaring it needs to have a value? I'd like to see the order of fields be PK, FKs, "name"-like field, other fields, "comments", so I'd like to see "div_allele_assay.div_ref_stock_id" moved up, perhaps after "div_scoring_tech_type_id"? I think "cdv_marker.marker_aid" indicates an accession, right? So it should probably be renamed to something more consistent with the rest of the schema, like "accename" or "accession." (Embarrassingly, I think I lobbied for that field. :-) > I know you are still wanting the accession fields > added to the schema. Maybe we can discuss > it at the Gramene retreat. Both Ed and I are > planning to attend. I feel like there are already several accessions in there: div_passport.accename div_allele.accession + div_trait_uom.to_accession * + div_taxonomy.term_accession * + div_treatment_uom.eo_accession * + cdv_marker.marker_aid * + * - Really an external accession, but the same idea + - Should there be a UNIQUE constraint on these? I think it's just a matter of "embracing and extending" this idea. :-) ky |