[Gdpdm-devel] Re: GDPDM .sql corrections/updates

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Feb 3, 2006, at 4:43 AM, Terry Casstevens wrote:

> I have the latest version 2.0 DDL and documentation
> on the GDPDM web site (NOTE: this does NOT have
> the addition tables recently discussed).    Please take
> a look to make sure everything looks good to you.
> CVS on sourceforge is having problems, so I'll
> check this in later.   Let me know if you have
> any suggestions?
>
> http://www.maizegenetics.net/gdpdm/documentation.html

Hmm, this is odd.  The first thing I tried was to graph this using  
SQL::Translator and Graphviz, and SQLT choked on this:

CREATE TABLE `div_passport` (
   ...
   `accename` varchar(255) unique NOT NULL

That "unique" in there is odd!  MySQL dumped this?  I'm only used to  
see UNIQUE constraints declared like:

   UNIQUE(`accename`)

I also had to remove the data insertion statements (SQLT doesn't know  
about that syntax), but you don't really want to be distributing a  
schema with default info, right?  Make sure to run "mysqldump --no- 
data", I think.

All the relationships parse and draw well, so I think the FKs are good.

I'm a little bothered by some inconsistencies, like "div_passport"  
has an "accename" accession field, but "div_allele" has one called  
"accession."  I wish this idea was consistent across all tables.   
Also, "div_accession_collection" uses underscores and the abbr. "col"  
for "col_date" but not for "collnumb", "collsrc" and "collcode".  I  
see other places underscores aren't used to separate words, like  
"div_allele.referencedb", "div_locality.origcty",  
"div_passport.accenumb", and "div_passport.sampstat".

Should "div_passport.germplasm_type" be a separate lookup table to a  
"germplasm_type" table exporting a "germplasm_type_id"?  It bothers  
me that the table is called "passport" but that it has a  
"germplasm_type" -- it seems like either the table should be called  
"germplasm" or this field should be called "passport_type."  Either  
way, is this relationship denormalized?

I think "div_stock_parent.recurrent" is ambiguous and should be  
renamed "is_recurrent."  Perhaps it should also have a default value  
of either "0" or "1" -- is there some biological significance to  
declaring it needs to have a value?

I'd like to see the order of fields be PK, FKs, "name"-like field,  
other fields, "comments", so I'd like to see  
"div_allele_assay.div_ref_stock_id" moved up, perhaps after  
"div_scoring_tech_type_id"?

I think "cdv_marker.marker_aid" indicates an accession, right?  So it  
should probably be renamed to something more consistent with the rest  
of the schema, like "accename" or "accession."  (Embarrassingly, I  
think I lobbied for that field.  :-)

> I know you are still wanting the accession fields
> added to the schema.   Maybe we can discuss
> it at the Gramene retreat.   Both Ed and I are
> planning to attend.

I feel like there are already several accessions in there:

   div_passport.accename
   div_allele.accession +
   div_trait_uom.to_accession * +
   div_taxonomy.term_accession * +
   div_treatment_uom.eo_accession * +
   cdv_marker.marker_aid * +

   * - Really an external accession, but the same idea
   + - Should there be a UNIQUE constraint on these?

I think it's just a matter of "embracing and extending" this idea.  :-)

ky