[Gdpdm-devel] Re: GDPDM .sql corrections/updates

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi Ken,

Thanks for reviewing the DDL.
Please see my embedded comments below.

Cheers,

Terry

On 2/6/06, Ken Youens-Clark <ky...@gm...> wrote:
> On Feb 3, 2006, at 4:43 AM, Terry Casstevens wrote:
>
> > I have the latest version 2.0 DDL and documentation
> > on the GDPDM web site (NOTE: this does NOT have
> > the addition tables recently discussed).    Please take
> > a look to make sure everything looks good to you.
> > CVS on sourceforge is having problems, so I'll
> > check this in later.   Let me know if you have
> > any suggestions?
> >
> > http://www.maizegenetics.net/gdpdm/documentation.html
>
> Hmm, this is odd.  The first thing I tried was to graph this using
> SQL::Translator and Graphviz, and SQLT choked on this:
>
> CREATE TABLE `div_passport` (
>    ...
>    `accename` varchar(255) unique NOT NULL
>
> That "unique" in there is odd!  MySQL dumped this?  I'm only used to
> see UNIQUE constraints declared like:
>
>    UNIQUE(`accename`)
>

No, mysqldump did not produce that.   I put that in.   I wasn't
sure how it should be, but it executed correctly when I built
a new database?   I will change it if you think I should?

>
> I also had to remove the data insertion statements (SQLT doesn't know
> about that syntax), but you don't really want to be distributing a
> schema with default info, right?  Make sure to run "mysqldump --no-
> data", I think.
>

I did use the --no-data option when I first generated the script.
I put those insert statements in afterward.   If you look at the
tables and values that are being inserted, you'll see they are very
generic.   And from my understanding, needed for pretty much
any GDPDM database.   So I meant to make it easier for
folks.   Do you still disagree?

>
> All the relationships parse and draw well, so I think the FKs are good.
>
> I'm a little bothered by some inconsistencies, like "div_passport"
> has an "accename" accession field, but "div_allele" has one called
> "accession."  I wish this idea was consistent across all tables.
> Also, "div_accession_collection" uses underscores and the abbr. "col"
> for "col_date" but not for "collnumb", "collsrc" and "collcode".  I
> see other places underscores aren't used to separate words, like
> "div_allele.referencedb", "div_locality.origcty",
> "div_passport.accenumb", and "div_passport.sampstat".
>

Let me first say that I couldn't agree with you more about being
consistent for easy use etc.....

Although....
- At the time of developing this schema, I didn't see accename,
  accession, etc. as being the same function.  And I wasn't
  planning to include such in all tables.
- collnumb, collsrc, collcode, etc. are from the IPGRI Passport
  Descriptors.   That is why I choice those forms.
- And most of all.   It is way too late to be changing this stuff!
  If we change stuff like this now, it would be a huge headache
  to modify all the existing GDPDM databases.   Adding tables
  (or even fields) isn't a problem.   That doesn't break any
  existing databases/tools, but changing existing stuff is
  very bad.

>
> Should "div_passport.germplasm_type" be a separate lookup table to a
> "germplasm_type" table exporting a "germplasm_type_id"?  It bothers
> me that the table is called "passport" but that it has a
> "germplasm_type" -- it seems like either the table should be called
> "germplasm" or this field should be called "passport_type."  Either
> way, is this relationship denormalized?
>

Please see documentation.   germplasm_type is deprecated.
sampstat should be used instead.

>
> I think "div_stock_parent.recurrent" is ambiguous and should be
> renamed "is_recurrent."  Perhaps it should also have a default value
> of either "0" or "1" -- is there some biological significance to
> declaring it needs to have a value?
>

I would have made this change a year ago.   Too late now.
Maybe it could default to "0," but that field is required.
Every defined parent will either be recurrent or not.  Unless
maybe it is unknown.   We'd need to ask a biologist.

>
> I'd like to see the order of fields be PK, FKs, "name"-like field,
> other fields, "comments", so I'd like to see
> "div_allele_assay.div_ref_stock_id" moved up, perhaps after
> "div_scoring_tech_type_id"?
>

I thought I did all those.   I'll change it.

>
> I think "cdv_marker.marker_aid" indicates an accession, right?  So it
> should probably be renamed to something more consistent with the rest
> of the schema, like "accename" or "accession."  (Embarrassingly, I
> think I lobbied for that field.  :-)
>

Too late!

>
> > I know you are still wanting the accession fields
> > added to the schema.   Maybe we can discuss
> > it at the Gramene retreat.   Both Ed and I are
> > planning to attend.
>
> I feel like there are already several accessions in there:
>
>    div_passport.accename
>    div_allele.accession +
>    div_trait_uom.to_accession * +
>    div_taxonomy.term_accession * +
>    div_treatment_uom.eo_accession * +
>    cdv_marker.marker_aid * +
>
>    * - Really an external accession, but the same idea
>    + - Should there be a UNIQUE constraint on these?
>

We can certainly consider making these unique.
But we'd need to think about each one to make
sure we know that its ok.

>
> I think it's just a matter of "embracing and extending" this idea.  :-)
>

I don't think we are going to solve this by email.
We (Ken, Terry, Ed, Peter, anyone else interested) probably
need to discuss this face to face.

>
> ky
>