Re: [Gdpdm-devel] Re: GDPDM .sql corrections/updates
Brought to you by:
tcasstevens
From: Peter B. <pj...@co...> - 2006-02-08 14:14:45
|
Ken and Terry, I'll throw my two cents in on a couple of topics, though most of the topics have been handled well. `accename` varchar(255) unique NOT NULL is correct syntax according to the MySQL manual: column_definition: col_name type [NOT NULL | NULL] [DEFAULT default_value] [AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY] [COMMENT 'string'] [reference_definition] Terry and talked about populating new database instances with basic reference data. So, I think that's a good idea. Although, it would be easy enough to make that a separate script or to create a data-free version of the database creation script if that's needed. On recurrent parent info: That's very basic pedigree information. If you don't know it, you really do not have a complete pedigree and probably should not be putting the information in the database. When you make a cross, if you then backcross the progeny to one of the parents, the parent you backcrossed to is the recurrent parent. Requiring recurrent makes sense and 0 would be a reasonable default value. On making accession fields unique: Since these are identifiers from external databases, a GDPDM database manager has no control over them. It would, in general, be dangerous to make them unique. For instance, it would be quite possible to define two similar traits in a local db that both link to the same trait_ontology term. div_allele.accession is specific to a div_allele.referencedb, so it is really a combination of the two fields that might be unique. Since a div_allele can have more than one value there might legitimately be more than one record in the table for an external allele.accession. Peter At 07:24 PM 2/7/2006, you wrote: >Hi Ken, > >Thanks for reviewing the DDL. >Please see my embedded comments below. > > > >Cheers, > >Terry > > >On 2/6/06, Ken Youens-Clark <ky...@gm...> wrote: > > On Feb 3, 2006, at 4:43 AM, Terry Casstevens wrote: > > > > > I have the latest version 2.0 DDL and documentation > > > on the GDPDM web site (NOTE: this does NOT have > > > the addition tables recently discussed). Please take > > > a look to make sure everything looks good to you. > > > CVS on sourceforge is having problems, so I'll > > > check this in later. Let me know if you have > > > any suggestions? > > > > > > http://www.maizegenetics.net/gdpdm/documentation.html > > > > Hmm, this is odd. The first thing I tried was to graph this using > > SQL::Translator and Graphviz, and SQLT choked on this: > > > > CREATE TABLE `div_passport` ( > > ... > > `accename` varchar(255) unique NOT NULL > > > > That "unique" in there is odd! MySQL dumped this? I'm only used to > > see UNIQUE constraints declared like: > > > > UNIQUE(`accename`) > > > >No, mysqldump did not produce that. I put that in. I wasn't >sure how it should be, but it executed correctly when I built >a new database? I will change it if you think I should? > > > > > > I also had to remove the data insertion statements (SQLT doesn't know > > about that syntax), but you don't really want to be distributing a > > schema with default info, right? Make sure to run "mysqldump --no- > > data", I think. > > > >I did use the --no-data option when I first generated the script. >I put those insert statements in afterward. If you look at the >tables and values that are being inserted, you'll see they are very >generic. And from my understanding, needed for pretty much >any GDPDM database. So I meant to make it easier for >folks. Do you still disagree? > > > > > > All the relationships parse and draw well, so I think the FKs are good. > > > > I'm a little bothered by some inconsistencies, like "div_passport" > > has an "accename" accession field, but "div_allele" has one called > > "accession." I wish this idea was consistent across all tables. > > Also, "div_accession_collection" uses underscores and the abbr. "col" > > for "col_date" but not for "collnumb", "collsrc" and "collcode". I > > see other places underscores aren't used to separate words, like > > "div_allele.referencedb", "div_locality.origcty", > > "div_passport.accenumb", and "div_passport.sampstat". > > > >Let me first say that I couldn't agree with you more about being >consistent for easy use etc..... > >Although.... >- At the time of developing this schema, I didn't see accename, > accession, etc. as being the same function. And I wasn't > planning to include such in all tables. >- collnumb, collsrc, collcode, etc. are from the IPGRI Passport > Descriptors. That is why I choice those forms. >- And most of all. It is way too late to be changing this stuff! > If we change stuff like this now, it would be a huge headache > to modify all the existing GDPDM databases. Adding tables > (or even fields) isn't a problem. That doesn't break any > existing databases/tools, but changing existing stuff is > very bad. > > > > > > Should "div_passport.germplasm_type" be a separate lookup table to a > > "germplasm_type" table exporting a "germplasm_type_id"? It bothers > > me that the table is called "passport" but that it has a > > "germplasm_type" -- it seems like either the table should be called > > "germplasm" or this field should be called "passport_type." Either > > way, is this relationship denormalized? > > > >Please see documentation. germplasm_type is deprecated. >sampstat should be used instead. > > > > > > I think "div_stock_parent.recurrent" is ambiguous and should be > > renamed "is_recurrent." Perhaps it should also have a default value > > of either "0" or "1" -- is there some biological significance to > > declaring it needs to have a value? > > > >I would have made this change a year ago. Too late now. >Maybe it could default to "0," but that field is required. >Every defined parent will either be recurrent or not. Unless >maybe it is unknown. We'd need to ask a biologist. > > > > > > I'd like to see the order of fields be PK, FKs, "name"-like field, > > other fields, "comments", so I'd like to see > > "div_allele_assay.div_ref_stock_id" moved up, perhaps after > > "div_scoring_tech_type_id"? > > > >I thought I did all those. I'll change it. > > > > > > I think "cdv_marker.marker_aid" indicates an accession, right? So it > > should probably be renamed to something more consistent with the rest > > of the schema, like "accename" or "accession." (Embarrassingly, I > > think I lobbied for that field. :-) > > > >Too late! > > > > > > > I know you are still wanting the accession fields > > > added to the schema. Maybe we can discuss > > > it at the Gramene retreat. Both Ed and I are > > > planning to attend. > > > > I feel like there are already several accessions in there: > > > > div_passport.accename > > div_allele.accession + > > div_trait_uom.to_accession * + > > div_taxonomy.term_accession * + > > div_treatment_uom.eo_accession * + > > cdv_marker.marker_aid * + > > > > * - Really an external accession, but the same idea > > + - Should there be a UNIQUE constraint on these? > > > >We can certainly consider making these unique. >But we'd need to think about each one to make >sure we know that its ok. > > > > > > I think it's just a matter of "embracing and extending" this idea. :-) > > > >I don't think we are going to solve this by email. >We (Ken, Terry, Ed, Peter, anyone else interested) probably >need to discuss this face to face. > > > > > > ky > > > > >------------------------------------------------------- >This SF.net email is sponsored by: Splunk Inc. Do you grep through log files >for problems? Stop! Download the new AJAX search engine that makes >searching your log files as easy as surfing the web. DOWNLOAD SPLUNK! >http://sel.as-us.falkag.net/sel?cmdk&kid3432&bid#0486&dat1642 >_______________________________________________ >Gdpdm-devel mailing list >Gdp...@li... >https://lists.sourceforge.net/lists/listinfo/gdpdm-devel |