Re: [Gusdev-gusdev] GUS 3.0 schema changes

Hi Joan-

Arnaud did supply us with documentation (attached) for the new Phenotype tables,
but I just haven't loaded it into the database yet (I've also been quite busy :)).
I started working on updating the documentation a couple of days ago, but in the
process discovered that there are some invalid rows in core.DatabaseDocumentation
that should be corrected first.  A query shows that there are 73 rows in this
table that reference nonexistent columns in GUS 3.0.  For the most part I think
that these are relatively minor problems stemming from the fact that the schema
has been updated more recently than the documentation.  However, there are also
a few rows that suggest we need to improve the plugin and/or procedure used to
populate this table.  For example, the following rows have spaces in the column
name (attribute_name), probably because the input files were invalid and the plugin
has no restrictions on the format of the attribute_name:

DATABASE_DOCUMENTATION_ID  ATTRIBUTE_NAME
-------------------------  ---------------------------------------------------------
                     1419  bio_material_id fk to LabelledExtract view of BioMaterial
                     1103  bio_source_characteristic_id primary key
                     1120  treatment_id fk to Treatment
                     1374  review_status_id The identifer of the review status
                     1418  assay_id fk to Assay
                     1373  synonym_name The gene symbol

6 rows selected.
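
To illustrate the kind of restriction the plugin could place on
attribute_name, here is a minimal sketch in Python.  It assumes that a valid
column name is a single bare identifier (letters, digits and underscores,
starting with a letter); the function names are hypothetical and are not part
of the existing plugin, and the last sample row is made up to show a passing
case.

```python
import re

# Assumption: a column name is one bare identifier, with no spaces.
# The real plugin may need a different rule.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_attribute_name(name):
    """Return True if name looks like a single column identifier."""
    return _IDENTIFIER.fullmatch(name) is not None


def find_invalid_rows(rows):
    """Return the (id, attribute_name) pairs whose name fails the check."""
    return [(doc_id, name) for doc_id, name in rows
            if not is_valid_attribute_name(name)]


rows = [
    (1419, "bio_material_id fk to LabelledExtract view of BioMaterial"),
    (1120, "treatment_id fk to Treatment"),
    (9999, "assay_id"),  # hypothetical well-formed row, for contrast
]
invalid = find_invalid_rows(rows)
```

A check like this could reject the input file outright, or just log the
offending rows so that they can be fixed by hand.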

Also, as an aside (and not a comment to you in particular), it strikes me that
column "documentation" of the form "fk to Table X" and "Primary key" could be
generated automatically from the schema.  However, comments on foreign keys
are useful if they identify the specific subclass (i.e. view) to which the
reference is expected to link, or if they explain what the referenced value is
used for (if not obvious).  Since there are still some minor schema
changes taking place, I think that next week would be a better time to
update all the documentation: the database will be locked down for the
migration at that point anyway.  As for the controlled vocabularies,
I think you're right, and we should try to populate these as soon as we can,
even if it will be an iterative process in some cases.
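
On the aside about generating "Primary key" and "fk to Table X" text from the
schema, the idea can be sketched as follows.  This is only an illustration: it
uses Python's sqlite3 module and SQLite's PRAGMA metadata rather than our
actual database, and the TreatmentParam table and the default_column_docs
function are made up for the example.

```python
import sqlite3


def default_column_docs(conn, table):
    """Map column name -> generated text for primary and foreign keys.

    The table name is interpolated directly, so pass trusted names only.
    """
    docs = {}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for row in conn.execute(f"PRAGMA table_info({table})"):
        name, pk = row[1], row[5]
        if pk:
            docs[name] = "Primary key"
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        ref_table, from_col = row[2], row[3]
        docs[from_col] = f"fk to {ref_table}"
    return docs


# Hypothetical two-table schema, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Treatment (treatment_id INTEGER PRIMARY KEY);
    CREATE TABLE TreatmentParam (
        treatment_param_id INTEGER PRIMARY KEY,
        treatment_id INTEGER REFERENCES Treatment(treatment_id)
    );
""")
docs = default_column_docs(conn, "TreatmentParam")
```

Generated text like this would only cover the trivial cases; comments that
name the expected view or explain what a referenced value is used for would
still have to be written by hand.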

Jonathan

-- 
Jonathan Crabtree
Center for Bioinformatics, University of Pennsylvania
1406 Blockley Hall, 423 Guardian Drive Philadelphia, PA 19104-6021
215-573-3115