[Gmod-schema-cmts] schema/chado INSTALL, 1.44, 1.45 README.Apollo, 1.4, 1.5

Update of /cvsroot/gmod/schema/chado
In directory sc8-pr-cvs2.sourceforge.net:/tmp/cvs-serv25856

Modified Files:
	INSTALL README.Apollo 
Log Message:
major overhaul of how CDS and UTR GFF3 features are handled by the 
bulk loader.  Code still needs to be streamlined (ie, moving constants
and DBI prepares out of methods) but it appears to work.

Index: README.Apollo
===================================================================
RCS file: /cvsroot/gmod/schema/chado/README.Apollo,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** README.Apollo	29 Apr 2005 18:07:58 -0000	1.4
--- README.Apollo	4 Oct 2006 16:47:30 -0000	1.5
***************
*** 11,15 ****

      You can edit the template to grant privileges to just specific
!     users instead of PUBLIC.

    - insert several ad hoc cv terms found in 
--- 11,15 ----

      You can edit the template to grant privileges to just specific
!     users instead of PUBLIC, which of course would be a lot more secure.

    - insert several ad hoc cv terms found in 
***************
*** 22,26 ****
      sql file can be inserted into the database with this command:

!       $ psql $DBNAME <modules/sequence/apollo-bridge/cv_inserts.sql 

    - insert a few misc items like those found in 
--- 22,26 ----
      sql file can be inserted into the database with this command:

!       $ psql $DBNAME < modules/sequence/apollo-bridge/cv_inserts.sql 

    - insert a few misc items like those found in 
***************
*** 28,40 ****

      Essentially, Apollo and chado have to agree about what the names
!     in the analysis table are, and there needs to be a few terms in 
!     the cv table. 

    - add the functions and triggers to chado.  Use the perl script:
        % /usr/local/bin/gmod_apollo_triggers.pl create
      Note that this script can also be used to drop and add the triggers
      in case you want to do some bulk loading and want to disactivate
      the triggers for that. 

  I understand that these directions are somewhat vague.  Getting Apollo to
  work with chado requires a little hands on tinkering.  If you find yourself
--- 28,76 ----

      Essentially, Apollo and chado have to agree about what the names
!     of the programs in the analysis table are, and there needs to be a
!     few terms in the cv table.
! 
!     About naming analysis results: there are no restrictions on what you
!     name them; however, the GFF3 bulk loader will typically insert them in
!     one of two ways:
! 
!     1. If you specify the -a (--analysis) flag with an argument, the 
!     loader will look for an entry in the analysis table where
!     analysis.name is equal to the argument supplied with -a.
! 
!     2. If you don't give an argument with -a, the loader will look
!     for analysis.name that is equal to the GFF source and type concatenated
!     with an underscore between them, i.e. 'source_type', e.g. 'Rice_cDNA_match'.
! 
!     The inserts in the db table are there for the Dbxref entries in the GFF
!     files I typically process.  They can be ignored unless you are loading
!     GFF files with Dbxref entries.
! 
!     Then there is the uniquename_id_generator sequence, which supplies
!     the integers used for building uniquenames. Then there are two
!     inserts into cvtermprop for the suffix and prefix for uniquename
!     generation.  Generated uniquenames will be of the form:
! 
!         $prefix . (int from uniquename_id sequence) . $suffix
! 
!     For example, if you insert 'RICE' for the prefix and 'X' for the suffix, 
!     the resulting name of the first feature will be 'RICE000001X'.

    - add the functions and triggers to chado.  Use the perl script:
+ 
        % /usr/local/bin/gmod_apollo_triggers.pl create
+ 
      Note that this script can also be used to drop and add the triggers
      in case you want to do some bulk loading and want to disactivate
      the triggers for that. 

+   - modify the Apollo configuration file, chado-adapter.xml.  IS THERE
+     DOCUMENTATION FOR DOING THAT IN APOLLO'S DOCS?  Yes, there is
+     a section for connecting directly to flybase chado in the apollo
+     doc directory: doc/html/userguide.html but it is somewhat out of date.
+     I need to work with Mark to update it (it will probably need me to 
+     write it and have Mark proof it).
+ 
+ 
  I understand that these directions are somewhat vague.  Getting Apollo to
  work with chado requires a little hands on tinkering.  If you find yourself
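
The naming rules added to README.Apollo in this revision can be sketched as
follows.  This is only an illustration in Python, not the bulk loader's own
(Perl) code: the function names are hypothetical, and the six-digit zero
padding is an assumption inferred from the 'RICE000001X' example.

```python
def analysis_name(source, gff_type, analysis_arg=None):
    """Return the analysis.name to look up (hypothetical helper).

    If an argument was supplied with -a, it is used as-is; otherwise the
    GFF source and type are joined with an underscore.
    """
    if analysis_arg:
        return analysis_arg
    return f"{source}_{gff_type}"


def make_uniquename(prefix, seq_value, suffix, width=6):
    """Build prefix + zero-padded integer + suffix.

    The padding width is an assumption based on the README example.
    """
    return f"{prefix}{seq_value:0{width}d}{suffix}"
```

With these definitions, analysis_name('Rice', 'cDNA_match') gives
'Rice_cDNA_match', and make_uniquename('RICE', 1, 'X') gives 'RICE000001X'.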

Index: INSTALL
===================================================================
RCS file: /cvsroot/gmod/schema/chado/INSTALL,v
retrieving revision 1.44
retrieving revision 1.45
diff -C2 -d -r1.44 -r1.45
*** INSTALL	13 Apr 2006 20:39:16 -0000	1.44
--- INSTALL	4 Oct 2006 16:47:29 -0000	1.45
***************
*** 5,16 ****
  This document describes the procedure for installing the chado 
  schema and loading data from GFF3 data records.  This is currently
! considered alpha software, so expect there to be bumps in the road.  When
! you experience problems, please email them to the gmod-devel mailing list
! at gmo...@li....  This release will work with the
! most recent release of the Generic Genome Browser (gbrowse) version 1.62
  though there have been bug fixes to gbrowse that will be rolled into
! a new 1.63 release in the near future.  If you experience difficulties
  with gbrowse and chado, you might want to look at getting a cvs
! checkout of the bugfix branch.  The installation instructions for
  gbrowse are included in that package.  There are plans to make the
  installation of gbrowse and other components more automatic, but for
--- 5,16 ----
  This document describes the procedure for installing the chado 
  schema and loading data from GFF3 data records.  This is currently
! considered beta software, so expect there to be bumps in the road.  When
! you experience problems, please email them to the gmod-schema mailing list
! at gmo...@li....  This release will work with the
! most recent release of the Generic Genome Browser (gbrowse) version 1.65
  though there have been bug fixes to gbrowse that will be rolled into
! a new 1.66 release in the near future.  If you experience difficulties
  with gbrowse and chado, you might want to look at getting a cvs
! checkout of the gbrowse-session branch.  The installation instructions for
  gbrowse are included in that package.  There are plans to make the
  installation of gbrowse and other components more automatic, but for
***************
*** 18,22 ****

  This release of chado/gmod also comes with example functions for
! allowing Apollo (http://www.gmod.org/apollo.shtml) to read and
  write directly to the database, allowing creation and editing of
  genome features.  Please see the file README.Apollo for some details
--- 18,22 ----

  This release of chado/gmod also comes with example functions for
! allowing Apollo (http://www.gmod.org/apollo) to read and
  write directly to the database, allowing creation and editing of
  genome features.  Please see the file README.Apollo for some details
***************
*** 26,35 ****
  Scott Cain
  ca...@cs...
! May 2, 2005

  Prerequisites:

!   PostgreSQL (currently, developers are using 7.4). Items to do with
!   postgres to make it ready to go:

      * make it accept tcp/ip connections by adding this line to postgresql.conf
--- 26,45 ----
  Scott Cain
  ca...@cs...
! Sept 12, 2006

  Prerequisites:

!   PostgreSQL; currently, developers are using 7.4 and 8.1. There are currently
!   two main considerations in choosing what version of PostgreSQL to use:
! 
!     1. Version 8.1 is currently incompatible with the triggers that work
!     with Apollo
! 
!     2. Version 7.4 is incompatible with the functions for making CMap
!     integration more seamless.
! 
!   Neither problem is insurmountable; it is really a matter of convenience.
! 
!   Items to do with postgres to make it ready to go:

      * make it accept tcp/ip connections by adding this line to postgresql.conf
***************
*** 113,117 ****
    Apache (1.3.* or 2.0.*) (for gbrowse, gmod-web requires 2.0)

!   BioPerl (bioperl-live or greater than 1.5)
            (-microarray 0.1 --required for microarray data)

--- 123,127 ----
    Apache (1.3.* or 2.0.*) (for gbrowse, gmod-web requires 2.0)

!   BioPerl (bioperl-live or greater than 1.5.1)
            (-microarray 0.1 --required for microarray data)

***************
*** 136,142 ****
      * XML::Parser::PerlSAX     (chado)
      * Module::Build            (chado)
!     * Class::DBI               (chado)
!     * Class::DBI::Pg           (chado)
!     * Class::DBI::Pager        (chado)
      * DBIx::DBStag             (chado)
      * XML::Simple              (chado)
--- 146,152 ----
      * XML::Parser::PerlSAX     (chado)
      * Module::Build            (chado)
!     * Class::DBI               (GMODWeb)
!     * Class::DBI::Pg           (GMODWeb)
!     * Class::DBI::Pager        (GMODWeb)
      * DBIx::DBStag             (chado)
      * XML::Simple              (chado)
***************
*** 159,162 ****
--- 169,176 ----
         $ setenv VARNAME value

+     To make life easier on yourself, you will probably want to put those
+     commands in your .cshrc or .bashrc file so that the environment variables
+     are always available when you log in. 
+ 
     * GMOD_ROOT: The location of your GMOD installation (e.g., "/usr/local/gmod")

***************
*** 185,189 ****

  *   perl Makefile.PL   
-     Creates AutoDBI.pm.

        During this step you are prompted for several configuration values
--- 199,202 ----
***************
*** 234,239 ****
          *   What is the default organism (common name, or "none")?

! 
          At this point, if you answered "No" to the "use default schema"
          question, you will be prompted about what database exensions
--- 247,259 ----
          *   What is the default organism (common name, or "none")?

+         The organism name should be one that will be in the organism table.
+         When the database is created, several organisms will be there
+         by default; these include: human, fruitfly, mouse, mosquito,
+         rat, mustard weed, worm, zebrafish, rice, and yeast.  (The
+         insert statements that create these default organisms are 
+         contained in load/etc/initialize.sql).

!         THIS PARAGRAPH IS OUT OF DATE, AS IS THE SECTION AT THE
!         END, AND BOTH SHOULD BE REWRITTEN.
          At this point, if you answered "No" to the "use default schema"
          question, you will be prompted about what database exensions
***************
*** 248,255 ****
  *   (sudo) make install
      Probably needs to be run as root.  Installs data loading scripts
!     in /usr/local/bin, perl modules, as well as placing various files
!     in $GMOD_ROOT, and creating the infastructure for logging of
!     errors by creating $GMOD_ROOT/logs and creating the
!     file /etc/log4perl.conf if it does not already exist.

--- 268,275 ----
  *   (sudo) make install
      Probably needs to be run as root.  Installs data loading scripts
!     in perl's path (typically /usr/local/bin or /usr/bin), perl modules,
!     as well as placing various files in $GMOD_ROOT, and creating the
!     infrastructure for logging of errors by creating $GMOD_ROOT/logs and
!     creating the file /etc/log4perl.conf if it does not already exist.

***************
*** 273,281 ****
      Gets and installs various ontologies.  Requires a network 
      connection.  Absolutely required are the Relationship Ontology and
!     the Sequence Ontology Feature Annotation (SOFA).  All others are
!     optional.  Note retrieved ontology files are stored in the directory
!     specified when perl Makefile.PL was run (the default is ./tmp).  In
!     order to do a repeat installation, lock files need to be removed to
!     allow reinstallation of ontologies.  Those lock files can be removed
      by executing `make rm_locks`.  Alternatively, deleting everything in
      the temporary directory will force the re-downloading of the ontology
--- 293,301 ----
      Gets and installs various ontologies.  Requires a network 
      connection.  Absolutely required are the Relationship Ontology and
!     the Sequence Ontology (SO).  All others are optional.  Note retrieved
!     ontology files are stored in the directory specified when
!     perl Makefile.PL was run (the default is ./tmp).  In order to do a
!     repeat installation, lock files need to be removed to allow
!     reinstallation of ontologies.  Those lock files can be removed
      by executing `make rm_locks`.  Alternatively, deleting everything in
      the temporary directory will force the re-downloading of the ontology
***************
*** 289,297 ****
      can execute a command for each file to load it.  Note again that
      the Relationship Ontology is required before all others, and the
!     the Sequence Ontology Feature Annotation (SOFA) is absolutely required
!     for proper functioning of the database.  The command to load an
!     ontology and its definition file (if it exists) is this:

!         $ gmod_load_ontology.pl /path/to/onto_file [/path/to/def_file]

      It is not a bad idea at this point to make a back up of the database,
--- 309,328 ----
      can execute a command for each file to load it.  Note again that
      the Relationship Ontology is required before all others, and the
!     Sequence Ontology (SO) is absolutely required for proper
!     functioning of the database.  The command to load an ontology is
!     this:

!         go2fmt.pl -p obo_text -w xml /path/to/obofile | \
!             go-apply-xslt oboxml_to_chadoxml - > obo_text.xml
! 
!     to create a chadoxml file of the obo file, and then execute:
! 
!         stag-storenode.pl \
!      -d "dbi:Pg:dbname=$CHADO_DB_NAME;host=$CHADO_DB_HOST;port=$CHADO_DB_PORT" \
!      --user $CHADO_DB_USERNAME --password $CHADO_DB_PASSWORD obo_text.xml
! 
!     If you have other ontology format files, the commands are similar;
!     consult the documentation for go2fmt.pl and go-apply-xslt for your
!     file format.

      It is not a bad idea at this point to make a back up of the database,
***************
*** 344,354 ****
      development is ongoing to provide better translation.

- DUMPING GFF3
- 
- The script gmod_dump_gff3.pl can be used to dump GFF3 from a chado
- database.  If executed with no arguments, dump_gff3.pl will dump all
- features for the default organism in the database.  Alternatively,
- you can provide the organism or reference sequence to dump.
- 
  GENERIC GENOME BROWSER

--- 375,378 ----
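
The two-step ontology load described in the INSTALL diff above can be
sketched as a small Python helper that only assembles the command lines.
The program names, flags, and environment variable names are taken from the
INSTALL text; the helper function names are hypothetical, and nothing here
has been checked against the tools themselves.

```python
import os


def chado_dsn(env=None):
    """Build the DBI connection string from the CHADO_DB_* variables."""
    env = os.environ if env is None else env
    return "dbi:Pg:dbname={};host={};port={}".format(
        env["CHADO_DB_NAME"], env["CHADO_DB_HOST"], env["CHADO_DB_PORT"]
    )


def ontology_load_commands(obo_path, env=None, xml_path="obo_text.xml"):
    """Return the three argument lists used to load one OBO file.

    The first two are meant to be piped together, with the output written
    to xml_path; the third stores that file in the database.
    """
    env = os.environ if env is None else env
    convert = ["go2fmt.pl", "-p", "obo_text", "-w", "xml", obo_path]
    transform = ["go-apply-xslt", "oboxml_to_chadoxml", "-"]
    store = [
        "stag-storenode.pl",
        "-d", chado_dsn(env),
        "--user", env["CHADO_DB_USERNAME"],
        "--password", env["CHADO_DB_PASSWORD"],
        xml_path,
    ]
    return convert, transform, store
```

Passing the DSN as a single list element avoids shell quoting altogether,
which matters here because the string contains semicolons.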