Re: [Gmod-schema] Re: Action points from yesterday's conversation on the modular schema

Hi Chris,

Actually, I just created the CVS repository for the schema using your
suggestion from last week, i.e., 

> > > /sql
> > >     sequence/
> > >              sequence.sql
> > >              use-cases/

That could easily be undone at this point: since I am sure nobody has
actually used it yet, I could just wipe it and start over.  If I were to
vote I would probably lean toward SCHEMA-TYPE-MODULE, but as I am not
very close to application development at this point, my vote probably
should not count for much.  It just feels to me that this structure would
give maximum flexibility, which is a good thing when creating something
like a schema, where the only thing we are sure of is that it will
change.  The possibility of an "annoying extra directory" is not really
enough to dissuade me from this structure, since that is what soft links
are for.
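
As a rough illustration of the SCHEMA-TYPE-MODULE layout with a soft link
that hides the extra directory, here is a minimal Python sketch. The module
names come from Chris's option (1); the "sql" shortcut link and both function
names are hypothetical, chosen only for this example. Note also that CVS does
not version symbolic links, so a link like this would presumably live in a
checkout or install area rather than in the repository itself.

```python
import tempfile
from pathlib import Path

# Module names taken from option (1) in the thread; everything else
# (function names, the "sql" shortcut link) is a hypothetical illustration.
MODULES = ["sequence", "genetics", "expression"]


def build_layout(root: Path) -> Path:
    """Create gmod-schema/chado/sql/<module>/ under root; return the sql dir."""
    sql_dir = root / "gmod-schema" / "chado" / "sql"
    for module in MODULES:
        (sql_dir / module).mkdir(parents=True, exist_ok=True)
    return sql_dir


def add_shortcut(root: Path) -> Path:
    """Soft-link gmod-schema/sql -> chado/sql to skip the extra directory."""
    link = root / "gmod-schema" / "sql"
    if not link.is_symlink():
        # A relative target keeps the link valid if the whole tree is moved.
        link.symlink_to(Path("chado") / "sql", target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_layout(root)
        link = add_shortcut(root)
        # List the module directories as seen through the shortcut link.
        print(sorted(p.name for p in link.iterdir()))
```

With the link in place, gmod-schema/sql/sequence/ and
gmod-schema/chado/sql/sequence/ refer to the same directory, so the schema
level can stay in the tree without getting in anyone's way.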

Scott

On Mon, 2002-10-21 at 15:19, Chris Mungall wrote:
> 
> Have we decided on the directory organisation yet?
> 
> As I see it we have a collection of files categorised along the following
> axes:
> 
> module (sequence, genetics, etc)
> 
> schema instance ("chado" is I think the name we have finalised on). I am
> still thinking in terms of gmod-schema being a collection of complementary
> but possibly redundant schemas; perhaps others would prefer to think in
> terms of one single gmod-schema?
> 
> file-type; eg DDL as SQL statements, documentation (html, text,
> images, other), xml schema translation of the relational schema, adapter
> code, etc
> 
> These could be arranged in a directory structure in the following ways:
> 
> (1) (SCHEMA-TYPE-MODULE)
> 
> gmod-schema/
> 	chado/
> 		sql/
> 			sequence/
> 			genetics/
> 			expression/
> 
> this way doesn't force chado's modular divisions on the rest of
> gmod-schema
> 
> (2) (MODULE-SCHEMA-TYPE)
> 
> gmod-schema/
> 	sequence/
> 		chado/
> 			sql/
> 				sequence.sql
> 		bio-db-gff/
> 			bio-db-gff.sql
> 	genetics/
> 		chado/
> 			genetics.sql
> 
> this enforces the divisions Dave and I decided upon on the rest of
> gmod-schema; maybe not ideal, for instance, there is an argument for
> further subdividing what Dave and I call 'genetics' into 'phenotype'. on
> the other hand, it is properly modularised
> 
> (3) (MODULE-TYPE)
> 
> gmod-schema/
> 	sequence/
> 		sql/
> 			sequence.sql
> 		docs/
> 			sequence.txt
> 			coordinate-system.txt
> 
> this is if we decide there is only one core gmod-schema (obviously still
> allowing databases such as bio-db-gff in project specific repositories).
> 
> I'm happy with any, just making sure we're all on the same page. If
> pressed I'd vote for
> SCHEMA/MODULE/TYPE or SCHEMA/TYPE/MODULE
> 
> but then if nothing else other than 'chado' is going to live here then we
> have an extra annoying pointless directory.
> 
> 
> On 21 Oct 2002, Scott Cain wrote:
> 
> > Chris,
> >
> > While I was preparing to create a sourceforge GMOD cvs repository for
> > the modular schema, I came across this page:
> >
> > http://sourceforge.net/docman/display_doc.php?docid=768&group_id=1
> >
> > About half way down the page is a section "Import of Existing CVS
> > Repositories," which details how to get files that are under current CVS
> > control into the sourceforge CVS control.  Since you already have these
> > files under CVS control, do you want to move them directly to
> > sourceforge (and thus maintain their revision history), or just import
> > what you have and lose the history? If you make the tarball as directed, you
> > can send it to me and I will deal with sourceforge support.
> >
> > Thanks,
> > Scott
> >
> > On Fri, 2002-10-18 at 15:25, Chris Mungall wrote:
> > >
> > >
> > > On 18 Oct 2002, Scott Cain wrote:
> > >
> > > > Dave,
> > > >
> > > > We should definitely get this stuff under cvs control.  I was thinking
> > > > of a module named schema with doc, src and image subdirectories to hold
> > > > the information that is in the three tar balls that are currently on the
> > > > website.  If nobody has any objections, that's what I'll do.
> > >
> > > That would be great, thanks.
> > >
> > > (it is under cvs at the moment, but just in a flat scratch space, the
> > > sooner we get it a real home the better)
> > >
> > > > just to be pernickety -
> > >
> > > i expect code to live under src/ - i'd make it sql/
> > >
> > > shall we just stick the images under doc/
> > >
> > > should we have the doc directory mirror the modular structure of the sql
> > > directory, or should we just have module-specific docs
> > >
> > > the way we have it now is
> > >
> > > /sql
> > >     sequence/
> > >              sequence.sql
> > >              use-cases/
> > >
> > > and so on
> > >
> > > sorry to be a bore about these things but it's important to get a cvs dir
> > > set up right as it's a pain to change the structure once it's underway!
> > >
> > > > About the phone meeting, I'll answer for Lincoln, so that you can get an
> > > > answer before you go home today.  Lincoln is teaching a course this week
> > > > and next, so his time during the day is rather limited.  I don't know
> > > > > about his plans for the week after, but perhaps that is when we should
> > > > shoot for.
> > > >
> > > > Scott
> > > >
> > > >
> > > > On Fri, 2002-10-18 at 11:34, David Emmert wrote:
> > > > > Hi all,
> > > > >
> > > > > First of all, Scott, I had a look at the Modular Schema info you put on
> > > > > http://www.gmod.org, and it looks great - many thanks.  I wonder if you
> > > > > > have any ideas as to how we can go about improving what's there and making
> > > > > sure it is current.  Should we be thinking about putting the Modular
> > > > > Schema into the sourceforge CVS now, and if so, how organized?
> > > > >
> > > > > Thanks also for setting up the schema mailing list.
> > > > >
> > > > > I wanted to let you all know that I've successfully loaded all of the
> > > > > D.melanogaster "release 3" genome annotation GenBank records into the
> > > > > schema, and the sequence module seems to have worked beautifully.  I
> > > > > > finished the loader just last night so I haven't completely evaluated
> > > > > the results, but the annotations I've looked at look good.
> > > > >
> > > > > There's at least one gene model annotation which *didn't* load properly,
> > > > > > mod(mdg4), which is a nasty case of trans splicing whose "join" locations
> > > > > my location parser definitely did not appreciate.  Here's what one of the
> > > > > mod(mdg4) GB mRNA features looks like:
> > > > >
> > > > >      mRNA            join(138523..138735,138795..139263,
> > > > >                      complement(154413..154524),complement(153944..154201),
> > > > >                      complement(153727..153866),complement(152185..153037))
> > > > >                      /product="CG32491-PZ"
> > > > >                      /note="trans splicing"
> > > > >
> > > > > Parser go bung!
> > > > >
> > > > > I'm sure this case is workable in the schema, and I'll work on parsing
> > > > > locations of this ilk as soon as I get a chance.
> > > > >
> > > > > Lincoln, I focused on this instead of the WormBase data because in
> > > > > the context of our local (FlyBase) development, and learning how to
> > > > > layer-on the genetic/phenotypic data, we really needed to get a test-bed
> > > > > to work with, and it looks like a proper port of Berkeley's gadfly data
> > > > > is going to take some time coming.
> > > > >
> > > > > I'll take a look at the WormBase GFF and .ace data now.
> > > > >
> > > > > In the meantime, if any of you would like a postgres dump of this
> > > > > data to play with, please let me know.   Please, everyone, be aware
> > > > > that the current D.melanogaster "release 3" genome annotation data
> > > > > > in GenBank is imperfect, and these imperfections (only, I hope) are
> > > > > obviously going to be in this test data.
> > > > >
> > > > > Once I've convinced myself I've implemented this properly, I want to
> > > > > start writing some practical documents on implementing data in the
> > > > > sequence module.   Scott, others, if you have any opinions on format
> > > > > or content this should have, please let me know.
> > > > >
> > > > > If I get a chance, I'm going to try to get Gbrowse up and running on
> > > > > this data, as I'm very anxious to know how the Modular Schema and
> > > > > Gbrowse play together.   I have no idea how easy or difficult this
> > > > > will be, being totally unfamiliar with Gbrowse;  if anybody wants to
> > > > > give advice or lend a hand, please do!
> > > > >
> > > > > Finally, Lincoln mentioned we set up further conference calls, and
> > > > > I'd like to suggest we shoot for next Wednesday, 22 Oct, 3pm EST -
> > > > > same time as last time.  Would that work for everybody?
> > > > >
> > > > > I'll be out of town on Monday or Tuesday, but checking mail off and
> > > > > on, so apologies in advance if my replies are slow in coming.
> > > > >
> > > > > Best,
> > > > >
> > > > > -Dave
> > > > >
> > > > >
> > > > > >> From ls...@pe... Thu Oct 10 12:56 EDT 2002
> > > > > >> From: Lincoln Stein <ls...@cs...>
> > > > > >> To: wa...@cs..., kc...@cs...
> > > > > >> Subject: Action points from yesterday's conversation on the modular schema
> > > > > >> Date: Thu, 10 Oct 2002 12:57:32 -0400
> > > > > >> User-Agent: KMail/1.4.3
> > > > > >> Cc: Chris Mungall <cj...@fr...>, David Emmert <em...@mo...>,
> > > > > >>         Scott Cain <ca...@cs...>
> > > > > >> MIME-Version: 1.0
> > > > > >> Content-Transfer-Encoding: 8bit
> > > > > >> X-MIME-Autoconverted: from quoted-printable to 8bit by morgan.harvard.edu id MAA10948
> > > > > >>
> > > > > >> Hi All,
> > > > > >>
> > > > > >> I thought our conversation yesterday about the modular schema was very
> > > > > >> productive, and I look forward to David setting up a schedule for further
> > > > > >> talks.  Just a summary of the action points that we ended on:
> > > > > >>
> > > > > >> Because ideally the modular schema should support the application modules that
> > > > > >> we've already contributed to gmod, we're going to put together test sets for
> > > > > >> David and Chris to work with.
> > > > > >>
> > > > > >> 1) Lincoln to provide sequence feature data from WormBase in GFF and .ace
> > > > > >> format
> > > > > >> 2) Ken & Doreen to provide genetic map and correspondence data in the form of
> > > > > >> relational database table dumps
> > > > > >> 3) Doreen to provide curated mutants/phenotypes/alleles in some form (to be
> > > > > >> determined)
> > > > > >> 4) Scott to set up mailing list on gmod site to help coordinate this.
> > > > > >>
> > > > > >> The data sets will be submitted via e-mail to David.  I will do this by
> > > > > >> putting a data set on an FTP site and sending the URL to David.
> > > > > >>
> > > > > >> Lincoln
> > > > >
> > > > >
> > > > >
> > > > > -------------------------------------------------------
> > > > > This sf.net email is sponsored by:ThinkGeek
> > > > > Welcome to geek heaven.
> > > > > http://thinkgeek.com/sf
> > > > > _______________________________________________
> > > > > Gmod-schema mailing list
> > > > > Gmo...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/gmod-schema
> > > > >
> > > >
> > >
> > >
> >
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         ca...@cs...
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory