On Fri, 2004-07-23 at 07:59, Patrick McConnell wrote:
> Thank you for such a descriptive answer! The GMOD project is very
Thanks; I think so too :-)
> As far as pipelines go, our project (somewhat) streamlines sequence trace
> decoding, clustering, blasting, and annotating. Is this outside the scope
> of GMOD? If so, I don't think I fully understand the scope of GMOD.
My main concern was how far back into the LIMS portion of the work you
wanted to go. Having worked for a company that (unsuccessfully)
tried to sell LIMS software, I'm a little gun-shy about getting into
tracking clones. I'm even a little worried about including trace
decoding, but since that is the first step, it shouldn't be too
The chado schema is modular, so it is generally pretty easy to add
modules. For instance, the sequence module deals with sequences and
features and the cv module deals with controlled vocabularies
(ontologies). While there is currently no module for EST clustering
(including steps leading up to the clustering), I have been talking with
a group about incorporating their schema and software into GMOD. The
question remains whether their schema will integrate well with chado.
(Since a sequence module already exists, there will presumably be some
overlap between the clustering module and the sequence
module--potentially breaking software they've written).
> As far as components go, our project can be divided into three major
> components: machine processing pipeline, human curation site, and public
> website. Actually, the last two are combined for us, though they need not
> be. The machine processing pipeline can be broken down into: decoding of
> sequences, clustering, blasting, and annotation. The website can be broken
> down into: GO tree, cluster list, and BLAST interface. The cluster list
> consists of a list of all annotated clusters, a cluster page that shows
> annotations and sequence data (our clusters are
> cluster->subcluster->clone->read, so we have a tree for this), blast
> evidence for annotations, and cluster evidence. The human curation is
> built directly into those modules, where annotations can be accepted or
> rejected by experts that are logged into the system.
> I imagine your gmod-web will contain many of the features I described. As
> each project will have some unique requirements for display, will gmod-web
> be easily extended?
The thing that is very cool about gmod-web is that it is a template-based
system that reflects all of the information present in the schema,
so even if you add your own tables to the schema, gmod-web can represent
that data without much work from you.
> And I think such a pipeline would make a good GMOD module, as something
> like this is really missing from the community right now. Of course, I
> think the pipeline should be very pluggable, as everyone will argue about
> how best to do things. However, I think the general workflow I describe is
> somewhat universally applicable, if incomplete.
> -Patrick McConnell
> Duke Bioinformatics Shared Resource
> Duke Comprehensive Cancer Center
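A minimal sketch of the pluggable workflow described in the quoted message above (decoding, clustering, blasting, annotating), written in Python. Everything here is an illustrative assumption rather than code from the Duke system or from GMOD: the stage functions, the `run_pipeline` helper, and the toy data are hypothetical, and each stage body is a placeholder for the real tool (base caller, EST clustering software, BLAST).

```python
from typing import Callable, Dict, List

# Hypothetical convention: each stage takes the working data and returns an updated copy.
Stage = Callable[[Dict], Dict]


def decode_traces(data: Dict) -> Dict:
    # Placeholder for trace decoding: here we only upper-case toy sequences.
    reads = {name: seq.upper() for name, seq in data["traces"].items()}
    return {**data, "reads": reads}


def cluster_reads(data: Dict) -> Dict:
    # Placeholder for EST clustering: group reads by their first four bases.
    clusters: Dict[str, List[str]] = {}
    for name, seq in sorted(data["reads"].items()):
        clusters.setdefault(seq[:4], []).append(name)
    return {**data, "clusters": clusters}


def blast_clusters(data: Dict) -> Dict:
    # Placeholder for BLAST: look up hits in a toy table keyed by cluster.
    hits = {key: data["toy_hits"].get(key, []) for key in data["clusters"]}
    return {**data, "hits": hits}


def annotate(data: Dict) -> Dict:
    # Placeholder for annotation: propose the first hit, pending human curation.
    annotations = {key: (h[0] if h else None) for key, h in data["hits"].items()}
    return {**data, "annotations": annotations}


def run_pipeline(stages: List[Stage], data: Dict) -> Dict:
    # Run the stages in order; swapping a stage means swapping a list entry.
    for stage in stages:
        data = stage(data)
    return data


DEFAULT_STAGES: List[Stage] = [decode_traces, cluster_reads, blast_clusters, annotate]

result = run_pipeline(DEFAULT_STAGES, {
    "traces": {"read1": "acgtaa", "read2": "acgtcc", "read3": "ttgaca"},
    "toy_hits": {"ACGT": ["hypothetical kinase"]},
})
print(result["annotations"])
```

Because each stage is just a function with the same signature, a site that prefers a different clustering approach could replace `cluster_reads` in the list without touching the other stages.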
> From: Scott Cain <cain@...>
> Date: 07/22/2004 04:23 PM
> To: Patrick McConnell <MCCon012@...>
> cc: Simon M Lin <lin00025@...>, gmod list <gmod-devel@...>
> Subject: Re: gmod as a full annotation system
> Hi Patrick and Simon,
> I am cc'ing this message to the gmod-devel mailing list, since I
> consider it important and more generally of interest.
> It is definitely true that many of the projects that are part of GMOD
> are genome-centric (I know of people who use GBrowse for cDNA
> databases, but that needs to be done carefully to make sure the results are
> sensible). The reason for this is obvious: the initial contributing
> members of the GMOD consortium come from well established MODs. That
> said, I consider it very important to support projects like yours with
> GMOD, since there are plenty of projects/organisms that will never (at
> least for the foreseeable future) have a sequenced genome.
> Along those lines, I've given some thought to the additional items that
> need to be included in GMOD to be more suitable to cDNA projects. The
> most obvious to me is EST clustering software, and I've looked at
> incorporating StackPack or the TIGR clustering tool (whose name escapes
> me at the moment), but I haven't arrived at a conclusion as to which
> would be better.
> I believe the GMOD schema (called chado) is sufficiently flexible to
> work with cDNA data versus genome data. For web access, I am preparing
> a first release of gmod-web, which is a front end for chado, and it is
> agnostic with respect to the type of data. With gmod-web and GBrowse, I
> think you would have all the basic tools for browsing your data. (While
> you may not need all of GBrowse, the tools inside GBrowse for generating
> pictures can be incorporated into pages in gmod-web.)
> For annotation, what would you rather be using? I am aware of Artemis,
> but it seems to me that it is just as genome-centric, although I suspect
> that either Apollo or Artemis could be used to annotate cDNA type data.
> (Though it would be best to ask people closer to those projects, as I
> don't really know.) Another annotation tool, JavaSEAN, is in
> development as a GMOD project. I don't know its suitability for cDNA
> based data--again, it might be best to ask the author:
> As for an annotation pipeline, what do you have in mind? It may be that
> such a thing is beyond the scope of GMOD, but it really depends on what
> you are looking for. Along these lines it may be worthwhile to ask the
> FlyBase people what they are doing, since they are the primary
> architects of the chado schema. Even though they are working with
> genome data, the pipeline may still be adaptable to working on cDNA
> So, what have I missed (that is, components that you would need, but
> don't see as part of GMOD)?
> As for your concern about piecing it all together, I share your
> concern. One of my primary tasks is to make it as easy as possible.
> Future releases of gmod will include more and more pieces integrated in
> a fairly seamless way (including Apollo and gmod-web in the near term,
> and PubSearch/Fetch/Track longer term).
> The only other project I know of that is similar to GMOD is GUSdb:
> but I don't know if it is any more or less suitable for your project.
> On Thu, 2004-07-22 at 14:59, Patrick McConnell wrote:
> > We at Duke Cancer Center have developed a system for machine processing,
> > annotation, storage, and public web display of cDNA data. We developed this
> > for a couple of very specific projects. Now, we are considering
> > generalizing the system. We are looking at existing systems to handle
> > data, and GMOD has caught our eye.
> > What role do you see GMOD playing in gene annotation? I see there are a
> > bunch of modules that all play slightly different roles. I feel like it
> > would be difficult to piece them all together. Will GMOD ever provide an
> > end-to-end annotation system? I see nothing along the lines of an
> > annotation pipeline. Where does this fit in GMOD? Also, many of the
> > modules (e.g. Apollo and GBrowse) are genome-centric. However, we are
> > interested in cDNA annotation unrelated to genome annotation. Do you see
> > projects like this fitting into GMOD? Finally, can you recommend any
> > similar efforts for me to look into?
> > Thanks for your time,
> > -Patrick McConnell
> > Duke Bioinformatics Shared Resource
> > Duke Comprehensive Cancer Center
> > patrick.mcconnell@...
> Scott Cain, Ph. D. cain@...
> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
Scott Cain, Ph. D. cain@...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory