From: Pankaj J. <pj...@co...> - 2004-05-04 15:18:17
Hi Don,

This is a great beginning, and it came at a very appropriate time. We at Gramene are working on resolving some of the issues in our gene-centric presentation strategy. I have tried to capture some very early thoughts from our side in this document:
http://ascus.plbr.cornell.edu/~gramene/gene/template_gene_page_entry.html
I will be working on updates soon.

Even before we begin, I think we need to define some of the very basic concepts that we are going to put together.

Thanks,
Pankaj

Don Gilbert wrote:
>
> Dear organism database folks,
>
> One outcome of the April 04 GMOD meeting is a working group to decide if
> and how we can create a 'Unified Gene Page' for organism databases. You
> either volunteered for this group, or were volunteered by someone else
> :) There is no required work except that at some point each
> represented MOD will want you to vote on how to proceed with group
> recommendations.
>
> I'm hopeful we can accomplish something this summer, and have
> suggestions/recommendations by fall 2004 GMOD meeting. Please use the
> gmo...@li... for discussion, unless there is a
> need for non-public talk. See attached doc for the background and focus
> of this group to let you better decide.
>
> I'd suggest this to start:
>
> - Focus on biology now, leave computing to later. The major need
>   is to distill biological knowledge about genes to say what
>   MODs should be representing in a common way.
>
> - Look over example MOD gene pages. The approach suggested of
>   removing HTML to look at content, labels and organization struck a chord
>   in meeting discussion. XML-izing gene pages is not necessary,
>   but is one useful way to distill common information and structure.
>
> - Create a few sample unified pages, and show them
>   around for comment. We should ask for input from gene page
>   consumers: scientists who study a few or many genes across
>   organisms; data miners who use gene pages in bulk (academic, govt.,
>   industry, other databases).
>
> - Speak up if you want to actively help. I've been working on this
>   subject since 1999 -- someone else may have better luck in a new
>   approach at consensus. If you are, or know of, someone with a strong
>   biology background who uses different MOD gene pages, interested in
>   organizing this group, please say so. Even if we decide this topic
>   should be shelved, at least we should gather evidence on why the cost
>   to unify gene pages is not worth the effort.
>
>
> Would you confirm if you indeed are willing to join this group, if I
> have your correct e-mail, affiliation, name, and if it is ok to add
> your name/e-mail to public documents? If you are not the best
> person for this group, please suggest someone else at your MOD.
>
>
> -- Don Gilbert
>
>
> GMOD: Unified Gene Page Working Group
>
> TAIR: Leonore Reiser (lr...@ac...; www.arabidopsis.org)
> GO: Suzanna Lewis (su...@fr...; geneontology.org)
> MGI: Joel Richardson (je...@in...; www.informatics.jax.org)
> SGD: Kara Dolinski (dol...@pr...; www.yeastgenome.org)
> dictyBase: Eric Just (dictybase.org)
> RGD: Dean Pasko (dp...@mc...; rgd.mcw.edu)
> Wormbase: Raymond Lee (ra...@ca...; wormbase.org)
> Gramene: Pankaj Jaiswal (pj...@co...; gramene.org)
> EcoCyc/BioCyc: Ingrid Keseler (ke...@ai...)
> Xenbase: Jeff Bowes (bo...@uc...; www.xenbase.org)
> SGN: Lukas Mueller (la...@co...; www.sgn.cornell.edu)
> MaizeGDB: Trent Sigfried (dev...@ia...; maizegdb.org)
> FlyBase: Don Gilbert (gil...@in...; flybase.net, eugenes.org)
>
> NOTE: need email for Eric Just (dictybase.org)
> Maybe add locuslink devs; ensembl devs; genecards devs?
> ------------------------------------------------
>
>
>
> Toward a Unified Gene Page
> GMOD Gene Page Working Group
> 2 May 2004
> ------------------------------------------------
>
> This group will discuss and propose a common gene page that model
> organism/genome database members can agree to produce in some form. All
> interested parties can contribute.
>
> Mail list:
>   gmo...@li...
>
> Web home:
>   http://eugenes.org/all/gene-report-examples/
>   (add gmod gene-page project)
>
> Participants:
>   (participants from 12+ MODs to be added after confirmation ..)
>
>
> INTRODUCTION
> ------------------------------------------------
> There may be a long-standing desire among genome data consumers and
> producers to unify the documents describing organism genes (gene pages)
> that are provided by many model organism/genome databases (MODs).
> There remain questions of whether this desire exceeds costs of effort
> to make unified gene pages.
>
> Discussion of common gene report web pages and software should build on
> existing expertise of MODs. These projects have years of experience
> working with life scientists to produce gene pages that capture the
> essence of knowledge from the databases, and make it understandable and
> useful to scientists. For most MODs, the gene page is probably the most
> highly used reference document that people come to MOD web sites for. At
> FlyBase.net, these account for over a third of all calls, far surpassing
> any other single use category.
>
>
> PLAN
> ------------------------------------------------
> One outcome of the April 04 GMOD meeting is organization of a working
> group to decide how to proceed to reach some consensus on this topic.
> Members are drawn from several existing and new MODs, and others
> interested in unified gene pages.
>
> These points need discussion:
>   * What are common parts of MOD gene pages?
>   * What could/should be unified?
>   * Who will benefit? Costs?
>   * Web Reports AND/OR XML ?
>
> Suggested starting points:
>
> - Focus on biology now, leave computing to later. The major need
>   is to distill biological knowledge about genes to say what
>   MODs should be representing in a common way.
>
> - Look over example MOD gene pages. The approach suggested of
>   removing HTML to look at content, labels and organization struck a chord
>   in meeting discussion.
>   XML-izing gene pages is not necessary,
>   but is one useful way to distill common information and structure.
>
> - Create a few sample unified pages, and show them
>   around for comment. We should ask for input from gene page
>   consumers: scientists who study a few or many genes across
>   organisms; data miners who use gene pages in bulk (academic, govt.,
>   industry, other databases).
>
> - Speak up if you want to actively help. I've been working on this
>   subject since 1999 -- someone else may have better luck in a new
>   approach at consensus. If you are, or know of, someone with a strong
>   biology background who uses different MOD gene pages, interested in
>   organizing this group, please say so. Even if we decide this topic
>   should be shelved, at least we should gather evidence on why the cost
>   to unify gene pages is not worth the effort.
>
> As a target goal, a proposal for unified gene pages could be ready for
> GMOD meeting in Fall 2004. There is a GMOD mailing list for discussion,
> and any documents, sample pages, and software can be deposited at
> gmod.sourceforge.net. The euGenes.org gene page service is also
> available for such, and test cases.
>
>
> WORKING DOCUMENTS
> ------------------------------------------------
> We likely should create a 'gene-page' CVS project at
> gmod.sourceforge.net for documents and samples. There are example gene
> pages (see below, above), some sample extracted content (HTML -> XML).
> DGG has a preliminary 'Gene Page Scraper' that automates in Perl the
> by-hand methods I used for removing HTML styles, extracting common
> content of MOD gene pages. Useful to gene-data-miners even if not
> otherwise, though it likely has a short lifespan as web page design
> changes will break it.
>
>
> BACKGROUND (MEOW/euGenes effort)
> ------------------------------------------------
> In 1999, attendees of the Model Eukaryote Organism Workshop proposed to
> develop a common set of summary gene information.
> A test website grew
> out of this, produced primarily by Bill Gelbart and Don Gilbert. This
> was produced by extracting common gene information from existing MODs
> public data. Don has continued the MEOW effort at common summary gene
> information as euGenes.org, though it never achieved the desired goal of
> having each MOD contribute common summary data. The euGenes effort,
> without MOD contributions or external funding, has only middling
> success at trying to maintain current gene summary pages in the face of
> effort needed to match changing genome data.
>
> The Generic Model Organism Database project (GMOD) arose from related
> thoughts (and grant agency spurs for cost-effectiveness) that there
> should be common effort at building databases, software tools, and
> common practice methods for developing new organism databases, and
> updating existing ones. This has been an NIH funded project with many
> MOD participants, and as of 2004 is beginning to bear fruit in terms of
> commonly usable MOD components.
>
> With many more organism genome databases coming into being in decade
> 2000, the usefulness of having common gene information and documents is
> growing, both among database providers and the many customers of these
> (from individuals to academic, government and industrial R&D labs, and
> other bioinformatics database developers). A hopeful new goal of GMOD is
> to tackle the 'Unified Gene Page' question again, with fresh perspective
> and see if a new consensus on utility
>
> Along with MODs, there are numerous bioinformatics web/database services
> that have gene-related reports, and can offer useful insights to this
> topic. Many of these are found as External_Links on MOD gene pages.
> Often these draw the gene summary data from MODs, in ways similar to the
> data mining MEOW/euGenes was constructed with, which is another cost of
> having non-unified gene information.
>
>
> FIRST SUMMARY GENE PAGE PROPOSAL
> (extract of Model Eukaryote Organism Workshop, W. Gelbart, Feb. 1999
> http://eugenes.org/docs/meow-startup.txt )
> ------------------------------------------------
> ... to establish a common interface for the major model eukaryotic
> organism databases. This interface would be offered as the default
> homepage for nonspecialists wishing to access any of the participating
> databases. ... This home page would provide gene/gene product query
> access based on gene symbol, gene product characteristics, chromosome
> position and perhaps homology (if we can agree on criteria to be used).
> Our view is that this will satisfy the needs of many nonspecialist users
> without forcing them to learn the intricacies and idiosyncrasies of each
> of our sites. ... We had discussed a design principle that each gene
> "page" would contain no more than one screenful of information.
> Further, the gene pages should as much as possible focus on
> coin-of-the-realm molecular biological terminology rather than
> species-specific jargon. I suggest we try to flesh out what a typical
> page would contain. Here are some suggestions for relevant fields on
> the "gene report page", in order to get a discussion started:
>
> *Valid terms:
>   -Gene symbol
>   -Gene full name
>   -Gene identifier number
>
> *Synonyms:
>   -Symbol synonyms
>   -Full name synonyms
>   -Secondary gene identifier numbers
>
> *Map location information:
>   -Chromosome and genetic map position
>   -Molecular map information (simple graphic of DNA length,
>    encoded transcript(s) and CDS(s) if available)
>
> *Gene product information:
>   -Functional information (from function ontology if available)
>   -Structural information (from InterProt)
>   -Homology information with other "MEOW" organisms (we would
>    need to agree on some computable criteria for this field,
>    which is of course not trivial)
>
> *Links to extended gene information in the specialty model organism DBs.
> *Brief free text gene summary (a few lines at most).
>
>
> APRIL 2004 OUTLINE
> ------------------------------------------------
> Example work is at
>   http://eugenes.org/all/gene-report-examples/
>
> Common gene attributes (drawn from existing MOD pages)
>   * Names, symbols/IDs, synonyms
>   * Map locations
>   * Sequences
>   * Reagents
>   * Gene ontology
>   * Similar Genes
>   * Database cross-refs, External links
>   * Alleles, Transcripts
>   * Proteins, Structure and Domains
>   * Expression and Mutant Phenotypes
>   * Gene Interactions
>   * Literature references
>   * Summary Text
>
> Common to gene pages?
>   * Labels - are these same things?
>     -- Gene / locus / orf
>     -- Homolog / ortholog / relationship / similarity
>     -- Citation / publication / reference
>   * Organization of document
>     -- Section headers
>     -- Important at top, common ordering?
>   * Structure and size of default document
>     -- Tabular, text, document-like, ...
>     -- One screen or long report
>   * Graphics (maps, icons, ...)
>   * Further Detail options
>   * Layout and Design (colors, formatting, fonts ..)
>
> What is customizable?
>   * MOD customizations
>     -- Look and Feel
>     -- Details & Extensions
>   * Customer choices
>     -- Best for organism community (org. standard)
>     -- Best for general reader (general standard)
>     -- Best for beginners or experts (simple,complex)
>
> Example Gene Pages
>
> Common Gene XML?
>   * Computable text of gene page ?
>     -- "what you see (web page) is also what your computer can read"
>     -- simple and human-readable, or complex and detailed
>   * XML variants, tabular, other?
>     -- Ace2XML, NCBI XML, others
>     -- Samples (Web -> XML)
>
>
> COMPARE EXISTING MOD GENE PAGES
> -----------------------------------------------------------
> For a start at gathering this common experience into a set of common
> values and practice in gene page presentation, I picked a highly
> homologous eukaryote gene Calmodulin, and pulled out gene reports on
> these from these organism databases:
>   yeast - SGD
>   arabidopsis - TAIR
>   zebrafish - ZFIN
>   worm - WormBase
>   rice - Gramene
>   human - LocusLink
>   mouse - MGI
>   rat - RGD
>   mosquito - Ensembl
>   fly - FlyBase, euGenes, LocusLink, Ensembl
>     (see same gene from different viewpoints)
>
> There are several Calmodulin(-like) genes/org, I just used one from each
> source. Find these at http://eugenes.org/all/gene-report-examples/
> or ftp://eugenes.org/eugenes/gene-report-examples/
>
> - Don Gilbert
>
> 24 Apr 2004 - These have been updated to current versions.
> As well, four gene pages are translated from HTML to XML, for
> cut/paste/edit to design common gene pages. The XML should
> (theoretically, with right software/templates)
> allow regeneration of the original web pages.
>
> 5 Sep 2003 - Example gene web page reports from several model
> organism databases.
>
> URLs for the Cam gene pages (Sept 2003; updated Apr 2004 -
> alternate reports removed to focus on common summary pages)
> ---------------------------------------------------------
>
> http://flybase.net/cgi-bin/fbidq.html?FBgn0000253
>
> http://db.yeastgenome.org/cgi-bin/SGD/locus.pl?locus=S0000313
>
> http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:88251
>
> http://www.arabidopsis.org/servlets/TairObject?type=locus&id=29764
>
> http://www.wormbase.org/db/gene/gene?name=cmd-1;class=Locus
>
> http://rgd.mcw.edu/query/query.cgi?id=2257
>
> http://www.gramene.org/perl/protein_search?acc=P29612
>
> ## newly inferred zebrafish calm1a gene
> http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-030131-8308
>
> -------- summary services -----
>
> http://www.ensembl.org/Anopheles_gambiae/geneview?gene=ENSANGG00000010211
> http://www.ensembl.org/Drosophila_melanogaster/geneview?gene=CG8472
>
> http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=801
> http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=24242
> http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=36329
> http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=12313
>
> http://bioinfo.weizmann.ac.il/cards-bin/carddisp?CALM1
>
> http://eugenes.org/cgi-bin/moidq.html?FBgn0000253
> http://eugenes.org/cgi-bin/moidq.html?AGgn0010211
> http://eugenes.org/cgi-bin/moidq.html?HUgn0000801
> http://eugenes.org/cgi-bin/moidq.html?MGgn0000995
> http://eugenes.org/cgi-bin/moidq.html?CEgn0016585
> http://eugenes.org/cgi-bin/moidq.html?ATgn0005396
> http://eugenes.org/cgi-bin/fbidq.html?SGgn0000313
> http://eugenes.org/cgi-bin/moidq.html?ZFgn0000878
>
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gil...@in... -- http://marmot.bio.indiana.edu/
>

--
************************
Pankaj Jaiswal, PhD
G15-Bradfield Hall
Dept. of Plant Breeding
Cornell University
Ithaca, NY-14853, USA
Tel: +1-607-255-3103
     +1-607-255-4109
Fax: +1-607-255-6683
http://www.gramene.org
************************