From: Aaron J. M. <am...@pc...> - 2005-08-04 19:14:41
A Prosite entry is a (regular expression or profile-based) sequence motif, so it's none of the above. I believe we are going to handle Prosite (and other InterProScan-related datasets) simply as DbRefs, and not try to store their actual definitions (thus linking out to expasy, etc. as necessary).

-Aaron

On Aug 4, 2005, at 2:56 PM, Chris Stoeckert wrote:

> Hi Sanjeev,
> I guess the first thing to decide is if this entry represents a
> sequence, a feature, or an annotation. Do you (or anyone else) have
> strong opinions on this? Can you send an example entry?
> Thanks,
> Chris
>
> On Aug 4, 2005, at 12:35 PM, Kumar, Sanjeev (Contr) wrote:
>
>> Hi,
>> Now let us figure out which GUS table to use to store PrositeDB
>> master data.
>> Can anyone help me with that, please?
>> It contains the following types of information:
>> ID  Identification (Begins each entry; 1 per entry)
>> AC  Accession number (1 per entry)
>> DT  Date (1 per entry)
>> DE  Short description (1 per entry)
>> PA  Pattern (>1 per entry)
>> MA  Matrix/profile (>1 per entry)
>> RU  Rule (>1 per entry)
>> NR  Numerical results (>1 per entry)
>> CC  Comments (>=1 per entry)
>> DR  Cross-references to Swiss-Prot (>1 per entry)
>> 3D  Cross-references to PDB (>1 per entry)
>> DO  Pointer to the documentation file (1 per entry)
>>
>> Any help on this will be appreciated.
>>
>> Thanks
>> Sanjeev
>>
>> -----Original Message-----
>> From: gus...@li...
>> [mailto:gus...@li...]On Behalf Of
>> gus...@li...
>> Sent: Tuesday, August 02, 2005 11:09 PM
>> To: gus...@li...
>> Subject: Gusdev-gusdev digest, Vol 1 #637 - 1 msg
>>
>> Send Gusdev-gusdev mailing list submissions to
>> gus...@li...
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>> or, via email, send a message with subject or body 'help' to
>> gus...@li...
>>
>> You can reach the person managing the list at
>> gus...@li...
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Gusdev-gusdev digest..."
>>
>> Today's Topics:
>>
>> 1. RE: Loading Prosite DB (Kumar, Sanjeev (Contr))
>>
>> --__--__--
>>
>> Message: 1
>> Subject: RE: [GUSDEV] Loading Prosite DB
>> Date: Tue, 2 Aug 2005 17:04:05 -0400
>> From: "Kumar, Sanjeev \(Contr\)" <San...@ng...>
>> To: "Aaron J. Mackey" <am...@pc...>
>> Cc: "Jian Lu" <jl...@vb...>,
>> <gus...@li...>
>>
>> So, the plugin which you are writing will not be taking care of the
>> detailed Prosite data, right?
>> If so, then I will write a separate plugin to load the detailed
>> Prosite master data.
>>
>> Thanks
>> Sanjeev
>>
>> -----Original Message-----
>> From: Aaron J. Mackey [mailto:am...@pc...]
>> Sent: Tuesday, August 02, 2005 5:01 PM
>> To: Kumar, Sanjeev (Contr)
>> Cc: Jian Lu; gus...@li...
>> Subject: Re: [GUSDEV] Loading Prosite DB
>>
>> Yes, InterProScan only provides domain analysis results, not the
>> actual domain/pattern/motif databases themselves.
>>
>> -Aaron
>>
>> On Aug 2, 2005, at 4:54 PM, Kumar, Sanjeev (Contr) wrote:
>>
>>> Hi Aaron/Jian,
>>> The InterProScan data has only the Prosite ID and description in it.
>>> But to load other information for a Prosite ID, we need to load the
>>> Prosite data, which comes in a different format than InterPro.
>>> That is what I found. Do you copy?
>>>
>>> Thanks
>>> Sanjeev
>>>
>>> -----Original Message-----
>>> From: Aaron J. Mackey [mailto:am...@pc...]
>>> Sent: Tuesday, August 02, 2005 4:43 PM
>>> To: Jian Lu
>>> Cc: Kumar, Sanjeev (Contr)
>>> Subject: Re: [GUSDEV] Loading Prosite DB
>>>
>>> From http://www.ebi.ac.uk/interpro/README1.html
>>>
>>> PROSITE patterns.
>>>
>>> Some biologically significant amino acid patterns can be summarised
>>> in the form of regular expressions.
>>>
>>> ScanRegExp (by Wol...@eb...), Ppsearch (Fuchs, R. 1994).
>>>
>>> PROSITE profile.
>>>
>>> There are a number of protein families as well as functional or
>>> structural domains that cannot be detected using patterns due to
>>> their extreme sequence divergence; the use of techniques based on
>>> weight matrices (also known as profiles) allows the detection of
>>> such domains.
>>>
>>> pfscan from the Pftools package (by Phi...@is...).
>>>
>>> PRINTS.
>>> The PRINTS database houses a collection of protein family
>>> fingerprints. These are groups of motifs that together are
>>> diagnostically more potent than single motifs by making use of the
>>> biological context inherent in a multiple-motif method.
>>>
>>> FingerPRINTScan (Scordis, P. et al. 1999).
>>>
>>> PFAM.
>>> Pfam is a database of protein domain families. Pfam contains curated
>>> multiple sequence alignments for each family and corresponding
>>> profile hidden Markov models (HMMs).
>>>
>>> hmmpfam from the HMMER2.1 package (by Sean Eddy,
>>> ed...@ge..., http://hmmer.wustl.edu),
>>> DeCypher™ (TimeLogic) implementation of HMM search.
>>>
>>> PRODOM.
>>> ProDom families are built by an automated process based on a
>>> recursive use of PSI-BLAST homology searches.
>>>
>>> BlastProDom.pl (by Florence Servant, fse...@to...)
>>> - a filter on top of the Blast package (Altschul, S. F. et al. 1997).
>>>
>>> SMART.
>>> SMART domains are extensively annotated with respect to phyletic
>>> distributions, functional class, tertiary structures and
>>> functionally important residues. SMART alignments are optimised
>>> manually and following construction of corresponding hidden Markov
>>> models (HMMs).
>>>
>>> hmmpfam from the HMMER2.1 package.
>>>
>>> TIGRFAMs.
>>> TIGRFAMs are a collection of protein families featuring curated
>>> multiple sequence alignments, Hidden Markov Models (HMMs) and
>>> associated information designed to support the automated functional
>>> identification of proteins by sequence homology.
>>> Classification by equivalog family (see below), where achievable,
>>> complements classification by orthologs, superfamily, domain or
>>> motif. It provides the information best suited for automatic
>>> assignment of specific functions to proteins from large scale genome
>>> sequencing projects.
>>>
>>> hmmpfam from the HMMER2.1 package.
>>>
>>> Optionally, predictions for coiled-coil, signal peptide cleavage
>>> sites (SignalP v2) and TM helices (TMHMM v2) are supported.
>>>
>>> On Aug 2, 2005, at 4:32 PM, Jian Lu wrote:
>>>
>>>> I don't think so. Here is the data sheet from InterProScan.
>>>>
>>>> Kumar, Sanjeev (Contr) wrote:
>>>>
>>>>> Hi Aaron/Jian,
>>>>> What types of data are we talking about in the InterProScan plugin?
>>>>> Does it include Prosite data?
>>>>> Thanks
>>>>> Sanjeev
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jian Lu [mailto:jl...@vb...]
>>>>> Sent: Tuesday, August 02, 2005 2:02 PM
>>>>> To: Aaron J. Mackey
>>>>> Cc: Kumar, Sanjeev (Contr); gus...@li...
>>>>> Subject: Re: [GUSDEV] Loading Prosite DB
>>>>>
>>>>> Aaron,
>>>>>
>>>>> We are also working on InterProScan and other analysis tools, but
>>>>> we haven't got a plugin yet. If your plugin is ready, I would like
>>>>> to play with it. Here is the view that we created for InterProScan.
>>>>> Please comment on it. Thanks.
>>>>>
>>>>> --
>>>>> -- VIEW DOTS.INTERPROSCAN
>>>>> -- used to store outputs from InterProScan
>>>>> -- June 29, 2005
>>>>>
>>>>
>>>> <InterProScan_OUTPUT.pdf>
>>>>
>>>
>>> --
>>> Aaron J. Mackey, Ph.D.
>>> Project Manager, ApiDB Bioinformatics Resource Center
>>> Penn Genomics Institute, University of Pennsylvania
>>> email: am...@pc...
>>> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
>>> fax: 215-746-6697
>>> postal: Penn Genomics Institute
>>> Goddard Labs 212
>>> 415 S.
>>> University Avenue
>>> Philadelphia, PA 19104-6017
>>>
>>> -------------------------------------------------------
>>> SF.Net email is sponsored by: Discover Easy Linux Migration
>>> Strategies from IBM. Find simple to follow Roadmaps, straightforward
>>> articles, informative Webcasts and more! Get everything you need to
>>> get up to speed, fast.
>>> http://ads.osdn.com/?ad_idt77&alloc_id=16492&op=CCk
>>> _______________________________________________
>>> Gusdev-gusdev mailing list
>>> Gus...@li...
>>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>>
>>
>> --
>> Aaron J. Mackey, Ph.D.
>> Project Manager, ApiDB Bioinformatics Resource Center
>> Penn Genomics Institute, University of Pennsylvania
>> email: am...@pc...
>> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
>> fax: 215-746-6697
>> postal: Penn Genomics Institute
>> Goddard Labs 212
>> 415 S. University Avenue
>> Philadelphia, PA 19104-6017
>>
>> --__--__--
>>
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>
>> End of Gusdev-gusdev Digest
>>
>> -------------------------------------------------------
>> SF.Net email is Sponsored by the Better Software Conference & EXPO
>> September 19-22, 2005 * San Francisco, CA * Development Lifecycle
>> Practices
>> Agile & Plan-Driven Development * Managing Projects & Teams *
>> Testing & QA
>> Security * Process Improvement & Measurement *
>> http://www.sqe.com/bsce5sf
>> _______________________________________________
>> Gusdev-gusdev mailing list
>> Gus...@li...
>> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev
>>
>

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email: am...@pc...
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax: 215-746-6697
postal: Penn Genomics Institute
Goddard Labs 212
415 S. University Avenue
Philadelphia, PA 19104-6017
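
To make the DbRef idea from the top of this message concrete, here is a minimal Python sketch (not a GUS plugin). It reads the two-letter line codes listed in Sanjeev's message (ID, AC, DE, PA, ...) and keeps only what a link-out record would need, leaving the pattern or profile definition behind. Everything named here is an assumption for illustration: the function names, the dictionary keys (which are not claimed to match any GUS table), the example.org URL template, and the sample entry text. It also assumes each line starts with its two-character code followed by whitespace.

```python
from typing import Dict, List

# Line codes as listed in the thread.
LINE_CODES = {"ID", "AC", "DT", "DE", "PA", "MA", "RU", "NR", "CC", "DR", "3D", "DO"}

# Illustrative sample only; a real file should be checked against the
# Prosite documentation before relying on this layout.
SAMPLE = """\
ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
//
"""


def parse_entries(text: str) -> List[Dict[str, List[str]]]:
    """Group lines by code; an ID line begins each entry (as stated in the thread)."""
    entries: List[Dict[str, List[str]]] = []
    current = None
    for line in text.splitlines():
        code = line[:2]
        # Skip blank lines, terminators, and anything that is not a known code.
        if code not in LINE_CODES or (len(line) > 2 and not line[2].isspace()):
            continue
        if code == "ID":
            current = {}
            entries.append(current)
        if current is None:
            continue  # data before the first ID line is ignored
        current.setdefault(code, []).append(line[2:].strip())
    return entries


def to_dbref(entry: Dict[str, List[str]], url_template: str) -> Dict[str, str]:
    """Reduce a parsed entry to a link-out record (hypothetical field names)."""
    accession = entry["AC"][0].rstrip(";")
    name = entry["ID"][0].split(";")[0]
    return {
        "primary_identifier": accession,
        "secondary_identifier": name,
        "remark": entry.get("DE", [""])[0],
        # The definition itself (PA/MA/RU) is deliberately not copied.
        "url": url_template.format(accession=accession),
    }


if __name__ == "__main__":
    for parsed in parse_entries(SAMPLE):
        # Placeholder template; substitute the real link-out URL for your site.
        print(to_dbref(parsed, "https://example.org/prosite/{accession}"))
```

The point of the sketch is the split: the parser sees every field, but the record that would be stored carries only the accession, name, short description, and a link, which is the "store a reference, link out for the definition" choice described above.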