RE: [GUSDEV] Loading Prosite DB


Hi Aaron/Jian,
   The InterProScan data contains only the Prosite ID and description. To
load the other information for a Prosite ID, we need to load the Prosite
data itself, which comes in a different format than InterPro.
That is what I found. Do you agree?
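
To illustrate the format difference, here is a minimal sketch of reading the Prosite flat file. It assumes the usual PROSITE layout (a two-letter line code, data starting at column 6, records terminated by "//"). The function name parse_prosite and the sample record are only illustrative and are not part of GUS or any existing plugin.

```python
from typing import Dict, Iterable, Iterator, List

def parse_prosite(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by two-letter line code (ID, AC, DE, PA, ...)."""
    record: Dict[str, List[str]] = {}

    def flush(rec: Dict[str, List[str]]) -> Dict[str, str]:
        # Pattern (PA) lines may continue over several lines and are joined
        # without spaces; other fields are joined with a single space.
        return {code: ("" if code == "PA" else " ").join(parts)
                for code, parts in rec.items()}

    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):          # end-of-record marker
            if record:
                yield flush(record)
            record = {}
            continue
        if len(line) < 5:                  # blank or malformed line, skip it
            continue
        code, value = line[:2], line[5:].strip()
        record.setdefault(code, []).append(value)
    if record:                             # file without a trailing "//"
        yield flush(record)

# Hand-written sample in the assumed layout (not taken from a real release file).
SAMPLE = """\
ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
//
""".splitlines()

for rec in parse_prosite(SAMPLE):
    print(rec["AC"], rec["DE"])
```

A real file also starts with a header block of comment lines, so a loader would probably keep only records that have an "AC" key.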

Thanks
Sanjeev

-----Original Message-----
From: Aaron J. Mackey [mailto:am...@pc...]
Sent: Tuesday, August 02, 2005 4:43 PM
To: Jian Lu
Cc: Kumar, Sanjeev (Contr)
Subject: Re: [GUSDEV] Loading Prosite DB

 From http://www.ebi.ac.uk/interpro/README1.html

PROSITE patterns.

Some biologically significant amino acid patterns can be summarised
in the form of regular expressions.

ScanRegExp (by Wol...@eb...), Ppsearch (Fuchs, R. 1994).
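
As a rough illustration of the "regular expression" point, the sketch below converts a PROSITE-style pattern into a Python regular expression. It assumes the common PROSITE pattern syntax ("-" between elements, "x" for any residue, "[..]" for allowed and "{..}" for forbidden residues, "(n)" or "(n,m)" for repeats, "<" and ">" as terminal anchors). The function name prosite_to_regex is hypothetical; this is not how ScanRegExp or Ppsearch are implemented, and it does not handle rarer forms such as ">" inside square brackets.

```python
import re

# One pattern element: [ALT], {AM}, a single residue, or x, with optional (n) / (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern string into a Python regex string."""
    body = pattern.strip().rstrip(".")      # patterns end with a period
    prefix = suffix = ""
    if body.startswith("<"):                # anchored at the N-terminus
        prefix, body = "^", body[1:]
    if body.endswith(">"):                  # anchored at the C-terminus
        suffix, body = "$", body[:-1]

    parts = []
    for element in body.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
print(bool(re.search(regex, "AANGSAA")))
```

Unsupported elements raise ValueError rather than being silently passed through, so a loader can log the patterns it could not translate.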

PROSITE profile.

There are a number of protein families as well as functional or
structural domains that cannot be detected using patterns due to
their extreme sequence divergence; the use of techniques based on
weight matrices (also known as profiles) allows the detection of such
domains.

   pfscan from the Pftools package (by Phi...@is...).

PRINTS.
The PRINTS database houses a collection of protein family
fingerprints. These are groups of motifs that together are
diagnostically more potent than single motifs by making use of the
biological context inherent in a multiple-motif method.

      FingerPRINTScan (Scordis, P. et al. 1999).

  PFAM.
Pfam is a database of protein domain families. Pfam contains curated
multiple sequence alignments for each family and corresponding
profile hidden Markov models (HMMs).

      hmmpfam from the HMMER2.1 package (by Sean Eddy,
ed...@ge..., http://hmmer.wustl.edu),
DeCypher™ (TimeLogic) implementation of HMM search.

  PRODOM.
ProDom families are built by an automated process based on a
recursive use of PSI-BLAST homology searches.

      BlastProDom.pl (by Florence Servant, fse...@to...),
a filter on top of the Blast package (Altschul, S. F. et al. 1997).

  SMART.
SMART domains are extensively annotated with respect to phyletic
distributions, functional class, tertiary structures and functionally
important residues. SMART alignments are optimised manually and
following construction of corresponding hidden Markov models (HMMs).

      hmmpfam from the HMMER2.1 package.

TIGRFAMs.
TIGRFAMs are a collection of protein families featuring curated
multiple sequence alignments, Hidden Markov Models (HMMs) and
associated information designed to support the automated functional
identification of proteins by sequence homology. Classification by
equivalog family (see below), where achievable, complements
classification by orthologs, superfamily, domain or motif. It
provides the information best suited for automatic assignment of
specific functions to proteins from large scale genome sequencing
projects.

      hmmpfam from the HMMER2.1 package.

Optionally, predictions for coiled-coil, signal peptide cleavage
sites (SignalP v2) and TM helices (TMHMM v2) are supported.

On Aug 2, 2005, at 4:32 PM, Jian Lu wrote:

> I don't think so. Here is the data sheet from InterProScan.
>
> Kumar, Sanjeev (Contr) wrote:
>
>
>> Hi Aaron/Jian,
>>   What types of data are we talking about in the InterProScan plugin?
>>  Does it include Prosite data?
>> Thanks
>> Sanjeev
>>
>> -----Original Message-----
>> From: Jian Lu [mailto:jl...@vb...]
>> Sent: Tuesday, August 02, 2005 2:02 PM
>> To: Aaron J. Mackey
>> Cc: Kumar, Sanjeev (Contr); gus...@li...
>> Subject: Re: [GUSDEV] Loading Prosite DB
>>
>>
>> Aaron,
>>
>> We are also working on InterProScan and other analysis tools, but
>> we haven't got a plugin yet. If your plugin is ready, I would like
>> to try it. Here is the view that we created for InterProScan.
>> Please comment on it. Thanks.
>>
>> --
>> -- VIEW DOTS.INTERPROSCAN
>> -- used to store outputs from InterProScan
>> -- June 29, 2005
>>
>>
>
>
> <InterProScan_OUTPUT.pdf>
>

--
Aaron J. Mackey, Ph.D.
Project Manager, ApiDB Bioinformatics Resource Center
Penn Genomics Institute, University of Pennsylvania
email:  am...@pc...
office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
fax:    215-746-6697
postal: Penn Genomics Institute
         Goddard Labs 212
         415 S. University Avenue
         Philadelphia, PA  19104-6017