CRUMp Code
A probabilistic prediction system of protein phosphorylation sites
Brought to you by:
mmenor
| File | Date | Author | Commit |
|---|---|---|---|
| README.txt | 2012-06-11 | | [5a3d93] First commit |
| crum_models.mat | 2012-06-11 | | [5a3d93] First commit |
| phospredict.m | 2012-06-11 | | [5a3d93] First commit |
CRUMp for OCTAVE and MATLAB
0. REQUIREMENTS
1) OCTAVE or MATLAB. CRUMp was tested
using OCTAVE 3.2.4 and MATLAB 7.12.0, but CRUMp may work
with older versions.
2) OCTAVE requires the BIOINFO package, available
at http://octave.sourceforge.net/. MATLAB requires the
Bioinformatics Toolbox from MathWorks.
1. INSTALLATION
1.1 OCTAVE INSTRUCTIONS
OCTAVE users may install CRUMp as a package. Download the
latest package and use the following OCTAVE command:
pkg install crump-0.2.0.tar.gz
You may need to replace the version number with the one you
downloaded. If you receive an error that you do not have the
BIOINFO package installed, visit the website
http://octave.sourceforge.net/ and download the latest
BIOINFO package. Install the package using the following
command, replacing with the appropriate version number:
pkg install bioinfo-0.1.2.tar.gz
Alternatively, you may download the zip of the source code
of CRUMp and unpack it in any directory you like. Then
in OCTAVE you may change to that directory using the "cd"
command, or add the directory to OCTAVE's load path, for
example, using:
addpath('~/myfolder/crump-0.2.0')
1.2 MATLAB INSTRUCTIONS
Download the zip of CRUMp's source code. Unpack
the zip in the directory of your choice. Then in MATLAB, add that
directory to the search path, for example, using the
command:
addpath('~/myfolder/crump-0.2.0')
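To confirm that the path setup worked, a quick check along these
lines may help in either OCTAVE or MATLAB. It is only a sketch: it
assumes the function file is named phospredict.m, as in the file
listing, and it relies on the standard exist function rather than on
anything provided by CRUMp.

```octave
% exist(name, 'file') returns 2 when name resolves to a file on the
% search path (for example phospredict.m).
on_path = (exist('phospredict', 'file') == 2);

if on_path
    disp('phospredict found on the search path.');
else
    disp('phospredict not found; check the directory given to addpath.');
end
```

If the check fails, the directory passed to addpath is the first
thing to verify.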
2. USAGE
The input of CRUMp is a FASTA file of the protein
sequences you want to analyze. To have the results print to
screen, use the following command in OCTAVE or MATLAB:
phospredict myfasta.fasta
If you want to print the results to file, you may
additionally specify the output filename, e.g.:
phospredict myfasta.fasta myoutput.txt
The output report tells you, for each sequence, the position
number of each potential site, the type of site (S, T, or Y)
and the probability that the site is phosphorylatable.
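As a rough illustration of what "position" and "type" mean in the
report, the sketch below lists the S, T and Y residues of a made-up
sequence. It assumes that every S, T or Y residue counts as a
potential site, which this README does not state explicitly, and it
does not compute any probabilities.

```octave
% Made-up example sequence, not taken from any real protein.
seq = 'MSTAYKSLT';

% 1-based positions of every S, T or Y residue (assumed candidates).
idx = find(ismember(seq, 'STY'));

% Residue letter at each of those positions.
types = seq(idx);

for k = 1:numel(idx)
    fprintf('position %d, type %s\n', idx(k), types(k));
end
```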
If you are a programmer using CRUMp in a script, it
may be convenient to capture the returned cell array using the
command:
results = phospredict('myfasta.fasta', 'myoutput.txt');
The returned cell array consists of sequence structures, one
for each of the n sequences in the FASTA file. Each sequence
structure consists of three site structures: s_sites,
t_sites and y_sites. Finally, each site structure consists
of the following fields:
Name      Type        Description
--------------------------------------------------------
position  cell array  Position numbers of sites
site      cell array  Protein sequence of sites
matrix    matrix      Kernel matrix of sites
pred      array       Posterior probabilities
For example, if you want to access the probability that the
first S site in the second sequence in the FASTA is
phosphorylatable:
results{2}.s_sites.pred(1)
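Building on that access pattern, a script might collect every site
whose probability passes a cutoff. The sketch below is not part of
CRUMp: the results variable is a hand-built stand-in with made-up
values so that it runs without CRUMp installed, only the position and
pred fields are mocked, positions are assumed to be stored as numbers
inside the cell array, and the 0.5 threshold is an arbitrary choice.

```octave
% Hypothetical stand-in for the cell array returned by phospredict.
% All positions and probabilities below are made up.
seq1.s_sites.position = {3, 17};  seq1.s_sites.pred = [0.12, 0.81];
seq1.t_sites.position = {8};      seq1.t_sites.pred = 0.64;
seq1.y_sites.position = {};       seq1.y_sites.pred = [];
seq2.s_sites.position = {5};      seq2.s_sites.pred = 0.93;
seq2.t_sites.position = {};       seq2.t_sites.pred = [];
seq2.y_sites.position = {21};     seq2.y_sites.pred = 0.40;
results = {seq1, seq2};

threshold = 0.5;                  % arbitrary cutoff, not a CRUMp default
kinds = {'s_sites', 't_sites', 'y_sites'};
hits = zeros(0, 3);               % rows: [sequence, position, probability]
labels = '';                      % residue letter for each row of hits

for i = 1:numel(results)
    for k = 1:numel(kinds)
        sites = results{i}.(kinds{k});
        for j = 1:numel(sites.pred)
            if sites.pred(j) >= threshold
                hits(end+1, :) = [i, sites.position{j}, sites.pred(j)];
                labels(end+1) = upper(kinds{k}(1));
            end
        end
    end
end

% Print one line per site that passed the cutoff.
for r = 1:size(hits, 1)
    fprintf('sequence %d, %s%d: %.2f\n', ...
            hits(r, 1), labels(r), hits(r, 2), hits(r, 3));
end
```

With real output, replace the hand-built results with the value
returned by phospredict and adjust the cutoff to suit your analysis.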