Met-predictor RELEASE NOTES
===========================
Met-predictor program
by Qiqige Wuyun and Wei Zheng (wuyunqqg@163.com or jlspzw139@sina.com)
This predictor predicts lysine and arginine methylation sites using a
support vector machine (SVM) classifier. It is supplied in source code form along with the
required data files and runs under Linux. The input is a protein sequence file in FASTA format.
How to use it?
First, download Met-predictor.zip from http://sourceforge.net/p/met-predictor
===================================================================================================================================================
===================================================================================================================================================
We provide 64-bit binaries.
Step 1. Install
To run Met-predictor, you need to download and install:
gfortran
python
numpy
scipy
bioperl
tcsh
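Before installing, the Step 1 prerequisites can be checked with a short Python 3 sketch like the one below. It only verifies that the listed programs are on PATH and that numpy and scipy can be imported. Treating BioPerl as present when `perl -MBio::SeqIO -e 1` succeeds is our assumption, not a check provided by Met-predictor.

```python
import importlib.util
import shutil
import subprocess

# Programs listed in Step 1 (python itself, gfortran, tcsh).
EXECUTABLES = ["gfortran", "python", "tcsh"]
# Python packages listed in Step 1.
PY_MODULES = ["numpy", "scipy"]


def missing_executables(names=EXECUTABLES):
    """Return the programs that cannot be found on PATH."""
    return [n for n in names if shutil.which(n) is None]


def missing_modules(names=PY_MODULES):
    """Return the Python modules that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def bioperl_available():
    """Assumption: BioPerl is usable if `perl -MBio::SeqIO -e 1` exits with 0."""
    if shutil.which("perl") is None:
        return False
    result = subprocess.run(["perl", "-MBio::SeqIO", "-e", "1"],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Calling `missing_executables()` and `missing_modules()` with no arguments checks the Step 1 list; an empty result means nothing obvious is missing. Note that some distributions only provide `python3`, in which case `python` will be reported as missing.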
The following tools are bundled with Met-predictor:
/mono [you should compile it according to its README]
Sequence tools:
/lib/blast-2.2.26
/lib/disopred [we recommend compiling it according to its README]
/lib/hhsuite-2.0.16-linux-x86_64
/lib/HSE
/lib/libsvm-3.14 [we recommend compiling it according to its README]
/lib/psipred3.3 [we recommend compiling it according to its README]
/lib/SPIDER2_local
/lib/spineX [we recommend compiling it according to its README]
Structure tools:
/lib/hhsuite-2.0.16-linux-x86_64/scripts/hhpred [we recommend compiling it according to its README]
MODELLER is included in hhpred, but you MUST register at https://salilab.org/modeller/registration.html
to obtain a license.
/lib/PfamScan [we recommend compiling it according to its README]
/lib/hmmer
/lib/depth-1.0
/lib/Structure (for NACCESS, CHOPS, HSE, L1depth, kthCH and DSSP)
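The bundled tools are expected at fixed locations inside the Met-predictor directory. The sketch below assumes every path in the list above is relative to the installation directory and reports which ones are missing. It does not check whether a tool has actually been compiled.

```python
import os

# Relative locations of the bundled tools, copied from the Step 1 list.
BUNDLED_TOOLS = [
    "mono",
    "lib/blast-2.2.26",
    "lib/disopred",
    "lib/hhsuite-2.0.16-linux-x86_64",
    "lib/HSE",
    "lib/libsvm-3.14",
    "lib/psipred3.3",
    "lib/SPIDER2_local",
    "lib/spineX",
    "lib/hhsuite-2.0.16-linux-x86_64/scripts/hhpred",
    "lib/PfamScan",
    "lib/hmmer",
    "lib/depth-1.0",
    "lib/Structure",
]


def missing_tools(home):
    """Return the bundled tool directories that do not exist under `home`."""
    return [p for p in BUNDLED_TOOLS
            if not os.path.isdir(os.path.join(home, p))]
```

For example, `missing_tools("/home/Met-predictor")` returns an empty list when all directories are present.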
Step 2. Change path variables
Change the hard-coded installation path in the following files:
1. In Run_Metpredictor.py
The line: "os.environ['Met_predictor_HOME']='/nfs/amino-home/zhengwei/wuyunqqg/Met-predictor';"
Change $Met_predictor_HOME to your own installation path.
2. In scripts/GetFeature.sh
The line: "setenv METHOME /nfs/amino-home/zhengwei/wuyunqqg/Met-predictor"
Change $METHOME to your own installation path.
3. In scripts/GetStructureFeature.sh
The line: "set METHOME = /nfs/amino-home/zhengwei/wuyunqqg/Met-predictor"
Change $METHOME to your own installation path.
4. In scripts/update-hhsearchpdb70.py
The line: " os.environ['METHOME']='/nfs/amino-home/Met-Predictor' "
Change $METHOME to your own installation path.
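All four edits replace a hard-coded installation directory, so they can be scripted. This is a minimal sketch, assuming the files still contain the original paths quoted in Step 2 and that the launcher is named Run_Metpredictor.py as in Step 4. `set_home` is a hypothetical helper, not part of the distribution; back up the files before running it.

```python
import os

# Hard-coded installation paths as quoted in Step 2, keyed by file.
ORIGINAL_HOMES = {
    "Run_Metpredictor.py": "/nfs/amino-home/zhengwei/wuyunqqg/Met-predictor",
    "scripts/GetFeature.sh": "/nfs/amino-home/zhengwei/wuyunqqg/Met-predictor",
    "scripts/GetStructureFeature.sh": "/nfs/amino-home/zhengwei/wuyunqqg/Met-predictor",
    "scripts/update-hhsearchpdb70.py": "/nfs/amino-home/Met-Predictor",
}


def set_home(install_dir):
    """Replace each file's hard-coded path with install_dir.

    Returns the relative names of the files that were changed; files that
    no longer contain the original path are left untouched.
    """
    changed = []
    for rel, old in ORIGINAL_HOMES.items():
        path = os.path.join(install_dir, rel)
        with open(path) as fh:
            text = fh.read()
        if old in text:
            with open(path, "w") as fh:
                fh.write(text.replace(old, install_dir))
            changed.append(rel)
    return changed
```

Run it once from anywhere, e.g. `set_home("/home/Met-predictor")`. A second call returns an empty list because the original paths are gone.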
Step 3. Download databases
Put the pre-formatted nr database in db/blast_nr/nr
(nr download site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/).
Put the uniprot20 database in db/hhblits_db/uniprot20_2013_03
or db/hhblits_db/uniprot20_2015_06, depending on which uniprot20 release you downloaded
(uniprot20 download site: http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/).
Put the hhsearch database in db/hhsearch_db/;
you can download the hhsearch/hhpred database with scripts/update-hhsearchpdb70.py.
Put the Pfam database in db/pfam/;
you can download the Pfam-A database from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/
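Once the databases are in place, a sketch like the following can confirm the Step 3 layout. The file-name patterns are assumptions: nr is taken to be a pre-formatted BLAST database whose files share the prefix `nr`, and the uniprot20 database to be files or a directory whose names start with the release name.

```python
import glob
import os


def _non_empty_dir(path):
    """True if path is a directory containing at least one entry."""
    return os.path.isdir(path) and bool(os.listdir(path))


def check_databases(home):
    """Report which Step 3 database locations look populated under `home`."""
    db = os.path.join(home, "db")
    return {
        # Assumed: formatted BLAST files such as nr.pal, nr.00.phr, ...
        "nr": bool(glob.glob(os.path.join(db, "blast_nr", "nr*"))),
        # Either uniprot20 release named in Step 3 is accepted.
        "uniprot20": any(
            glob.glob(os.path.join(db, "hhblits_db", name + "*"))
            for name in ("uniprot20_2013_03", "uniprot20_2015_06")
        ),
        "hhsearch": _non_empty_dir(os.path.join(db, "hhsearch_db")),
        "pfam": _non_empty_dir(os.path.join(db, "pfam")),
    }
```

Any entry reported as False points to a database that still needs to be downloaded or moved.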
Step 4. Input format and Command
usage: [python] Run_Metpredictor.py -i input_fasta -o outfile -t type [-s structure -r isscale]
-i: input FASTA file
You should use the absolute path, and the file name must end with .fasta
format:
>xxxxxx
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
For example: P0ADN2.fasta
>P0ADN2
MAESFTTTNRYFDNKHYPRGFSRHGDFTIKEAQLLERHGYAFNELDLGKREPVTEEEKLFVAVCRGEREPVTEAERVWSKYMTRIKRPKRFHTLSGGKPQVEGAEDYTDSDD
-o: output file
You should use the absolute path
-t: residue type [K R]
K for lysine or R for arginine
-s: Adding structure features or not [1 0]
1 for adding structure features; 0 for not
-r: feature scaling in the SVM or not [1 0]
1 for scaling; 0 for not
example: ./Run_Metpredictor.py -i /home/Met-predictor/example/P0CX53.fasta -o /home/Met-predictor/example/P0CX53.out -t R -s 1 -r 1
Notice that the input fasta must use full absolute path!
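The rules above (absolute paths, the .fasta suffix, -t K or R, 0/1 switches, a single FASTA record) can be checked before launching a long run. The Python 3 sketch below uses hypothetical helpers `parse_fasta_text` and `build_command`. It only validates arguments and builds the argument list, and it leaves out -s and -r when they are not given instead of guessing their defaults.

```python
import os


def parse_fasta_text(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    seq = "".join(lines[1:])
    # A second '>' line or any non-letter makes isalpha() fail.
    if not seq.isalpha():
        raise ValueError("expected one record containing only residue letters")
    return lines[0][1:], seq.upper()


def build_command(fasta, outfile, residue, structure=None, scale=None):
    """Validate the Step 4 arguments and return the argument list."""
    if not os.path.isabs(fasta) or not fasta.endswith(".fasta"):
        raise ValueError("-i must be an absolute path ending in .fasta")
    if not os.path.isabs(outfile):
        raise ValueError("-o must be an absolute path")
    if residue not in ("K", "R"):
        raise ValueError("-t must be K or R")
    cmd = ["python", "Run_Metpredictor.py",
           "-i", fasta, "-o", outfile, "-t", residue]
    # Optional switches are only passed when explicitly given.
    for flag, value in (("-s", structure), ("-r", scale)):
        if value is not None:
            if value not in (0, 1):
                raise ValueError(flag + " must be 0 or 1")
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the Met-predictor directory; passing a list avoids shell quoting problems with unusual paths.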
Step 5. Result
Description of the result file:
The first column is the position in the sequence of the predicted residue (lysine K or arginine R, as selected with -t).
The second column is the predicted label of the residue: +1 means the residue is predicted as a positive sample (i.e., a methylation site) and -1 means it is predicted as a negative sample (i.e., a non-methylation site).
The third column is the predicted probability of the positive sample.
The 4th column is the predicted probability of the negative sample.
The 5th column is the predicted MONO-methylation label: +1 means the residue is predicted as a positive MONO-methylation sample (i.e., a mono-methylation site) and -1 means it is predicted as a negative sample (i.e., not a mono-methylation site).
The 6th column is the predicted probability of the positive MONO-methylation sample.
The 7th column is the predicted probability of the negative MONO-methylation sample.
The 8th column is the predicted DI-methylation label: +1 means the residue is predicted as a positive DI-methylation sample (i.e., a di-methylation site) and -1 means it is predicted as a negative sample (i.e., not a di-methylation site).
The 9th column is the predicted probability of the positive DI-methylation sample.
The 10th column is the predicted probability of the negative DI-methylation sample.
The 11th column is the predicted TRI-methylation label: +1 means the residue is predicted as a positive TRI-methylation sample (i.e., a tri-methylation site) and -1 means it is predicted as a negative sample (i.e., not a tri-methylation site).
The 12th column is the predicted probability of the positive TRI-methylation sample.
The 13th column is the predicted probability of the negative TRI-methylation sample.
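A minimal parser for the 13-column result file, assuming one whitespace-separated row per residue and no header line (lines starting with # are skipped as a precaution). The field names are our own labels for the columns described above, not names used by Met-predictor.

```python
from collections import namedtuple

# Hypothetical field names for the 13 columns described in Step 5.
FIELDS = [
    "position",
    "label", "p_pos", "p_neg",
    "mono_label", "mono_p_pos", "mono_p_neg",
    "di_label", "di_p_pos", "di_p_neg",
    "tri_label", "tri_p_pos", "tri_p_neg",
]
Prediction = namedtuple("Prediction", FIELDS)


def parse_result_line(line):
    """Parse one whitespace-separated result row into a Prediction."""
    cols = line.split()
    if len(cols) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(cols)))
    values = []
    for name, raw in zip(FIELDS, cols):
        if name == "position" or name.endswith("label"):
            values.append(int(float(raw)))  # accepts "12", "+1", "-1", "1.0"
        else:
            values.append(float(raw))
    return Prediction(*values)


def parse_results(path):
    """Parse every non-empty, non-comment line of a result file."""
    with open(path) as fh:
        return [parse_result_line(ln) for ln in fh
                if ln.strip() and not ln.startswith("#")]


def predicted_sites(predictions):
    """Positions whose overall label (column 2) is +1."""
    return [p.position for p in predictions if p.label == 1]
```

For instance, `predicted_sites(parse_results("/home/Met-predictor/example/P0CX53.out"))` lists the positions predicted as methylation sites.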
===================================================================================================================================================
===================================================================================================================================================
Please see the LICENSE file for the license terms of the software. It is
basically free for academic users, but a license fee applies to commercial
users.
THE PUBLICATION OF RESEARCH USING OUR METHOD MUST INCLUDE AN APPROPRIATE
CITATION TO THE METHOD:
Improved Protein Methylation Sites Prediction Based on a Large Variety of Structure Features Set
Wei Zheng, Qiqige Wuyun, Micah Cheng, Gang Hu and Yanping Zhang
OTHERS:
The DISOPRED server for the prediction of protein disorder.
Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004) Bioinformatics 20: 2138-2139.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Remmert M, Biegert A, Hauser A, Söding J (2012) Nat Methods 9: 173-175.
Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins.
Heffernan R, Dehzangi A, Lyons J, Paliwal K, Sharma A, et al. (2015) Bioinformatics (Oxford, England).
The PSIPRED protein structure prediction server.
McGuffin LJ, Bryson K, Jones DT (2000) Bioinformatics 16: 404-405.
SPINE X: Improving protein secondary structure prediction by multi-step learning coupled with prediction of solvent accessible surface area and backbone torsion angles.
Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y (2012) J Comput Chem 33: 259-267.
Residue depth: a novel parameter for the analysis of protein structure and stability,
Chakravarty, S. and Varadarajan, R. (1999) Structure, 7, 723-732.
NACCESS, Computer Program, Department of Biochemistry and Molecular Biology, University College London,
Hubbard, S.J. and Thornton, J.M. (1993).
AAindex: amino acid index database, progress report 2008,
Kawashima, S., et al. (2008) Nucleic Acids Research, 36, D202-D205.
Analysis of Conformational B-Cell Epitopes in the Antibody-Antigen Complex Using the Depth Function and the Convex Hull,
Zheng, W., et al. (2015) PLoS ONE, 10, e0134835.
The Pfam protein families database,
Punta, M., et al. (2012) Nucleic Acids Research, 40, D290-D301.
Accelerated profile HMM searches.
Eddy SR. PLoS Comput Biol. 7:e1002195 (2011)
Automatic Prediction of Protein 3D Structures by Probabilistic Multi-template Homology Modeling.
Meier A., Söding J. (2015) PLoS Comput Biol. 11(10):e1004343. doi: 10.1371/journal.pcbi.1004343. PMID: 26496371