KA-predictor RELEASE NOTES
==========================
KA-predictor program
by Qiqige Wuyun and Wei Zheng (wuyunqqg@163.com or jlspzw139@sina.com)
This predictor is developed to predict species-specific lysine acetylation sites using a
support vector machine (SVM) classifier. It is supplied in source code form along with the
required data files and runs under Linux. The input is a protein sequence file in FASTA format.
How to use it?
First, download KA-predictor.zip from http://sourceforge.net/p/ka-predictor
===================================================================================================================================================
===================================================================================================================================================
We provide both 32-bit and 64-bit binaries; choose the one that suits your OS.
Step 1. Install
To run KA-predictor, you need to download and install:
gfortran
python
numpy
tcsh
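
The prerequisites listed above can be checked before installation. The following is a minimal sketch, not part of KA-predictor: it assumes Python 3 and that the command names on your system match the list (on some systems python is installed as python3). The function name missing_dependencies is hypothetical.

```python
import importlib.util
import shutil

# Names taken from the prerequisite list; adjust if your system uses other names.
REQUIRED_COMMANDS = ["gfortran", "python", "tcsh"]
REQUIRED_MODULES = ["numpy"]


def missing_dependencies(commands=REQUIRED_COMMANDS, modules=REQUIRED_MODULES):
    """Return commands not found on PATH followed by modules that cannot be located."""
    missing = [name for name in commands if shutil.which(name) is None]
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all prerequisites found")
```

An empty list means every command was found on PATH and every module can be imported by the interpreter running the check.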
The package bundles the following third-party tools:
/mono [you should compile it according to its README]
/soft/blast-2.2.26
/soft/disopred [we recommend compiling it according to its README]
/soft/hhsuite-2.0.16-linux-x86_64
/soft/HSE
/soft/libsvm-3.14 [we recommend compiling it according to its README]
/soft/psipred3.3 [we recommend compiling it according to its README]
/soft/SPIDER2_local
/soft/spineX [we recommend compiling it according to its README]
/soft/Pse-in-One-1.0.2
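
To confirm that the unpacked archive contains every tool directory listed above, a short check such as the following may help. This is a sketch under the assumption that the listed paths are relative to the KA-predictor root directory; the function name missing_tools is hypothetical, and the check does not verify that a tool has been compiled.

```python
import os

# Directory names copied from the tool list, taken relative to the install root.
BUNDLED_TOOLS = [
    "mono",
    "soft/blast-2.2.26",
    "soft/disopred",
    "soft/hhsuite-2.0.16-linux-x86_64",
    "soft/HSE",
    "soft/libsvm-3.14",
    "soft/psipred3.3",
    "soft/SPIDER2_local",
    "soft/spineX",
    "soft/Pse-in-One-1.0.2",
]


def missing_tools(home, tools=BUNDLED_TOOLS):
    """Return the tool directories that do not exist under the install root."""
    return [tool for tool in tools if not os.path.isdir(os.path.join(home, tool))]
```

Calling missing_tools with your installation path returns an empty list when all ten directories are present.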
Step 2. Change variables
You should change the following paths:
1. In Run_KApredictor.py
The line: "os.environ['KA_predictor_HOME']='/nfs/amino-home/zhengwei/wuyunqqg/KA-predictor';"
Change KA_predictor_HOME to the path of your own KA-predictor installation.
The line: "PtmGetFeatures=HOME+'/bin/x64/PtmGetFeatures.exe';######### if your system is 32bit change x64 to x86 #############"
Set PtmGetFeatures to use the folder x86 or x64 according to your system.
2. In features/GetFeature
The line: "set KA_predictor_HOME = /nfs/amino-home/zhengwei/wuyunqqg/KA-predictor"
Change KA_predictor_HOME to the path of your own KA-predictor installation.
Then put the compiled nr database in database/blast_nr/nr
and the uniprot20 database in database/hhblits_uniprot20/uniprot20_2013_03
or database/hhblits_uniprot20/uniprot20_2015_06, according to the uniprot20 version you downloaded.
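
The choice between the x64 and x86 folders for PtmGetFeatures can also be derived automatically. The sketch below is an illustration only and is not how Run_KApredictor.py works: it assumes Python 3 and uses the pointer size of the running Python interpreter as a stand-in for the system's bitness, which can differ if a 32-bit Python runs on a 64-bit OS. The function names bin_folder and feature_binary are hypothetical.

```python
import os
import struct


def bin_folder(pointer_bits=None):
    """Return 'x64' for a 64-bit build and 'x86' otherwise."""
    if pointer_bits is None:
        # Size of a C pointer in the running interpreter, in bits.
        pointer_bits = struct.calcsize("P") * 8
    return "x64" if pointer_bits == 64 else "x86"


def feature_binary(home, pointer_bits=None):
    """Build the PtmGetFeatures path in the same layout as the line quoted above."""
    return os.path.join(home, "bin", bin_folder(pointer_bits), "PtmGetFeatures.exe")
```

The returned path follows the bin/x64 or bin/x86 layout used in the quoted line, so it could replace the hard-coded folder name if you prefer not to edit it by hand.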
Step 3. Input format
Input fasta files:
You should use the absolute path, and the filename must have the suffix .fasta
format:
>xxxxxx
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
For example: P0ADN2.fasta
>P0ADN2
MAESFTTTNRYFDNKHYPRGFSRHGDFTIKEAQLLERHGYAFNELDLGKREPVTEEEKLFVAVCRGEREPVTEAERVWSKYMTRIKRPKRFHTLSGGKPQVEGAEDYTDSDD
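
The input requirements above (absolute path, .fasta suffix, a ">" header line followed by the sequence) can be checked before a run. This is a minimal sketch assuming Python 3; the function name check_fasta_input is hypothetical, and the letters-only test on the sequence is an assumption of this example rather than a rule stated by KA-predictor.

```python
import os


def check_fasta_input(path):
    """Return a list of problems with the input file; an empty list means none found."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not path.endswith(".fasta"):
        problems.append("filename does not end in .fasta")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    with open(path) as handle:
        # Drop blank lines and surrounding whitespace.
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    elif not "".join(lines[1:]).isalpha():
        problems.append("sequence is empty or contains non-letter characters")
    return problems
```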
Step 4. Command
From the current directory, run Run_KApredictor.py as follows to predict lysine acetylation sites:
[python] Run_KApredictor.py input_fasta[use_abs_root] species[coli musculus sapiens typhimurium]
example: ./Run_KApredictor.py /home/KA-predictor/example/P0ADN2.fasta sapiens
Note that the input FASTA file must be given as a full absolute path!
Step 5. Result
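
When calling the predictor from another script, the argument rules above can be enforced before the command is launched. The sketch below only assembles the argument list and is an illustration, not part of the package: the function name build_command is hypothetical, Python 3 is assumed for the helper itself, and the species names are the four given in the usage line.

```python
import os

# The four species accepted according to the usage line.
SPECIES = ("coli", "musculus", "sapiens", "typhimurium")


def build_command(fasta_path, species, script="Run_KApredictor.py"):
    """Return the argument list for one prediction run, rejecting invalid input."""
    if species not in SPECIES:
        raise ValueError("species must be one of: " + ", ".join(SPECIES))
    if not os.path.isabs(fasta_path):
        raise ValueError("input FASTA must be given as an absolute path")
    # "python" mirrors the documented usage; use whichever interpreter
    # your copy of Run_KApredictor.py requires.
    return ["python", script, fasta_path, species]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the KA-predictor directory, so that the script is found and the result file is written there.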
The result file will be generated in the current directory, named as SequenceName.out.
The first column is the position (counted from 1) of the predicted lysine (K) residue in the sequence.
The second column is the predicted label of the lysine residue: +1 means the lysine is predicted as a
positive sample (i.e., an acetylation site), while -1 means the lysine is predicted as a negative
sample (i.e., a non-acetylation site).
The third column is the predicted probability of the positive class.
The fourth column is the predicted probability of the negative class.
Sample result for P0ADN2.fasta (the positive label is written as 1):
SeqIndex Label Pos_prob Neg_prob
15 1 0.924478 0.0755216
30 1 0.607438 0.392562
49 1 0.587106 0.412894
58 -1 0.278201 0.721799
80 1 0.815484 0.184516
86 -1 0.238905 0.761095
89 -1 0.0742586 0.925741
98 -1 0.409898 0.590102
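
The four-column result file can be read back for further processing. The following is a minimal sketch assuming Python 3 and the whitespace-separated layout shown in the sample above; the function names parse_result and acetylated_positions are hypothetical and not part of KA-predictor.

```python
def parse_result(text):
    """Parse the contents of a .out file into a list of dictionaries, one per lysine."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines and the header row.
        if len(fields) != 4 or fields[0] == "SeqIndex":
            continue
        sites.append({
            "index": int(fields[0]),       # position of the lysine in the sequence
            "label": int(fields[1]),       # 1 = acetylation site, -1 = non-acetylation site
            "pos_prob": float(fields[2]),  # probability of the positive class
            "neg_prob": float(fields[3]),  # probability of the negative class
        })
    return sites


def acetylated_positions(sites):
    """Return the positions predicted as acetylation sites."""
    return [site["index"] for site in sites if site["label"] == 1]
```

Applied to the sample result above, acetylated_positions returns the positions 15, 30, 49 and 80.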
===================================================================================================================================================
===================================================================================================================================================
Please see the LICENSE file for the license terms of the software. It is
basically free for academic users, but a license fee applies to commercial
users.
THE PUBLICATION OF RESEARCH USING OUR METHOD MUST INCLUDE AN APPROPRIATE
CITATION TO THE METHOD:
Wuyun, Qiqige, Wei Zheng, Yanping Zhang, Jishou Ruan and Gang Hu. "Improved Species-Specific Lysine Acetylation Site Prediction Based on a Large Variety of Features Set." PLoS ONE 11.5 (2016): e0155370.
OTHERS:
The DISOPRED server for the prediction of protein disorder.
Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004) Bioinformatics 20: 2138-2139.
HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.
Remmert M, Biegert A, Hauser A, Soding J (2012) Nat Meth 9: 173-175.
Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins.
Heffernan R, Dehzangi A, Lyons J, Paliwal K, Sharma A, et al. (2015) Bioinformatics (Oxford, England).
The PSIPRED protein structure prediction server.
McGuffin LJ, Bryson K, Jones DT (2000) Bioinformatics 16: 404-405.
SPINE X: Improving protein secondary structure prediction by multi-step learning coupled with prediction of solvent accessible surface area and backbone torsion angles.
Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y (2012) J Comput Chem 33: 259-267.
Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences.
Liu B, Liu F, Wang X, Chen J, Fang L, Chou K-C (2015) Nucleic Acids Res 43(W1): W65-W71.