Name | Modified | Size | Downloads / Week |
---|---|---|---|
readme.txt | 2015-07-06 | 2.5 kB | |
newDNA-Prot.zip | 2014-09-05 | 773.8 MB | |
LICENSE.txt | 2014-09-05 | 2.6 kB | |
Totals: 3 Items | | 773.8 MB | 0 |
newDNA-Prot RELEASE NOTES
=========================

newDNA-Prot program by Wei Zheng (jlspzw139@sina.com)

This program predicts whether a protein is DNA-binding, given a protein sequence in FASTA format. It is supplied in source code form along with the required data files, and it runs under Windows.

INSTALLATION:

First, download newDNA-Prot.zip from http://sourceforge.net/projects/newdnaprot/ and unzip it. Then install gfortran-windows-20140523.exe before doing anything else. (In this program, the comprehensive feature program was written by Chen Zhang.)

On Windows, open My Computer -> All Programs -> Accessories -> Windows PowerShell -> Windows PowerShell. The commands must be run in PowerShell as administrator. To give the PowerShell scripts permission to execute, enter:

    Set-ExecutionPolicy Unrestricted

INPUT FILE:

A sequence file in FASTA format, with one sequence per file.

EXAMPLE:

To predict a single protein sequence, run the following from the current directory:

    ./newDNA-Prot.ps1 ./example/1BI4A ./exampleresult.txt

To predict multiple protein sequences, run:

    ./MulnewDNA-Prot.ps1 ./example ./multexampleresult

LICENSE:

Please see the LICENSE file for the license terms of the software. It is basically free for academic users, but a license fee applies to commercial users.

CITATION:

PUBLICATIONS OF RESEARCH USING OUR METHOD MUST INCLUDE AN APPROPRIATE CITATION TO THE METHOD:

newDNA-Prot: prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation
Yanping Zhang, Jun Xu, Wei Zheng, Chen Zhang, Xingye Qiu, Ke Chen, Jishou Ruan

SHORT SUMMARY OF THE METHOD

Because DNA-binding proteins play a pivotal role in gene regulation, it is essential to develop an efficient computational approach that rapidly and reliably identifies them in large protein databases.

In this study, we build a comprehensive feature set comprising information from the primary sequence, evolutionary profiles, predicted secondary structure, relative solvent accessibility (RSA), physicochemical properties, and biological function. A two-stage feature selection method (mRMR and wrapper) and an ensemble learning approach are then used to determine the best prediction model. The results show that our model improves the prediction accuracy for DNA-binding proteins while relying only on the primary sequence as input.
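
PREPARING INPUT FILES (UNOFFICIAL SKETCH)

The program expects one FASTA sequence per file, so a multi-sequence FASTA file has to be split before it can be used. The Python sketch below is one possible way to do that and is not part of newDNA-Prot. The function names (split_fasta, write_records), the toy sequences, and the choice to name each output file after the first token of its header line are all assumptions made for illustration. The files are written without an extension only because the bundled example (1BI4A) has none.

```python
from pathlib import Path


def split_fasta(text):
    """Split multi-record FASTA text into {record_id: single_record_text}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited token is the record ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else "seq{}".format(len(records) + 1)
            if current in records:
                raise ValueError("duplicate record ID: " + current)
            records[current] = [line]
        elif current is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[current].append(line)
    return {rid: "\n".join(lines) + "\n" for rid, lines in records.items()}


def write_records(records, out_dir):
    """Write each record to its own file inside out_dir; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for rid, body in records.items():
        # Replace characters that are awkward or invalid in Windows file names.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in rid)
        path = out / safe
        path.write_text(body)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Toy records, made up for this demonstration only.
    demo = ">protA demo\nMKVL\nAAGT\n>protB\nMSTN\n"
    for rid in split_fasta(demo):
        print(rid)
```

After calling write_records(split_fasta(text), "./mysequences"), the resulting directory could presumably be passed to MulnewDNA-Prot.ps1 in place of ./example. This has not been checked against the scripts themselves.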