Home / tutorial
Name Modified Size InfoDownloads / Week
Parent folder
PLEK.1.2.readme.txt 2014-06-26 9.3 kB
PLEK_howto_generate_scripts.doc 2014-06-24 27.6 kB
howtobuildmodel1.2_step4_use.mp4 2014-06-24 7.2 MB
howtobuildmodel1.2_step3_build.mp4 2014-06-24 74.6 MB
howtobuildmodel1.2_step2_configure.mp4 2014-06-24 33.6 MB
howtobuildmodel1.2_step1_preprocess.mp4 2014-06-24 12.9 MB
how_to_use_PLEK.1.2.mp4 2014-06-24 21.1 MB
how_to_install_PLEK.1.2.mp4 2014-06-24 13.9 MB
how_to_build_model.1.2.mp4 2014-06-24 24.5 MB
PLEK_1.0_howto_install_and_use.mp4 2014-03-22 7.0 MB
PLEK_1.1_howto_install_and_use.mp4 2014-03-22 40.0 MB
Totals: 11 Items   234.6 MB 0
///////////////////////////////////////////////////////////////////
//                                                               //
//  PLEK - predictor of lncRNAs and mRNAs based on k-mer scheme  //
//  Authors: Aimin Li, Junying Zhang                             //
//  Contacts: LiAiminMail@gmail.com, jyzhang@mail.xidian.edu.cn  //
//  Webcite: https://sourceforge.net/projects/plek/              //
//  Version: 1.2                                                 //
//  Updated on: June 26, 2014                                    //
//                                                               //
///////////////////////////////////////////////////////////////////


=====================
  INTRODUCTION
=====================
We present a tool called PLEK (predictor of long non-coding RNAs and messenger 
RNAs based on k-mer scheme), which uses an improved computational pipeline 
based on k-mer and support vector machine (SVM) to distinguish long non-coding 
RNAs (lncRNAs) from messenger RNAs (mRNAs).
 

=====================
  INSTALLATION
=====================
Pre-requisite: 
1. Linux
2. C/C++ compiler (i.e. gcc, g++)
3. Python 2.5.0 or later versions (http://www.python.org/)

Steps:
1. Download PLEK.1.2.tar.gz from https://sourceforge.net/projects/plek/files/ 
   and decompress it.
   
   $ tar zvxf PLEK.1.2.tar.gz 
   
2. Compile PLEK.

   $ cd PLEK.1.2 
   $ python PLEK_setup.py


=====================
  USAGE AND EXAMPLES
=====================
Usage of PLEK.py -- used to differentiate lncRNAs from mRNAs: 
python PLEK.py -fasta fasta_file -out output_file -thread number_of_threads 
   -minlength min_length_of_sequence -isoutmsg 0_or_1 -isrmtempfile 0_or_1

   -fasta        The name of a fasta file, its sequences are to be predicted.
   
   -out          The file name for the results of prediction. Predicted positive 
                  samples are labeled as "Coding", and negative as "Non-coding".
 
   -thread       (Optional) The number of threads for running the PLEK program. 
                  The bigger this number is, the faster PLEK runs. Default value: 5.
   
   -minlength    (Optional) The minimum length of sequences. The sequences whose 
                  lengths are more than minlength will be processed. Default 
                  value: 200.
			 
   -isoutmsg     (Optional) Output messages to stdout(screen) or not. "0" means 
                  that PLEK be run quietly. "1" means that PLEK outputs the details
                  of processing. Default value: 0.
			 
   -isrmtempfile (Optional) Remove temporary files or not. "0" means that PLEK 
                  retains temporary files. "1" means that PLEK remove temporary 
                  files. Default value: 1.
			  
			  
   
Examples: 
1. $ python PLEK.py -fasta PLEK_test.fa -out predicted -thread 10

   NOTE: To predict the sequences in the 'PLEK_test.fa' file, run the PLEK program with 
   10 threads. The program outputs the predicted sequences in the file 'predicted'. 
   

2. $ python PLEK.py -fasta PLEK_test.fa -out predicted -thread 10 -minlength 300

   NOTE: To predict the sequences in the 'PLEK_test.fa' file, run the PLEK program with 
   10 threads. The program outputs the predicted sequences in the file 'predicted'.
   The sequences with the length of >300nt will be processed (remained).
   

3. $ python PLEK.py -fasta PLEK_test.fa -out predicted -thread 10 -isrmtempfile 0

   NOTE: To predict the sequences in the 'PLEK_test.fa' file, run the PLEK program with 
   10 threads. The program outputs the predicted sequences in the file 'predicted'. 
   The details of PLEK run will output to the files with "predicted" as prefix.
   

4. $ python PLEK.py -fasta PLEK_test.fa -out predicted -thread 10 -isoutmsg 1 

   NOTE: To predict the sequences in the 'PLEK_test.fa' file, run the PLEK program with 
   10 threads. The program outputs the predicted sequences in the file 'predicted'. 
   The details of PLEK run will output to user's screen(stdio).
     
=====================
Usage of PLEK Modelling -- used to build your own classifier with your 
                           training data (mRNA/lncRNA transcripts):
 
python PLEKModelling.py -mRNA mRNAs_fasta -lncRNA lncRNAs_fasta 
   -prefix prefix_of_output -log2c range_of_log2c -log2g range_of_log2g 
   -thread number_of_threads -model model_file -range range_file  
   -minlength min_length_of_sequence -isoutmsg 0_or_1 -isrmtempfile 0_or_1
   -k k_mer -nfold n_fold_cross_validation -isbalanced 0_or_1 
   
   -mRNA          mRNA transcripts used to build predictor, in fasta format.
   
   -lncRNA        lncRNA transcripts used to build predictor, in fasta format.
   
   -prefix        Prefix of the output files.
   
   -log2c        (Optional) The specified range of C parameter for the svm parameter 
                  search. Default value: 0,5,1.  (from, to, by; 0,1,2,3,4,5)   
				  
   -log2g        (Optional) The specified range of G parameter for the svm parameter 
                  search. Default value: 0,-5,-1. (from, to, by; 0,-1,-2,-3,-4,-5) 
				  
   -thread       (Optional) The number of threads for running the PLEKModelling 
                  program. The bigger this number is, the faster PLEKModelling runs.
                  Note that a larger thread number means larger consumption of memory.
                  Default value: 12.
				  
   -model        (Optional) The name of a predictor model file (an output file
                  by PLEKModelling.py).   
				  
   -range        (Optional) The name of a svm range file (an output file by 
                  PLEKModelling.py).   
				  
   -minlength    (Optional) The minimum length of sequences. The sequences whose 
                  lengths are more than minlength will be processed. Default 
                  value: 200.             
				  
   -isoutmsg     (Optional) Output messages to stdout(screen) or not. "0" means 
                 that PLEKModelling be run quietly. "1" means that PLEKModelling 
                 outputs the details of processing. Default value: 0.   
				 
   -isrmtempfile (Optional) Remove temporary files or not. "0" means that PLEKModelling 
                  retains temporary files. "1" means that PLEKModelling remove temporary 
                  files. Default value: 1.
				  
   -k            (Optional) range of k. k=5 means that we will calculate usage of 
                 1364 k-mer patterns. (k=1, 4 patterns; k=2, 16; k=3, 64; k=4, 256; 
                 k=5, 1024; 1364=4+64+256+1024). Default value: 5. 
				 
   -nfold        (Optional) n-fold cross-validation in search for optimal parameters.
                 Default value: 10.   
   
   -isbalanced   (Optional) In the case of isbalanced=1, if the samples are 
                 unbalanced, it will subsample the overrepresented class to obtain an 
                 equal amount of positives and negatives.
                 Default value: 0.
           

Examples: 
1. $ python PLEKModelling.py -mRNA PLEK_mRNAs.fa -lncRNA PLEK_lncRNAs.fa -prefix 20140531 

   NOTE: This trains a classifier using the mRNA sequences in the 'PLEK_mRNAs.fa' file
   and lncRNA in 'PLEK_lncRNAs.fa'. The program outputs the model in the file 
   '20140531.model' and the svm-scale range in '20140531.range'. 
   We can use the new model as follows:
   
    $ python PLEK.py -fasta PLEK_test.fa -out 20140531.predicted -thread 10  \
    -range 20140531.range -model 20140531.model 
	
2. $ python PLEKModelling.py -mRNA PLEK_mRNAs.fa -lncRNA PLEK_lncRNAs.fa -prefix 20140601 \
     -log2c 1,3,1 -log2g -1,-3,-1 -nfold 2 -k 4      

   NOTE: This example is used to demonstrate the usage of PLEKModelling.py
   in a simple/quick way. User can run this to check if our program can run correctly.
   It trains a classifier using the mRNA sequences in the 'PLEK_mRNAs.fa' file
   and lncRNA in 'PLEK_lncRNAs.fa'. The range of log2c is 1,2,3. The range of log2g
   is -1,-2,-3. Use a 2-fold cross-validation. K is 4, it will calculate the usage
   of 340 patterns (4+16+63+256=340). The program outputs the model in the file 
   '20140601.model' and the svm-scale range in '20140601.range'. 
   We can use the new model as follows:
   
    $ python PLEK.py -fasta PLEK_test.fa -out 20140601.predicted -thread 10  \
    -range 20140601.range -model 20140601.model -k 4
   
Notes:
   (1) In general, it is time-consuming to build a new classifier.          
   (2) The accuracy of a classifier model is connected with the quality and 
   quantity of training samples. We encourage users to supply as many reliable 
   samples as possible to PLEKModelling.py to train a classifier. We suggest
   that transcripts annotated with 'pseudogene', 'predicted' and 'putative' 
   be removed before training models.   
   (3) To parallel run PLEKModelling.py in PBS, please see 
   PLEK_howto_generate_scripts.pdf 
   (4) This script is not suitable for organisms with compact genomes.   
 
 
=====================
  COPYING
=====================
GNU Public License version 3 (GPLv3)
Details on http://www.gnu.org/copyleft/gpl.html

Source: PLEK.1.2.readme.txt, updated 2014-06-26