Package release: lncRScan-SVM (version 1.0.1, Aug 2015)
Author: Sun Lei
Email: leisuncumt(at)yahoo.com
Description: lncRScan-SVM is a Python package for lncRNA prediction.
Licence: lncRScan-SVM is distributed under the GNU General Public License (GPL).
        For other programs used in lncRScan-SVM, please refer to the licences in the LICENCE folder.
Copyright (C) 2014-2015 Yangzhou University
----------------------------------------------------------------


Contents
--------
   1. Introduction
   2. Package components
   3. Installation
   4. Preparation
   5. Examples


1. Introduction
---------------
lncRScan-SVM is a Python package that classifies transcripts as long non-coding RNAs (lncRNAs)
or protein coding transcripts using a support vector machine (SVM). It depends on several
third-party programs, including gffread, bigWigAverageOverBed, wigToBigWig, txCdsPredict,
fetchChromSizes, BioPython and LIBSVM.

2. Package components
---------------------
--README
--ChangeLog (records of version changes)
--executable
    --bin
        --x86 (third-party binary files used by lncRScan-SVM on a 32 bit OS)
            --gffread (a Cufflinks program that extracts nucleotide sequences
                by reading a GTF/GFF file, "http://cufflinks.cbcb.umd.edu/downloads/")
            --bigWigAverageOverBed (compute average score of big wig over each bed)
            --wigToBigWig (convert ascii format wig file to binary big wig format)
                ("http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/")
            --txCdsPredict (a program for predicting the open reading frame or
                coding sequence of a query sequence, "http://hgdownload.cse.ucsc.edu/admin/jksrc.zip")
        --x86_64 (third-party binary files for a 64 bit OS)
    --script (core python scripts of lncRScan-SVM)
        --lncRScan-SVM-train.py (train an SVM model for classifying protein coding transcripts and lncRNAs)
        --lncRScan-SVM-predict.py (classify protein coding transcripts and lncRNAs, given input transcripts)
        --gtf2bed.py (convert a GTF file to a BED file)
        --extract_features.py (extract features of transcripts)
        --features2svm.py (convert the feature file to the LIBSVM format)
        --subsvmfeatures.py (extract a subset of features from a standard LIBSVM feature file)
        --prediction2result.py (generate a file containing prediction results)
        --extract_GTF_chr.py (regenerate GTF according to chromosome names)
        # Note: most users only need lncRScan-SVM-train.py and lncRScan-SVM-predict.py.
    --util
        --fetchChromSizes (fetch chrom.sizes information from UCSC,
            "http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/")
--conf
    --hg19.conf (a config file for predicting human lncRNAs based on the hg19 genome)
    --mm10.conf (a config file for predicting mouse lncRNAs based on the mm10 genome)
--genome
    --hg19 (contains hg19 genome sequence per chromosome)
    --mm10 (contains mm10 genome sequence per chromosome)
--model
    --hg19
        --hg19.model (an SVM model for predicting human lncRNAs)
        --hg19.model.default (a default model file used to generate hg19.model when running prepare.sh hg19)
        --hg19.scale.param (a file containing parameters for scaling features derived from the hg19 genome)
        --hg19.scale.param.default (a default parameter file used to generate hg19.scale.param when running prepare.sh hg19)
    --mm10
        --mm10.model (an SVM model for predicting mouse lncRNAs)
        --mm10.model.default (a default model file used to generate mm10.model when running prepare.sh mm10)
        --mm10.scale.param (a file containing parameters for scaling features derived from the mm10 genome)
        --mm10.scale.param.default (a default parameter file used to generate mm10.scale.param when running prepare.sh mm10)
--PhastCons
    --hg19 (contains bigwig files of PhastCons scores for hg19)
    --mm10 (contains bigwig files of PhastCons scores for mm10)
--LICENCE
    --GPL_LICENCE.txt
    --LIBSVM_LICENCE.txt
    --gffread_LICENCE.txt
    --licenseUcscGenomeBrowser.txt
    --BioPython_LICENCE.txt
--test (for testing lncRScan-SVM)
    --test.gtf (for testing lncRScan-SVM prediction)
    --test.sh (for testing lncRScan-SVM prediction)
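
The component list mentions features2svm.py and subsvmfeatures.py, which both work with
LIBSVM feature files. As a rough illustration of that file format only (not the code of the
packaged scripts), the sketch below writes one sample as a LIBSVM line and selects a subset of
features from such a line. The function names are hypothetical, the label values used in the
usage note are assumptions, and whether subsvmfeatures.py renumbers the kept features is not
stated in this README; the sketch keeps the original indices.

```python
def to_libsvm_line(label, values):
    """Format one sample as '<label> 1:<v1> 2:<v2> ...' (LIBSVM indices start at 1)."""
    pairs = ["%d:%s" % (i, repr(float(v))) for i, v in enumerate(values, 1)]
    return " ".join([str(label)] + pairs)


def subset_libsvm_line(line, keep):
    """Keep only the features whose index is in `keep`; indices are not renumbered."""
    parts = line.split()
    label, pairs = parts[0], parts[1:]
    kept = [p for p in pairs if int(p.split(":", 1)[0]) in keep]
    return " ".join([label] + kept)
```

For example, to_libsvm_line(1, [0.5, 3]) gives "1 1:0.5 2:3.0", and
subset_libsvm_line("1 1:0.5 2:3.0 3:7.0", {1, 3}) gives "1 1:0.5 3:7.0".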


3. Installation
---------------
Currently lncRScan-SVM can be installed on a Linux/Unix OS by the following steps:
(1) Extract the compressed source package by running
    $ tar zxvf lncRScan-SVM_v1.0.1.tar.gz
    then change into the extracted directory and record it as LNCRSCAN_SVM_ROOT by running
    $ pwd
    $ export LNCRSCAN_SVM_ROOT="$PWD"
(2) Add the paths of the lncRScan-SVM scripts and binary files to the environment variable $PATH
    by modifying .bashrc in your home directory.
    First, at the end of the file add
        export LNCRSCAN_SVM_ROOT="the directory you have got by 'pwd'"
        export PATH=$PATH:$LNCRSCAN_SVM_ROOT/executable/script:$LNCRSCAN_SVM_ROOT/executable/util
    and then add the directory of binary files to $PATH according to your OS:
    *** on a 32 bit OS, add
        export PATH=$PATH:$LNCRSCAN_SVM_ROOT/executable/bin/x86
    *** or on a 64 bit OS, add
        export PATH=$PATH:$LNCRSCAN_SVM_ROOT/executable/bin/x86_64
    Then save .bashrc and either restart the shell or run
        $ source ~/.bashrc
    After that, the scripts and binary files can be used from any directory.
(3) Install third-party programs and dependent packages
--Binary files: txCdsPredict, bigWigAverageOverBed, wigToBigWig and gffread
    (Note: they are already packaged in executable/bin/x86 and executable/bin/x86_64, so you do not
           need to install them yourself.)
--Python 2.7 (the platform the scripts run on, https://www.python.org/download/releases/2.7/)
--Biopython (a set of freely available tools for biological computation written in Python,
    http://biopython.org/wiki/Biopython, https://github.com/biopython/biopython)
  Please install Biopython following 'http://biopython.org/DIST/docs/install/Installation.html'.
--LIBSVM (an integrated software for support vector classification,
    regression and distribution estimation, http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html)
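
Once the steps above are done, it can be useful to confirm that the tools are actually
reachable. The sketch below is not part of lncRScan-SVM; it is a minimal check, assuming the
tool names listed in this README, that looks each one up on $PATH and tries to import
Biopython. The helper names are hypothetical. LIBSVM is left out because this README does not
say which LIBSVM executables the scripts call.

```python
import os

# Tool names taken from this README; adjust the list if your setup differs.
REQUIRED_TOOLS = [
    "gffread", "bigWigAverageOverBed", "wigToBigWig", "txCdsPredict",
    "fetchChromSizes", "lncRScan-SVM-train.py", "lncRScan-SVM-predict.py",
]


def find_on_path(name, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, path=None):
    """List the names that cannot be found on PATH."""
    return [n for n in names if find_on_path(n, path) is None]


def biopython_available():
    """True if the Bio package (Biopython) can be imported."""
    try:
        import Bio  # noqa: F401
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print("not on PATH: " + tool)
    if not biopython_available():
        print("Biopython is not importable")
```

If the script prints nothing, every listed tool was found and Biopython could be imported.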

4. Preparation
---------------
After installation, you should check:
(a) Do you have the genome sequences of hg19 or mm10? If yes, change to the directory where the
genome sequence files (one fasta file per chromosome) are located and record it as PATH_TO_GENOME
    $ export PATH_TO_GENOME="$PWD"
Then enter $LNCRSCAN_SVM_ROOT
    $ cd $LNCRSCAN_SVM_ROOT
Create a soft link to $PATH_TO_GENOME
*** if you have genome sequences of hg19, please execute
    $ ln -s $PATH_TO_GENOME ./genome/hg19 
*** or if you have genome sequences of mm10, please execute
    $ ln -s $PATH_TO_GENOME ./genome/mm10
This step saves the time that 'prepare.sh' would otherwise spend downloading the genome sequence.

(b) Do you have the PhastCons scores of hg19 or mm10? If yes, change to the directory where the
PhastCons score files (one bigwig file per chromosome) are located and record it as PATH_TO_PHASTCONS
    $ export PATH_TO_PHASTCONS="$PWD"
Then enter $LNCRSCAN_SVM_ROOT
    $ cd $LNCRSCAN_SVM_ROOT
Create a soft link to $PATH_TO_PHASTCONS
*** if you have PhastCons scores of hg19, please execute
    $ ln -s $PATH_TO_PHASTCONS ./PhastCons/hg19
*** or if you have PhastCons scores of mm10, please execute
    $ ln -s $PATH_TO_PHASTCONS ./PhastCons/mm10
This step saves the time that would otherwise be spent downloading the PhastCons scores.

(c) Prepare a usable GTF file by extracting the lines whose feature column is "exon" from your original GTF
    $ awk -F'\t' '$3 == "exon"' test.gtf > test2.gtf
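
The exon extraction in step (c) can also be done in Python. A plain substring match on "exon"
would also keep lines that only mention it in the attribute column (for example a CDS line
carrying an exon_number attribute), so the sketch below tests the third, tab-separated column
instead. The function names are hypothetical, and the check for nine columns assumes
standard GTF lines.

```python
def is_exon_line(line):
    """True if a GTF line has 'exon' in its feature (3rd) column."""
    if line.startswith("#"):
        return False
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 9 and fields[2] == "exon"


def filter_exons(lines):
    """Return only the exon lines, in their original order."""
    return [line for line in lines if is_exon_line(line)]


def filter_exon_file(in_path, out_path):
    """Copy the exon lines of in_path to out_path."""
    with open(in_path) as src:
        with open(out_path, "w") as dst:
            for line in src:
                if is_exon_line(line):
                    dst.write(line)
```

Calling filter_exon_file("test.gtf", "test2.gtf") writes the exon-only file used in the
following steps.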

After checking (a), (b) and (c), run 
    $ prepare.sh hg19
or
    $ prepare.sh mm10
which helps you prepare
(1) Genome sequence
If you do not have the genome sequence of hg19/mm10, the script downloads it to
$LNCRSCAN_SVM_ROOT/genome automatically.

(2) PhastCons scores
If you do not have the PhastCons scores of hg19/mm10, the script can help you download them to 
$LNCRSCAN_SVM_ROOT/PhastCons automatically. 

(3) two model files
$LNCRSCAN_SVM_ROOT/model contains two folders, hg19 and mm10, which hold the models for
hg19 and mm10 respectively. The *.model and *.scale.param files are the two files used by
lncRScan-SVM, and they can be modified by users after installation.
* Please do not delete any *.default files.

(4) a configuration file
A configuration file can be generated in $LNCRSCAN_SVM_ROOT/conf.
An example configuration file looks like:
-----------------------------------------------------------------
# This configure file includes parameters for LncTs/PCTs prediction
# or SVM model training using lncRScan-SVM

## main variables
# directory of genome sequences
GENOME	/home/sunl/software/lncRScan-SVM/lncRScan-SVM_v1.0.0/./genome/hg19

# directory of PhastCons scores
PHASTCONS	/home/sunl/software/lncRScan-SVM/lncRScan-SVM_v1.0.0/./PhastCons/hg19/

## configure LIBSVM
# set svm-prediction model location
SVM_MODEL	/home/sunl/software/lncRScan-SVM/lncRScan-SVM_v1.0.0/./model/hg19/hg19.model

# set svm-scale parameter location
SVM_PARAM	/home/sunl/software/lncRScan-SVM/lncRScan-SVM_v1.0.0/./model/hg19/hg19.scale.param

# Set LIBSVM_MODE
# 	0: no scaling and no grid searching
# 	1: scaling and no grid searching (default)
# 	2: scaling and grid searching automatically by svm-easy
LIBSVM_MODE	1

# set scaling parameters, including SCALE_L and SCALE_U
SCALE_L	0
SCALE_U	63

# set parameters such as cost and gamma
COST	32768
GAMMA	0.00048828125
------------------------------------------------------------- 
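
To inspect such a file programmatically, the sketch below reads the layout shown in the
example: one KEY and one VALUE per line separated by whitespace, with '#' comment lines and
blank lines ignored. This layout is inferred from the example only, and the parser used
inside lncRScan-SVM may differ. scale_value illustrates the linear rescaling to
[SCALE_L, SCALE_U] that LIBSVM's svm-scale performs; whether lncRScan-SVM applies it in
exactly this way is an assumption. Both function names are hypothetical.

```python
def parse_conf(lines):
    """Parse 'KEY VALUE' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def scale_value(x, feat_min, feat_max, lower, upper):
    """Linearly map x from [feat_min, feat_max] to [lower, upper]."""
    if feat_max == feat_min:
        raise ValueError("constant feature cannot be rescaled")
    return lower + (upper - lower) * (float(x) - feat_min) / (feat_max - feat_min)


# A few lines copied from the example configuration above.
EXAMPLE = """\
# set scaling parameters, including SCALE_L and SCALE_U
SCALE_L\t0
SCALE_U\t63

# set parameters such as cost and gamma
COST\t32768
GAMMA\t0.00048828125
"""

settings = parse_conf(EXAMPLE.splitlines())
cost = int(settings["COST"])        # 32768 is 2**15
gamma = float(settings["GAMMA"])    # 0.00048828125 is 2**-11
```

With SCALE_L 0 and SCALE_U 63, a feature value halfway between its minimum and maximum
is mapped to 31.5.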


5. Examples
-----------
After the installation and preparation steps, enter the folder test/ and run the following commands.
(1) Example 1 -- LNCT/PCT prediction (LNCT: long non-coding transcript; PCT: protein coding transcript)
    You can conduct lncRNA prediction on test.gtf by running
    $ lncRScan-SVM-predict.py -g test.gtf -c ../conf/hg19.conf -o test_out
    After a few seconds, several result files will be generated in the output directory test_out.
    The prediction details are in the file test.prediction.result, alongside the other output files.
    
(2) Example 2 -- SVM model training and LNCT/PCT prediction
    First prepare your own positive GTF file for PCTs (PCTs.gtf) and a negative one for LNCTs (LNCTs.gtf),
    then train a model by running
    $ lncRScan-SVM-train.py -p PCTs.gtf -n LNCTs.gtf -c ../conf/hg19.conf -o train_out
    Then replace hg19.model and hg19.scale.param in $LNCRSCAN_SVM_ROOT/model/hg19 with the newly trained
    hg19.model and hg19.scale.param files in train_out, and run the prediction as in Example 1.
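
The two examples can be scripted. The sketch below is not part of lncRScan-SVM; it is a
minimal helper that builds the two command lines with the options shown in the examples
(-g, -p, -n, -c, -o) and swaps in the newly trained files. It assumes the trained files in
the output directory are named <genome>.model and <genome>.scale.param, as described above.
The function names and the ".bak" backup suffix are hypothetical. The *.default files are
never touched.

```python
import os
import shutil


def build_train_command(pct_gtf, lnct_gtf, conf, out_dir):
    """Command line for training, using the options shown in Example 2."""
    return ["lncRScan-SVM-train.py", "-p", pct_gtf, "-n", lnct_gtf,
            "-c", conf, "-o", out_dir]


def build_predict_command(gtf, conf, out_dir):
    """Command line for prediction, using the options shown in Example 1."""
    return ["lncRScan-SVM-predict.py", "-g", gtf, "-c", conf, "-o", out_dir]


def install_trained_model(train_out, model_dir, genome="hg19"):
    """Copy the trained model and scale parameters into model_dir.

    Existing files are first copied to '<name>.bak' so they can be restored.
    """
    names = [genome + ".model", genome + ".scale.param"]
    missing = [n for n in names
               if not os.path.isfile(os.path.join(train_out, n))]
    if missing:
        raise IOError("missing in %s: %s" % (train_out, ", ".join(missing)))
    for name in names:
        target = os.path.join(model_dir, name)
        if os.path.isfile(target):
            shutil.copy2(target, target + ".bak")
        shutil.copy2(os.path.join(train_out, name), target)
    return [os.path.join(model_dir, n) for n in names]
```

Each command list can be passed to subprocess.call once the scripts are on $PATH, for
example subprocess.call(build_predict_command("test.gtf", "../conf/hg19.conf", "test_out")).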
    
    
    
Source: README, updated 2015-08-22