---------------------------------
GIIRA
---------------------------------

GIIRA is a stand-alone Java program that predicts genes based on RNA-Seq reads without requiring 
any a priori knowledge.

---------------------------------
Copyright (c) 2013,
Franziska Zickmann, 
ZickmannF@rki.de, Robert Koch-Institute, Berlin, Germany
Distributed under the GNU Lesser General Public License, version 3.0.

When using GIIRA, please cite the following manuscript:

GIIRA - RNA-Seq Driven Gene Finding Incorporating Ambiguous Reads 
Franziska Zickmann; Martin S. Lindner; Bernhard Y. Renard 
Bioinformatics 2013; doi: 10.1093/bioinformatics/btt577

---------------------------------
INSTALLATION
---------------------------------

GIIRA is designed to run on a Linux system with the following minimum requirements for installed software:

- Python (http://www.python.org/), as well as the pysam package (https://pypi.python.org/pypi/pysam)
- Java 7 (http://www.java.com)
- either the CPLEX Optimizer (http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/) 
  or the GLPK solver (http://www.gnu.org/software/glpk/glpk.html)

If you want to use the GLPK solver for the optimization, make sure that the executable "glpsol" is installed in 
a directory included in your PATH. For CPLEX, the path of the file "cplex.jar" and the path of the CPLEX library (the location 
that Java would otherwise be given as -Djava.library.path) have to be passed as parameters in each GIIRA run (refer to the parameter description below).
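
Before the first run it can help to check that the required executables are reachable. The following Python sketch
(Python is already a requirement of GIIRA) looks them up on the PATH. The function name find_tools and the default
list of tool names are illustrative assumptions and not part of GIIRA.

```python
import shutil


def find_tools(names=("java", "python", "glpsol")):
    """Map each executable name to its full path, or to None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so that missing prerequisites are easy to spot.
    for tool, path in find_tools().items():
        print("%-8s %s" % (tool, path if path else "NOT FOUND"))
```

A missing "glpsol" only matters if you plan to run GIIRA with -opti glpk.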

To install GIIRA, download the compressed zip folder from https://sourceforge.net/projects/giira/ and unpack the package with: 

> unzip GIIRA.zip

This creates a folder named "GIIRA" in your current directory. This folder includes the executable GIIRA.jar.

To receive the help message of GIIRA, type:

> java -jar GIIRA/GIIRA.jar --help

Note that GIIRA needs several helper scripts to call external programs. These scripts are included 
in the directory GIIRA/scripts. To run GIIRA, this folder must always be in the same directory as 
the file GIIRA.jar.


---------------------------------
RUN GIIRA - EXAMPLE
---------------------------------

In the following example we assume that the file GIIRA.jar is contained in the directory foo/.
Further, if you have CPLEX installed on your system, we assume that both the file "cplex.jar" and 
the CPLEX library (the "Djava.library.path" location) are found in foo_CPLEX/.

GIIRA can either be given the raw reads and a reference, in which case it calls an external mapper to perform the necessary 
alignment, or it can be given an already existing alignment. Mappings that are provided to GIIRA must be 
in SAM format (http://samtools.sourceforge.net). The SAM file has to be sorted by read names, which can be done using 
samtools sort with the -n option (see the samtools manual).
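
If you are unsure whether an existing SAM file is sorted by read names, a quick plausibility check is possible in plain Python.
The sketch below only tests a necessary condition, namely that all records with the same read name are adjacent. It does not
verify the exact ordering produced by samtools, and the function name reads_are_grouped is an illustrative assumption, not
part of GIIRA. Note that it keeps all read names in memory.

```python
def reads_are_grouped(sam_path):
    """Return True if all records sharing a read name are adjacent in the SAM file."""
    seen = set()      # read names whose block of records has already started
    previous = None   # read name of the previous alignment line
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue  # skip header lines and empty lines
            name = line.split("\t", 1)[0]  # QNAME is the first SAM column
            if name != previous:
                if name in seen:
                    return False  # this name reappears after other reads
                seen.add(name)
                previous = name
    return True
```

If the function returns False, the file is not sorted by read names and should be re-sorted with samtools sort -n before it is passed to GIIRA.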

In the following we show a simple example run with the test data that are included in the download package in the directory GIIRA/example/.

In this example, we apply GIIRA to a set of 500,000 reads and chromosome IV of Saccharomyces cerevisiae as a reference genome. 
Both the reads and the reference are provided in the example, as well as a SAM file with an existing alignment.

To run GIIRA with the already existing SAM file:

1. Create a directory for the results, e.g. "GIIRA_example".
2a. To call GIIRA when you have CPLEX installed on your system, type:

> java -jar foo/GIIRA.jar -libPath foo_CPLEX -cp foo_CPLEX/cplex.jar -iG GIIRA/example/Scer_chr4.fasta -haveSam GIIRA/example/scer_example_mapping.sam -out GIIRA_example/

2b. Alternatively, you can call GIIRA without CPLEX (using GLPK):

> java -jar foo/GIIRA.jar -iG GIIRA/example/Scer_chr4.fasta -haveSam GIIRA/example/scer_example_mapping.sam -out GIIRA_example/ -opti glpk

---------------------------------

You can also apply GIIRA to the unmapped reads, using either TopHat2 (http://tophat.cbcb.umd.edu/) or BWA (http://bio-bwa.sourceforge.net/) to obtain the read mapping 
(the mapper you choose must be installed on your system to run this example; in this description we use TopHat2):

1. Create a directory for the results, e.g. "GIIRA_example".
2a. To call GIIRA when you have CPLEX installed on your system, type:

> java -jar foo/GIIRA.jar -libPath foo_CPLEX -cp foo_CPLEX/cplex.jar -iG GIIRA/example/Scer_chr4.fasta -iR GIIRA/example/scer_example_reads.fastq -out GIIRA_example/

2b. Alternatively, you can call GIIRA without CPLEX (using GLPK):

> java -jar foo/GIIRA.jar -iG GIIRA/example/Scer_chr4.fasta -iR GIIRA/example/scer_example_reads.fastq -out GIIRA_example/ -opti glpk
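
The four example calls above differ only in the input (existing SAM file or raw reads) and in the optimizer (CPLEX or GLPK).
If you start GIIRA from a script, the argument list can be assembled as in the following Python sketch. The function
giira_command and its parameter names are illustrative assumptions. Only the command line options themselves are taken
from the examples above.

```python
def giira_command(jar, genome, out_dir, sam=None, reads=None, cplex_dir=None):
    """Assemble the argument list for one GIIRA run.

    Exactly one of sam (existing alignment) or reads (FASTQ, mapped by GIIRA) is expected.
    If cplex_dir is given, the CPLEX options are added, otherwise GLPK is selected.
    """
    if (sam is None) == (reads is None):
        raise ValueError("provide exactly one of sam or reads")
    cmd = ["java", "-jar", jar]
    if cplex_dir is not None:
        # As in the example, cplex.jar is assumed to lie directly in cplex_dir.
        cmd += ["-libPath", cplex_dir, "-cp", cplex_dir.rstrip("/") + "/cplex.jar"]
    cmd += ["-iG", genome]
    cmd += ["-haveSam", sam] if sam is not None else ["-iR", reads]
    cmd += ["-out", out_dir]
    if cplex_dir is None:
        cmd += ["-opti", "glpk"]
    return cmd


if __name__ == "__main__":
    # Print the GLPK call for the provided SAM file instead of executing it.
    print(" ".join(giira_command(
        "foo/GIIRA.jar", "GIIRA/example/Scer_chr4.fasta", "GIIRA_example/",
        sam="GIIRA/example/scer_example_mapping.sam")))
```

The returned list can be passed to subprocess.call() from the Python standard library to actually start the run.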


---------------------------------
PARAMETERS OF GIIRA
--------------------------------- 
 
General information:
 
1) If you use the CPLEX optimizer to solve the linear program, please provide the absolute path to the CPLEX library (the Djava.library.path location) as well as to the file cplex.jar  
(included in the directory of your CPLEX installation). 

> java -jar GIIRA.jar -cp PATH_TO_CPLEX/cplex.jar -libPath PATH_TO_CPLEX/Djava.library.path

2) Depending on the size of your dataset, you might have to assign more memory to the GIIRA run to avoid an out of memory error.
   To do so, set a higher Xmx value when calling GIIRA, e.g. 3GB (="3000m"):
   
> java -Xmx3000m -jar GIIRA.jar
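
When GIIRA is started from a script, the memory settings can be derived from a size in GB. The following Python sketch
uses the same convention as the text above (3 GB = "3000m") and also builds the arguments for the -mem option, which
is described in the options list and takes its value in MB. The function names heap_flag and cplex_mem_args are illustrative
assumptions and not part of GIIRA.

```python
def _to_megabytes(gigabytes):
    """Convert GB to whole MB, using 1 GB = 1000 MB as in this README."""
    megabytes = int(round(gigabytes * 1000))
    if megabytes <= 0:
        raise ValueError("memory size must be positive")
    return megabytes


def heap_flag(gigabytes):
    """Return the Java heap option, e.g. '-Xmx3000m' for 3 GB."""
    return "-Xmx%dm" % _to_megabytes(gigabytes)


def cplex_mem_args(gigabytes):
    """Return the GIIRA arguments that limit the memory available to CPLEX."""
    return ["-mem", str(_to_megabytes(gigabytes))]
```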

options:

 -h : print help text and exit

 -iG [pathToGenomes] : specify path to directory with genome files in fasta format (it is also possible to address one fasta file directly)

 -iR [pathToRna] : specify path to directory with rna read files in fastq format (it is also possible to address one fastq file directly)

 -out [pathToResults] : specify the absolute path to the directory that shall contain the results files

 -outName [outputName] : specify desired name for output files, DEFAULT: genes

 -haveSam [samfileName]: if a sam file already exists, provide its name; otherwise a mapping is performed. NOTE: the sam file has to be sorted according to read names!
						 (this can be achieved by using the samtools (http://samtools.sourceforge.net/) sort command with option "-n")

 -nT [numberThreads] : specify the maximal number of threads that are allowed to be used, DEFAULT: 1

 -mT [tophat/bwa/bwasw] : specify desired tool for the read mapping, DEFAULT: tophat

 -opti [cplex/glpk] : specify the desired optimization method, either using CPLEX optimizer (cplex, DEFAULT) or glpk solver (glpk)

 -libPath [PATH] : if cplex is the desired optimizer, specify the absolute path to the cplex library Djava.library.path

 -cp [PATH] : if cplex is the desired optimizer, specify the absolute path to the cplex jar file cplex.jar
 
 -mem [int] : specify the amount of memory that cplex is allowed to use (Note: this parameter should be set for large sets of reads with high ambiguity, 
			  e.g. when the number of ambiguous mappings is above 10 million. Specify the amount in MB, e.g. -mem 10000 means 10GB of memory are allowed)

 -maxReportedHits [int] : if using BWA as mapping tool, specify the maximal number of reported hits; DEFAULT: 2 (if the number of hits of an ambiguous 
				          read exceeds this threshold, it is not reported.)

 -prokaryote : if specified, the genome is treated as prokaryotic, no spliced reads are accepted, and structural genes are resolved. DEFAULT: False. (Note: if structural genes shall be 
					 resolved, it is necessary to use CPLEX as the optimizer. To predict genes on prokaryotes without an installed CPLEX, do not set this
					 parameter; GIIRA then reports the predicted coding regions.)

 -minCov [double] : specify the minimum required coverage of the gene candidate extraction; DEFAULT: -1 (If -1, it is estimated from the mapping. Otherwise,
					it is recommended to choose minCov very small, e.g. = 1 to achieve maximum sensitivity.)

 -maxCov [double] : optional maximal coverage threshold, can also be estimated from mapping (DEFAULT) (Note: this parameter should be set by the user if 
					coverages above a certain threshold are not desired. Reads mapping to regions with a coverage higher than maxCov are excluded from the analysis.)

 -endCov [double] : if the coverage falls below this value, the currently open candidate gene is closed. This value can be estimated from the minimum coverage (-1); DEFAULT: -1
					(If this parameter is set by the user, it is recommended to choose endCov small to guarantee higher sensitivity)

 -dispCov [0/1] : if set to 1 (or if minCov is not specified), the minimum coverage and maximum coverage are automatically estimated from the mapping, DEFAULT: 0

 -interval [int] : specify the minimal size of an interval between neighboring candidate genes, if "-1" it equals the read length. DEFAULT: -1 (Note: this parameter directly
				   affects how often nearby candidate regions are merged into one candidate. If it is set to 0, only overlapping regions are merged. If the dataset has an
				   overall low coverage, it can be helpful to set a bigger value for interval because coverage gaps are then bridged more frequently.)

 -splLim [double] : specify the minimal coverage that is required to accept a splice site, if (-1) the threshold is equal to minCov, DEFAULT: -1

 -rL [int] : specify read length, otherwise this information is extracted from the SAM file (DEFAULT)
 
 -altCodon [pathToAlternativeCodons] : specify path to txt file with alternative start and stop codons. All codons possible for starts and stops need to be in one line, separated by
									   tabs. For details refer to an example file in the scripts folder.

 -noAmbiOpti : if specified, ambiguous hits are not included in the analysis (and no optimizer is necessary)

 -settingMapper [(list of parameters)] : A comma-separated list of the desired parameters for TopHat or BWA. Please provide
        for each parameter a pair of indicator and value, separated by an equality sign.
        Note that parameters intended for the 3 different parts (indexing, aln, sam) of BWA have to be separated by an underscore ("_").
        Example: -settingMapper [-a=is_-t=5,-N_-n=5]
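
The -settingMapper value for BWA can also be assembled programmatically. The following Python sketch assumes that the example above
is to be read as three groups in the order indexing, aln, sam, with the groups joined by underscores and the parameters inside a group
joined by commas (that is, "-a=is" for indexing, "-t=5,-N" for aln and "-n=5" for sam). This reading, as well as the function name
bwa_setting_mapper, is an assumption and not taken from the GIIRA source code.

```python
def bwa_setting_mapper(index=(), aln=(), sam=()):
    """Join BWA parameters into one -settingMapper value.

    Parameters of one BWA part are joined by commas, the three parts by underscores.
    """
    parts = [",".join(group) for group in (index, aln, sam)]
    return "[" + "_".join(parts) + "]"


if __name__ == "__main__":
    # Reproduces the example value given in the parameter description.
    print(bwa_setting_mapper(["-a=is"], ["-t=5", "-N"], ["-n=5"]))
```

Since square brackets are pattern characters in most shells, it is safest to put the resulting value in single quotes when typing it on the command line.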
Source: README.txt, updated 2014-06-17