--------------------------
| PRE-REQUIREMENTS |
--------------------------
- bash (>=4.1.2)
- GNU coreutils (>=8.4)
- GNU sed (>=4.2.1)
- GNU make (>=3.81)
- GNU which (>=2.19)
- GNU Awk (>=3.1.7)
- g++ supporting c++14 standard (>=4.9.3)
- flex (>=2.5.35)
- bison (>=2.5)
- trimmomatic (>=0.32)
- bowtie (==1.*)
- bowtie2 (==2.*, recommended 2.0.2)
- samtools (>=1.6)
- quast (>=4.5)
- seqan (>=2.3.2)
- java-runtime-environment (>=1.7.0)
- nvidia-cuda-toolkit (>=8.0.61)
- GNU tar (>=1.23)
Additionally, one of the following is required to perform scaffolding:
- soapdenovo (>=2.04)
- sspace (==Basic 2.0)
The TRIMMOMATIC_PATH environment variable should be set to the Trimmomatic root directory
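For example, the variable can be set in your shell profile as follows (the path below is a hypothetical install location; use your own):

```shell
# Point grasshopper at the Trimmomatic root directory.
# /opt/Trimmomatic-0.36 is only an example path.
export TRIMMOMATIC_PATH=/opt/Trimmomatic-0.36
```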
--------------------------
| COMPILATION |
--------------------------
run: ./grasshopper compile
--------------------------
| INSTALLATION |
--------------------------
As a non-sudo user, run: ./grasshopper install
NOTE: make sure that ${HOME}/bin is in your ${PATH}
As a sudo user, run: sudo ./grasshopper install
NOTE: make sure that /usr/local/bin is in the users' ${PATH}
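A quick way to verify the install prefix is on your PATH is a sketch like the one below (shown for the non-sudo case; substitute /usr/local/bin for the system-wide install):

```shell
# Check whether ${HOME}/bin is already on PATH; if not, suggest the fix.
case ":${PATH}:" in
  *":${HOME}/bin:"*) echo "OK: ${HOME}/bin is on PATH" ;;
  *) echo "Missing; add it, e.g.: export PATH=\"\${HOME}/bin:\${PATH}\"" ;;
esac
```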
--------------------------
| USAGE |
--------------------------
Grasshopper consists of six steps: preprocess, build, traverse,
correct, trim and scaffold. To run it use:
grasshopper <step-name> [params...]
The first step, namely preprocess, creates the dataset: the directory
in which all the files of a given run are stored.
All the other steps take it as their only non-optional parameter.
The simplest grasshopper usage can look as follows:
grasshopper preprocess foo-1.fastq foo-2.fastq
grasshopper build foo
grasshopper traverse foo
grasshopper correct foo
grasshopper trim foo
grasshopper scaffold foo
For the example above, the dataset name is extracted from the names of the
read files. Unless the user specifies it explicitly with the -ds option,
the dataset is stored at:
${HOME}/grasshopper-data/<dataset-name>
Each step adds further information to the dataset, so the steps cannot be run
in an arbitrary order. However, you can run a given step again, e.g. with
another set of parameters, without re-running each previous step.
If you plan to do so, remember that the files in the dataset will be
overwritten, so back them up if you don't wish to lose them.
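Re-running a step with different parameters might then look like the sketch below, with a simple backup taken first (the dataset name foo and the -ws value are illustrative):

```shell
# Back up the dataset before re-running a step with new parameters.
cp -r "${HOME}/grasshopper-data/foo" "${HOME}/grasshopper-data/foo.bak"
grasshopper build foo -ws=800
```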
preprocess
-ds=<dataset> -- dataset name/path (default: name of the fasta/fastq files)
-sg=<similar-genome> -- (optional) genome file in fasta format to filter reads
-trimpath=<path-to-trimmomatic> -- alternative to TRIMMOMATIC_PATH environment variable
-trimparams=<trimmomatic-params> -- to alter the default Trimmomatic parameters
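Putting the preprocess options together, a run with an explicit dataset path and a similar-genome filter might look as follows (all file names and paths are placeholders):

```shell
grasshopper preprocess foo-1.fastq foo-2.fastq \
    -ds="${HOME}/assemblies/foo" \
    -sg=similar-genome.fasta
```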
build
-ws=<window-size> -- sets window size (default: 600)
-sc=<score-cutoff> -- sets score cutoff (default: 50)
-e=<allowed-errors> -- sets tolerance on errors between two reads (default: 0)
-kmer=<k-mer-size> -- sets size of k-mer to compute characteristics (default: 6)
-pc=<characteristics-count> -- count of partial characteristics (default: 3)
-ps=<characteristics-size> -- the size of a single partial characteristic (default: 50)
-awa=<TRUE/FALSE> -- enables an enhancement that drastically increases the search set of promising pairs to be verified (may be time-consuming!) (default: FALSE)
-sli=<size> -- the size of the shortest lexicographical index sequence (default: 20)
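A build invocation overriding a few of the defaults above could be sketched as (the dataset name foo and the option values are illustrative only):

```shell
# Larger window, higher score cutoff, non-default k-mer size.
grasshopper build foo -ws=800 -sc=60 -kmer=8
```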
traverse
-fs=<forks-sensitivity> -- sets the sensitivity of the forks detector (default: 6)
correct
-minconf=<value> -- sets the tolerance on distant paired-end reads (default: contigs_depth/7)
-maxrefs=<value> -- sets the maximum number of distant paired-end reads; once it is exceeded, a new cut spot is considered (default: contigs_depth/7)
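The traverse and correct steps can be tuned in the same way (the dataset name foo and the threshold values below are illustrative, not recommendations):

```shell
# Raise fork-detector sensitivity, then correct with explicit thresholds.
grasshopper traverse foo -fs=8
grasshopper correct foo -minconf=10 -maxrefs=10
```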
trim
<no parameters>
scaffold
-m=<sspace/soap2> -- chooses the scaffolding tool (default: sspace)
-is=<insert-size> -- sets the insert size of the original paired-end reads file
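For example, scaffolding with SOAPdenovo2 instead of the default SSPACE might look like this (dataset name foo and the insert size are illustrative):

```shell
# Use soap2 as the scaffolder and declare the library insert size.
grasshopper scaffold foo -m=soap2 -is=500
```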