======================================================================================

EBAR Denovo Assembler - EBARDenovo(Beta) version 1.0.1

Copyright (c) Hsueh-Ting Chu (htchu.taiwan@gmail.com)

4F., No.286, Defu Rd., South Dist., Taichung City 402, Taiwan.
All rights reserved.

This file is a part of the EBAR Denovo Assembler.
The use and distribution terms for this software are covered by the
Common Public License 1.0 (http://opensource.org/licenses/cpl1.0.php).
By using this software in any fashion, you are agreeing to be bound by
the terms of this license. You must not remove this notice, or
any other, from this software.
2012/01/18

(0) Quick usage
Using test data in the sample subdirectory:

Demo command: EBARDenovo sample\sample_1.fastq.gz sample\sample_2.fastq.gz -o sample.fa

(0.a) Two stages of assembly

First stage: build the index data with the -a 1 parameter.
EbarDenovo -a 1 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
Second stage: begin the assembly. Try different parameters to optimize the results without rebuilding the indices.
EbarDenovo -a 3 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
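
The two stages can also be scripted so that the index is built once and then reused. The Python sketch below only assembles the argument lists. It uses the flags shown above (-a, -v, -l, -d, -o); the function name stage_command and the file names are hypothetical placeholders, and it assumes the program can be called as EbarDenovo.

```python
def stage_command(action, index_dir, reads_1, reads_2, output):
    """Return the argument list for one EbarDenovo stage.

    action: 1 for the index-building stage, 3 for the assembly stage.
    """
    return [
        "EbarDenovo",
        "-a", str(action),
        "-v", "-l",
        "-d", index_dir,
        reads_1, reads_2,
        "-o", output,
    ]


# Placeholder file names; substitute your own paired-end FASTQ files.
build_index = stage_command(1, "sss", "reads_1.fastq", "reads_2.fastq", "contigs.fa")
assemble = stage_command(3, "sss", "reads_1.fastq", "reads_2.fastq", "contigs.fa")

print(" ".join(build_index))
print(" ".join(assemble))
```

Each list can be handed to subprocess.run(command, check=True) to launch the stage.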

(0.b) Platform command styles

(On Windows Platform)
EbarDenovo inputfile1 inputfile2 -o outputfile
(On Linux/MacOS Platform)
mono EbarDenovo.exe inputfile1 inputfile2 -o outputfile
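
A script that must run on more than one platform can pick the command style at run time. This is a minimal Python sketch, assuming the program is on the search path on Windows and that EbarDenovo.exe is in the working directory elsewhere; the function names launcher and full_command are hypothetical.

```python
import platform


def launcher(system=None):
    """Return the command prefix for an OS name (default: the current machine)."""
    system = system or platform.system()
    if system == "Windows":
        return ["EbarDenovo"]
    # On Linux and macOS, EbarDenovo.exe is run through mono, as shown above.
    return ["mono", "EbarDenovo.exe"]


def full_command(inputfile1, inputfile2, outputfile, system=None):
    """Build the complete argument list for the given platform."""
    return launcher(system) + [inputfile1, inputfile2, "-o", outputfile]


print(full_command("inputfile1", "inputfile2", "outputfile", system="Linux"))
```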

(2) Display parameters

EbarDenovo [-l] [-v] inputfile1 inputfile2 -o outputfile

Note:
(2.a) -l : do not write a log file. Without this option a log file such as run-201101232129.log is written; the name records when the run began (here 2011/01/23 21:29).
(2.b) -v : turn off verbose mode. The program will not show each contig while running.
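
Judging from the example name above, the log file name encodes the start time as run-YYYYMMDDhhmm.log. The Python sketch below recovers that time from a file name; the pattern is inferred from the single example and the function name log_start_time is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example run-201101232129.log (an assumption).
LOG_NAME = re.compile(r"^run-(\d{12})\.log$")


def log_start_time(filename):
    """Return the start time encoded in a log file name, or None if it does not match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


print(log_start_time("run-201101232129.log"))  # prints 2011-01-23 21:29:00
```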

(3) Quality parameters

EbarDenovo [-k 14] [-n 6] [-M 8] [-e 8] [-c 0] [-m 0.0] inputfile1 inputfile2 [-o outputfile]

Note:
(3.a) -k : key size
(3.b) -n : nail size
(3.c) -M : minimal overlap between reads
(3.d) -e : errors per N bp
(3.e) -c : minimal size of contig
(3.f) -m : coverage limitation

(4) Execution parameters

EbarDenovo [-a 3] [-d ddd] [-T 1] inputfile1 inputfile2 [-o outputfile]

Note:
(4.a) -a : action. 1: only build the index files; 2: save the indices, then assemble; 3: assemble directly without saving the indices.
(4.b) -d : the directory of the index files. This directory will hold five intermediate files:
read.txt (numbering reads), pair.txt (pairing info), indx-kk.txt (key file; kk is key length),
class.txt (read classes), spots.txt (read spots)
(4.c) -T : number of threads used to accelerate the assembly.
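
Before starting an assembly run, it can be useful to check that the index directory holds all five intermediate files. The Python sketch below does this. It assumes that kk in indx-kk.txt is the key size written as a plain integer (for example indx-14.txt), which this page does not confirm; the function names are hypothetical.

```python
import os
import tempfile


def expected_index_files(key_size):
    """List the five intermediate file names for a given key size.

    Assumption: 'kk' in indx-kk.txt is the key size as a plain integer.
    """
    return ["read.txt", "pair.txt", f"indx-{key_size}.txt", "class.txt", "spots.txt"]


def missing_index_files(index_dir, key_size):
    """Return the expected files that are not present in index_dir."""
    return [
        name
        for name in expected_index_files(key_size)
        if not os.path.isfile(os.path.join(index_dir, name))
    ]


# Demonstration on a throwaway directory that contains only read.txt.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "read.txt"), "w").close()
    print(missing_index_files(demo_dir, 14))
```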

(5) Help

EbarDenovo -h

Note:
It prints the usage of the EBARDenovo program on screen.

(6) Demo commands using SRX015869 (ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA009/SRA009364)

(6.a) First pass: build the index data with the -a 1 parameter.

EbarDenovo -a 1 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa

(6.b) Second pass: begin the assembly, and try running it with different parameters.

EbarDenovo -a 3 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa
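
Since the second pass reuses the indices in K14, several assembly runs with different quality parameters can be queued. The Python sketch below varies only -M (minimal overlap between reads). The values tried and the output naming scheme dro_M<value>.fa are hypothetical, and this page does not say which parameters may be changed without rebuilding the indices, so treat the choice of -M as an assumption.

```python
def sweep_commands(min_overlaps, index_dir="K14"):
    """Build one second-pass command per minimal-overlap value (-M)."""
    commands = []
    for overlap in min_overlaps:
        commands.append([
            "EbarDenovo", "-a", "3", "-v", "-l",
            "-k", "14", "-T", "4",
            "-M", str(overlap),
            "-d", index_dir,
            "SRR034309_1.fastq", "SRR034309_2.fastq",
            "-o", f"dro_M{overlap}.fa",  # hypothetical output naming
        ])
    return commands


for command in sweep_commands([8, 10, 12]):
    print(" ".join(command))
```

Writing each run to its own output file keeps the results of different settings side by side for comparison.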

(9) Output format

The output file is in FASTA format. Each record is headed by
the numbered contig with its coverage level (cl),
e.g. >Contig1 : 20903.60: 1156bp: 3442: 0.00%

The header fields are: contig number : coverage : length : startID : assembly progress
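
The header fields can be pulled apart with a regular expression for downstream filtering, for example by coverage or length. The Python sketch below is based only on the single example header above, so the exact spacing rules are an assumption; the function name parse_header and the dictionary keys are hypothetical.

```python
import re

# Field layout taken from the example header; spacing tolerance is an assumption.
HEADER = re.compile(
    r"^>Contig(?P<number>\d+)\s*:\s*(?P<coverage>[\d.]+)\s*:\s*"
    r"(?P<length>\d+)bp\s*:\s*(?P<start_id>\d+)\s*:\s*(?P<progress>[\d.]+)%$"
)


def parse_header(line):
    """Split one contig header line into typed fields."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized contig header: {line!r}")
    return {
        "number": int(match.group("number")),
        "coverage": float(match.group("coverage")),
        "length": int(match.group("length")),        # in bp
        "start_id": int(match.group("start_id")),
        "progress": float(match.group("progress")),  # in percent
    }


print(parse_header(">Contig1 : 20903.60: 1156bp: 3442: 0.00%"))
```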

(10) Additional Notes

(a) You can check the log file, which records the running procedure.
(b) If the raw RNA-Seq data is 20G per run, the assembly job consumes around 8G to 14G of memory.
(c) The test datasets are Illumina data. SOLiD and other sequencing data have not been tested.
(d) You can assemble up to 40G of sequencing data if the computer has 24G of memory.
(e) This assembler is designed for RNA-Seq data. For DNA-Seq, another assembler is under development.

Last edit: htchu 2012-01-18

EBARDenovo Wiki

Highly-accurate de novo assembler of paired-end RNA-Seq
