EBARDenovo Home

Highly-accurate de novo assembler of paired-end RNA-Seq

Brought to you by: htchu

======================================================================================
You need .Net 4.0 on Windows: http://www.microsoft.com/download/en/details.aspx?id=17718
Or Mono on Unix/Linux/MacOS: http://www.go-mono.com/mono-downloads/download.html
======================================================================================

EBAR Denovo Assembler - EBARDenovo(Beta) version 1.1.1
*
Copyright (c) Hsueh-Ting Chu (htchu.taiwan@gmail.com)
*
4F., No.286, Defu Rd., South Dist., Taichung City 402, Taiwan.
All rights reserved.
*
This file is a part of the EBAR Denovo Assembler.
The use and distribution terms for this software are covered by the
Common Public License 1.0 (http://opensource.org/licenses/cpl1.0.php).
By using this software in any fashion, you are agreeing to be bound by
the terms of this license. You must not remove this notice, or
any other, from this software.
2012/05/27

(0) Quick usage
Using test data in the sample subdirectory:

Demo command: EBARDenovo sample\sample_1.fastq.gz sample\sample_2.fastq.gz -o sample.fa

(0.a) Two stages of assembly

First stage : building the indexing data with (-a 1) parameters
EbarDenovo -a 1 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
Second stage: beginning of assembly. Try different parameters to optimize results without rebuilding indices.
EbarDenovo -a 3 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
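
The two stages above can be scripted. The following is a minimal sketch that only builds the two argument lists shown above; it does not run the assembler. The function names ebar_command and two_stage_commands are hypothetical helpers, not part of EBARDenovo, and the default flags simply mirror the example commands.

```python
def ebar_command(action, index_dir, reads_1, reads_2, output, flags=("-v", "-l")):
    """Build one EbarDenovo argument list (hypothetical helper).

    The flag order mirrors the example commands in this README:
    -a <action> -v -l -d <index_dir> <reads_1> <reads_2> -o <output>
    """
    return ["EbarDenovo", "-a", str(action), *flags,
            "-d", index_dir, reads_1, reads_2, "-o", output]


def two_stage_commands(index_dir, reads_1, reads_2, output):
    """Return (index-building command, assembly command) for the same inputs."""
    build_index = ebar_command(1, index_dir, reads_1, reads_2, output)
    assemble = ebar_command(3, index_dir, reads_1, reads_2, output)
    return build_index, assemble
```

Each list can be handed to subprocess.run as is; on Linux/MacOS the list would need the mono prefix described in (0.b).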

(0.b) Platform command styles

(On Windows Platform)
EbarDenovo inputfile1 inputfile2 -o outputfile
(On Linux/MacOS Platform)
mono EbarDenovo.exe inputfile1 inputfile2 -o outputfile
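
A wrapper script can pick the command style per platform. This is a minimal sketch under the assumption that the executable is reachable as EbarDenovo on Windows and as EbarDenovo.exe elsewhere, exactly as written above; launcher_prefix and basic_command are hypothetical names.

```python
import platform


def launcher_prefix(system=None):
    """Return the leading part of the command line for the given platform.

    `system` defaults to platform.system(), which reports "Windows",
    "Linux" or "Darwin" (MacOS).
    """
    system = system or platform.system()
    if system == "Windows":
        return ["EbarDenovo"]
    # Linux/MacOS: run the .NET executable through Mono.
    return ["mono", "EbarDenovo.exe"]


def basic_command(inputfile1, inputfile2, outputfile, system=None):
    """Build: <launcher> inputfile1 inputfile2 -o outputfile"""
    return launcher_prefix(system) + [inputfile1, inputfile2, "-o", outputfile]
```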

(2) Display parameters

EbarDenovo [-l] [-v] inputfile1 inputfile2 -o outputfile

Note:
(2.a) -l : no log file (such as run-201101232129.log, i.e. a log file begun at 2011/01/23 21:29) is written.
(2.b) -v : no verbose mode. The program will not show each contig while running.
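
The start time can be recovered from a log file name. This is a minimal sketch that assumes every log name follows the run-YYYYMMDDhhmm.log pattern of the single example above; log_start_time is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the one example name run-201101232129.log.
LOG_NAME = re.compile(r"^run-(\d{12})\.log$")


def log_start_time(filename):
    """Return the start time encoded in a log file name."""
    match = LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognised log file name: {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")
```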

(3) Quality parameters

EbarDenovo [-k 15] [-c 0] [-n 10] [-e 8] inputfile1 inputfile2 [-o outputfile]

Note:
(3.a) -k : key size
(3.b) -c : minimal size of contig
(3.c) -n : nail size
(3.d) -e : errors per N bp

(5) Output parameters

EbarDenovo [-G] [-P] [-O 24] [-L] inputfile1 inputfile2 [-o outputfile]

Note:
(5.a) -G : output information for contig/gene groups to xxx-groups.txt.
(5.b) -P : output SNPs of contigs to xxx-snps.txt.
(5.c) -O : output small overlaps inside contigs to xxx-overlaps.fa.
(5.d) -L : output chimeric segments to xxx-overlaps.fa.

(6) Execution parameters

EbarDenovo [-a 3] [-d ddd] [-T 1] inputfile1 inputfile2 [-o outputfile]

Note:
(6.a) -a : action. 1: only build the index files; 2: save the indices before assembly; 3: assemble directly without saving the indices.
(6.b) -d : the directory of the index files. This directory will hold five intermediate files:
read.txt (numbering reads), pair.txt (pairing info), indx-kk.txt (key file; kk is key length),
class.txt (read classes), spots.txt (read spots)
(6.c) -T : number of running threads used to accelerate assembly.
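
Before starting the second stage, it can help to check that the index directory from (6.b) is complete. This is a minimal sketch; it assumes that kk in indx-kk.txt is the key length written as a plain decimal number (for example indx-15.txt for -k 15), which this README does not confirm. Both function names are hypothetical.

```python
from pathlib import Path


def expected_index_files(key_length):
    """List the five intermediate files named in (6.b).

    Assumption: the key file is called indx-<key_length>.txt with no padding.
    """
    return ["read.txt", "pair.txt", f"indx-{key_length}.txt",
            "class.txt", "spots.txt"]


def missing_index_files(index_dir, key_length):
    """Return the expected files that are not present in index_dir."""
    directory = Path(index_dir)
    return [name for name in expected_index_files(key_length)
            if not (directory / name).is_file()]
```

An empty result means all five files were found; any names returned point to an index that should be rebuilt with -a 1.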

(7) Help

EbarDenovo -h

Note:
It prints the usage of the EBARDenovo program on screen.

(8) Demo of commands using public data (SRA ID=SRR166809) ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA030/SRA030942

(1) First pass : building the indexing data with (-a 1) parameters

EbarDenovo -a 1 -d 809k14 SRR166809_1.fastq SRR166809_2.fastq -o dmel-809.fa

(2) Second pass : begin the assembly, and adjust the parameters to optimize the result.

EbarDenovo -a 3 -O -P -G -d 809k14 SRR166809_1.fastq SRR166809_2.fastq -o dmel-809.fa

(9) Output format

The output file is in FASTA format. Each record is headed by
the numbered contig with its coverage level (cl),
e.g. >Contig1 : 20903.60: 1156bp: 3442: 0.00%

Contig Number : Coverage: length: startID: assembly progress
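
The header fields can be read back with a few lines of code. This is a minimal sketch based only on the single example header above; the ContigHeader type, its field names and parse_contig_header are hypothetical, and other header variants may exist.

```python
from typing import NamedTuple


class ContigHeader(NamedTuple):
    name: str                # e.g. "Contig1"
    coverage: float          # coverage level (cl)
    length_bp: int           # contig length in bp
    start_id: int            # startID
    progress_percent: float  # assembly progress, in percent


def parse_contig_header(line):
    """Parse a header such as '>Contig1 : 20903.60: 1156bp: 3442: 0.00%'."""
    fields = [field.strip() for field in line.strip().lstrip(">").split(":")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    name, coverage, length, start_id, progress = fields
    if length.endswith("bp"):
        length = length[:-2]  # drop the unit suffix
    return ContigHeader(name, float(coverage), int(length),
                        int(start_id), float(progress.rstrip("%")))
```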

(10) Additional Notes

(a) The log file records the running procedure.
(b) If the raw RNA-Seq data is 20G per run, the assembly job consumes around 8G~14G of memory.
(c) The testing datasets are Illumina data. SOLiD and other sequencing data have not been tested.
(d) You can assemble up to 100G of sequencing data if the computer has 32G of memory.
(e) This assembler is designed for RNA-Seq data. For DNA-Seq, another assembler is under development.