Demo command: EbarDenovo sample\sample_1.fastq.gz sample\sample_2.fastq.gz -o sample.fa
First stage: build the index data with the -a 1 parameter.
EbarDenovo -a 1 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
Second stage: run the assembly. Try different parameters to optimize the results without rebuilding the indices.
EbarDenovo -a 3 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
(On Windows)
EbarDenovo inputfile1 inputfile2 -o outputfile
(On Linux/macOS, run through Mono)
mono EbarDenovo.exe inputfile1 inputfile2 -o outputfile
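If you script EbarDenovo from Python, the only platform difference above is the Mono prefix. The sketch below builds the argument list for either case. The function name ebar_command is hypothetical, and on Linux/macOS it assumes EbarDenovo.exe is in the working directory unless you pass a full path.

```python
import subprocess
import sys


def ebar_command(args, platform=None, exe="EbarDenovo.exe"):
    """Build the EbarDenovo command line for Windows or Linux/macOS.

    Hypothetical helper: Windows calls EbarDenovo directly, while Linux/macOS
    run the .exe through Mono. `exe` is assumed to be a path Mono can find.
    """
    if platform is None:
        platform = sys.platform
    if platform.startswith("win"):
        return ["EbarDenovo", *args]
    return ["mono", exe, *args]


if __name__ == "__main__":
    cmd = ebar_command(["inputfile1", "inputfile2", "-o", "outputfile"])
    print(" ".join(cmd))
    # Uncomment to actually run the assembler:
    # subprocess.run(cmd, check=True)
```

Returning a list, rather than one string, lets subprocess.run pass file names containing spaces without extra quoting.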
EbarDenovo [-l] [-v] inputfile1 inputfile2 -o outputfile
Note:
(2.a) -l : do not write a log file. By default the log is named after its start time, e.g. run-201101232129.log for a run that began at 2011/01/23 21:29.
(2.b) -v : turn off verbose mode, so the program does not print each contig while running.
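Because the default log name encodes the start time as run-YYYYMMDDhhmm.log, the time can be recovered from the file name alone. A small sketch, assuming every log follows the run-<12 digits>.log pattern shown in (2.a); log_start_time is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches names like "run-201101232129.log", optionally preceded by a directory.
_LOG_NAME = re.compile(r"run-(\d{12})\.log$")


def log_start_time(filename):
    """Return the start time encoded in a log name such as run-201101232129.log."""
    m = _LOG_NAME.search(filename)
    if m is None:
        raise ValueError(f"not a run-YYYYMMDDhhmm.log name: {filename!r}")
    # The 12 digits are year, month, day, hour, minute.
    return datetime.strptime(m.group(1), "%Y%m%d%H%M")


print(log_start_time("run-201101232129.log"))  # 2011-01-23 21:29:00
```

Sorting a directory's run-*.log files by log_start_time lists the runs in chronological order.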
EbarDenovo [-k 15] [-c 0] [-n 10] [-e 8] inputfile1 inputfile2 [-o outputfile]
Note:
(3.a) -k : key size
(3.b) -c : minimum contig size
(3.c) -n : nail size
(3.d) -e : errors per N bp
EbarDenovo [-G] [-P] [-O 24] [-L] inputfile1 inputfile2 [-o outputfile]
Note:
(5.a) -G : write contig/gene group information to xxx-groups.txt.
(5.b) -P : write the SNPs found within contigs to xxx-snps.txt.
(5.c) -O : write small overlaps inside contigs to xxx-overlaps.fa.
(5.d) -L : write chimeric segments to xxx-overlaps.fa.
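To collect the extra outputs after a run, it helps to know which file each flag produces. The sketch below assumes that "xxx" is the -o file name with its extension removed (e.g. dmel-809.fa gives dmel-809-groups.txt); that mapping, and the names COMPANIONS and companion_files, are hypothetical and should be checked against a real run.

```python
from pathlib import Path

# Assumed suffix written by each output flag, appended to the -o file's stem.
COMPANIONS = {
    "-G": "-groups.txt",
    "-P": "-snps.txt",
    "-O": "-overlaps.fa",
    "-L": "-overlaps.fa",
}


def companion_files(output_fa, flags):
    """Map each requested output flag to the file it is expected to write."""
    out = Path(output_fa)
    # with_name keeps the directory of the -o file and swaps only the file name.
    return {flag: out.with_name(out.stem + COMPANIONS[flag]) for flag in flags}


for flag, path in companion_files("dmel-809.fa", ["-G", "-P", "-O"]).items():
    print(flag, path)
```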
EbarDenovo [-a 3] [-d ddd] [-T 1] inputfile1 inputfile2 [-o outputfile]
Note:
(6.a) -a : action. 1: only build the index files; 2: save the indices, then assemble; 3: assemble directly without saving the indices.
(6.b) -d : the directory for the index files. It will contain five intermediate files:
read.txt (numbered reads), pair.txt (pairing info), indx-kk.txt (key file; kk is the key length),
class.txt (read classes), spots.txt (read spots).
(6.c) -T : number of threads used to accelerate the assembly.
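Before a second-pass run, it can help to confirm that the -d directory already holds the five files listed in (6.b). A sketch, assuming the kk in indx-kk.txt is the -k key size written as a plain number (e.g. indx-14.txt for -k 14) and that all five files should exist before an -a 3 run reuses the directory; both assumptions, and the helper names, are hypothetical.

```python
from pathlib import Path


def index_files(index_dir, key_size):
    """Expected intermediate files in the -d directory for a given key size.

    Assumption: indx-kk.txt is named with the key size as a plain integer.
    """
    d = Path(index_dir)
    names = ["read.txt", "pair.txt", f"indx-{key_size}.txt", "class.txt", "spots.txt"]
    return [d / n for n in names]


def missing_index_files(index_dir, key_size):
    """Return the expected index files that do not exist (empty list: all present)."""
    return [p for p in index_files(index_dir, key_size) if not p.is_file()]


missing = missing_index_files("809k14", 14)
if missing:
    print("index incomplete, rerun with -a 1:", *[p.name for p in missing])
```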
EbarDenovo -h
Note:
Prints the EbarDenovo usage message on the screen.
(1) First pass: build the index data with the -a 1 parameter.
EbarDenovo -a 1 -d 809k14 SRR166809_1.fastq SRR166809_2.fastq -o dmel-809.fa
(2) Second pass: run the assembly, adjusting parameters to optimize the results.
EbarDenovo -a 3 -O -P -G -d 809k14 SRR166809_1.fastq SRR166809_2.fastq -o dmel-809.fa
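Step (2) invites trying several parameter settings against one index directory. The sketch below only builds the command lines for such a sweep; sweep_commands is a hypothetical name, and it assumes that -c and -e can change between runs without rebuilding the index (-k is left fixed because the key file indx-kk.txt depends on it) and that each run should write its own -o file. On Linux/macOS, use mono EbarDenovo.exe in place of EbarDenovo.

```python
from itertools import product


def sweep_commands(reads1, reads2, index_dir, out_prefix, min_contig_sizes, error_rates):
    """Second-pass (-a 3) command lines that all reuse one index directory.

    Assumption: -c and -e do not require rebuilding the index.
    Each combination gets a distinct output file so runs do not overwrite each other.
    """
    cmds = []
    for c, e in product(min_contig_sizes, error_rates):
        out = f"{out_prefix}-c{c}-e{e}.fa"
        cmds.append(["EbarDenovo", "-a", "3", "-c", str(c), "-e", str(e),
                     "-d", index_dir, reads1, reads2, "-o", out])
    return cmds


for cmd in sweep_commands("SRR166809_1.fastq", "SRR166809_2.fastq", "809k14",
                          "dmel-809", [0, 200], [8]):
    print(" ".join(cmd))
```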
The output file is in FASTA format. Each header line gives the contig number followed by its coverage level (cl) and further fields, e.g.
>Contig1 : 20903.60: 1156bp: 3442: 0.00%
Contig number : coverage : length : start ID : assembly progress
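The header fields are separated by colons, so they can be split and converted directly. The sketch below assumes every header has exactly the five fields shown above, with the length ending in "bp" and the progress in "%"; ContigHeader, parse_header, and read_headers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContigHeader:
    name: str            # contig number, e.g. "Contig1"
    coverage: float      # coverage level (cl)
    length_bp: int       # contig length in bp
    start_id: int        # startID field
    progress_pct: float  # assembly progress field, in percent


def parse_header(line):
    """Parse a header such as '>Contig1 : 20903.60: 1156bp: 3442: 0.00%'."""
    fields = [f.strip() for f in line.lstrip(">").split(":")]
    if len(fields) != 5:
        raise ValueError(f"unexpected header: {line!r}")
    name, cov, length, start, progress = fields
    return ContigHeader(
        name=name,
        coverage=float(cov),
        length_bp=int(length.removesuffix("bp")),
        start_id=int(start),
        progress_pct=float(progress.removesuffix("%")),
    )


def read_headers(path):
    """Parse every header line of an EbarDenovo FASTA output file."""
    with open(path) as fh:
        return [parse_header(line) for line in fh if line.startswith(">")]
```

For example, sorting read_headers("dmel-809.fa") by coverage lists the most highly covered contigs first (str.removesuffix needs Python 3.9 or later).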
(a) The log file records the progress of each run.
(b) For about 20 GB of raw RNA-Seq data per run, the assembly job uses roughly 8 to 14 GB of memory.
(c) The datasets used for testing were Illumina data; SOLiD and other sequencing data have not been tested.
(d) Up to 100 GB of sequencing data can be assembled on a computer with 32 GB of memory.
(e) This assembler is designed for RNA-Seq data. For DNA-Seq, another assembler is under development.