2012/01/18
(0) Quick usage
Using test data in the sample subdirectory:
Demo command: EBARDenovo sample\sample_1.fastq.gz sample\sample_2.fastq.gz -o sample.fa
(0.a) Two stages of assembly
First stage: build the index data with -a 1.
EbarDenovo -a 1 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
Second stage: begin the assembly with -a 3. Try different parameters to optimize the results without rebuilding the indices.
EbarDenovo -a 3 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
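The two stages can be scripted so that both runs share the same index directory. The Python sketch below only builds and prints the argument lists shown above. `stage_commands` is a hypothetical helper, and actually running the commands (for example with `subprocess.run`) assumes `EbarDenovo` is on the PATH; on Linux/MacOS pass `program=("mono", "EbarDenovo.exe")` instead.

```python
import shlex
from typing import List, Sequence


def stage_commands(
    reads1: str,
    reads2: str,
    output: str,
    index_dir: str = "sss",
    program: Sequence[str] = ("EbarDenovo",),
) -> List[List[str]]:
    """Return argv lists for the two-stage workflow (hypothetical helper).

    Both stages pass the same -d directory, so stage 2 can reuse the
    indices written by stage 1 instead of rebuilding them.
    """
    # Options shared by both stages; -v and -l switch off verbose output and the log file.
    shared = ["-v", "-l", "-d", index_dir, reads1, reads2, "-o", output]
    build_index = [*program, "-a", "1", *shared]  # stage 1: only build the index files
    assemble = [*program, "-a", "3", *shared]     # stage 2: assemble with the saved indices
    return [build_index, assemble]


if __name__ == "__main__":
    for argv in stage_commands("SRRxxxxxx_1.fastq", "SRRxxxxxx_2.fastq", "xxxxxx.fa"):
        print(shlex.join(argv))
        # To run for real: subprocess.run(argv, check=True)
```

Only stage 2 needs to be repeated when trying other parameters, as long as the same -d directory is used.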
(0.b) Platform command styles
(On Windows Platform)
EbarDenovo inputfile1 inputfile2 -o outputfile
(On Linux/MacOS Platform)
mono EbarDenovo.exe inputfile1 inputfile2 -o outputfile
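When one script has to run on both platforms, the command prefix can be chosen at run time. This is a minimal Python sketch; `base_command` is a hypothetical helper, and it assumes `EbarDenovo.exe` can be found in the working directory when Mono is used.

```python
import sys
from typing import List, Optional


def base_command(platform: Optional[str] = None) -> List[str]:
    """Return the program prefix for a platform (hypothetical helper).

    `platform` defaults to sys.platform, e.g. "win32", "linux" or "darwin".
    """
    plat = sys.platform if platform is None else platform
    if plat.startswith("win"):
        # Windows runs the executable directly.
        return ["EbarDenovo"]
    # Linux and MacOS run the same executable through the Mono runtime.
    return ["mono", "EbarDenovo.exe"]


if __name__ == "__main__":
    argv = base_command() + ["inputfile1", "inputfile2", "-o", "outputfile"]
    print(" ".join(argv))
```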
(2) Display parameters
EbarDenovo [-l] [-v] inputfile1 inputfile2 -o outputfile
Note:
(2.a) -l : no log file. Without -l, the program writes a log file named after its start time, such as run-201101232129.log for a run that began at 2011/01/23 21:29.
(2.b) -v : no verbose mode. The program will not show each contig while running.
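Without -l, each run leaves a log file whose name encodes its start time. The Python sketch below recovers that time from the file name and finds the most recent log. It assumes the `run-YYYYMMDDhhmm.log` pattern inferred from the single example above; `log_start_time` and `latest_log` are hypothetical helpers.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Pattern inferred from the example run-201101232129.log (assumption).
LOG_NAME = re.compile(r"^run-(\d{12})\.log$")


def log_start_time(filename: str) -> Optional[datetime]:
    """Return the start time encoded in a log file name, or None if it does not match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


def latest_log(directory: str = ".") -> Optional[Path]:
    """Return the most recently started run log in `directory`, if any."""
    best = None
    for path in Path(directory).glob("run-*.log"):
        started = log_start_time(path.name)
        if started is not None and (best is None or started > best[0]):
            best = (started, path)
    return None if best is None else best[1]


if __name__ == "__main__":
    print(log_start_time("run-201101232129.log"))  # 2011-01-23 21:29:00
```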
(3) Quality parameters
EbarDenovo [-k 14] [-n 6] [-M 8] [-e 8] [-c 0] [-m 0.0] inputfile1 inputfile2 [-o outputfile]
Note:
(3.a) -k : key size
(3.b) -n : nail size
(3.c) -M : minimum overlap between reads
(3.d) -e : errors per N bp
(3.e) -c : minimum contig size
(3.f) -m : coverage limit
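When tuning, it helps to keep the quality options in one place and render them as command-line arguments. The Python sketch below does that. The field names are hypothetical, and their defaults are copied from the bracketed values in the usage line above, which this sketch assumes are the program's defaults.

```python
from dataclasses import dataclass, fields
from typing import List

# Command-line flag for each field, in usage-line order.
_FLAGS = {
    "key_size": "-k",
    "nail_size": "-n",
    "min_overlap": "-M",
    "errors_per_n_bp": "-e",
    "min_contig": "-c",
    "coverage_limit": "-m",
}


@dataclass
class QualityParams:
    """Quality options of EbarDenovo (hypothetical field names)."""
    key_size: int = 14            # -k, key size
    nail_size: int = 6            # -n, nail size
    min_overlap: int = 8          # -M, minimum overlap between reads
    errors_per_n_bp: int = 8      # -e, errors per N bp
    min_contig: int = 0           # -c, minimum contig size
    coverage_limit: float = 0.0   # -m, coverage limit

    def to_args(self) -> List[str]:
        """Render every option as a flag followed by its value."""
        args: List[str] = []
        for field in fields(self):
            args += [_FLAGS[field.name], str(getattr(self, field.name))]
        return args


if __name__ == "__main__":
    argv = ["EbarDenovo", *QualityParams(min_overlap=12).to_args(),
            "inputfile1", "inputfile2", "-o", "outputfile"]
    print(" ".join(argv))
```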
(4) Execution parameters
EbarDenovo [-a 3] [-d ddd] [-T 1] inputfile1 inputfile2 [-o outputfile]
Note:
(4.a) -a : action. 1: only build the index files; 2: save the indices before assembly; 3: assemble directly without saving the indices.
(4.b) -d : the directory of index files. This directory will contain five intermediate files:
read.txt (numbering reads), pair.txt (pairing info), indx-kk.txt (key file; kk is key length),
class.txt (read classes), spots.txt (read spots)
(4.c) -T : number of threads used to accelerate assembly.
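Because -a 3 relies on the index files from an earlier -a 1 run, a wrapper can check the -d directory before choosing the action. This Python sketch assumes the key file is named `indx-<k>.txt` with `<k>` written as a plain decimal (e.g. `indx-14.txt` for `-k 14`); the helper names are hypothetical. It only checks that the files exist, not that they are complete.

```python
from pathlib import Path
from typing import List


def expected_index_files(key_size: int) -> List[str]:
    """The five intermediate files listed for the -d directory."""
    # Assumption: kk in indx-kk.txt is the key length as a plain decimal.
    return ["read.txt", "pair.txt", f"indx-{key_size}.txt", "class.txt", "spots.txt"]


def missing_index_files(index_dir: str, key_size: int) -> List[str]:
    """Return the expected index files that are not present in index_dir."""
    directory = Path(index_dir)
    return [name for name in expected_index_files(key_size)
            if not (directory / name).is_file()]


def choose_action(index_dir: str, key_size: int) -> str:
    """Return "1" (build indices) if any file is missing, else "3" (assemble)."""
    return "1" if missing_index_files(index_dir, key_size) else "3"
```

Following the two-stage workflow, a result of "1" means the index run must finish before -a 3 is used.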
(5) Help
EbarDenovo -h
Note:
It prints the usage of the EBARDenovo program on the screen.
(6) Demo commands using SRX015869 from ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA009/SRA009364
(1) First pass: build the index data with -a 1.
EbarDenovo -a 1 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa
(2) Second pass: begin the assembly. This step can be rerun with different parameters.
EbarDenovo -a 3 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa
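The second pass can be repeated in a loop, writing each result to its own output file. The Python sketch below only prints the commands. It varies -M as an example and keeps -k 14 and -d K14 fixed, on the assumption that the indices in K14 depend on the key size (the key file is named after it) but not on -M; the output naming scheme is hypothetical.

```python
import shlex
from typing import Dict, Iterable, List


def sweep_commands(min_overlaps: Iterable[int]) -> Dict[str, List[str]]:
    """Map each output file to the second-pass command that produces it."""
    commands: Dict[str, List[str]] = {}
    for overlap in min_overlaps:
        output = f"dro_M{overlap}.fa"  # hypothetical naming: one file per -M value
        commands[output] = [
            "EbarDenovo", "-a", "3", "-v", "-l",
            "-k", "14", "-T", "4", "-d", "K14",  # same key size and index directory as the first pass
            "-M", str(overlap),
            "SRR034309_1.fastq", "SRR034309_2.fastq",
            "-o", output,
        ]
    return commands


if __name__ == "__main__":
    for argv in sweep_commands([8, 12, 16]).values():
        print(shlex.join(argv))
```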
(9) Output format
The output file is in FASTA format. Each header names the numbered contig
followed by its coverage level (cl), length, start ID and assembly progress,
e.g. >Contig1 : 20903.60: 1156bp: 3442: 0.00%
Fields: Contig Number : Coverage : Length : StartID : Assembly progress
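The header fields can be read back for filtering or statistics. This Python sketch parses headers in the layout of the example above; it assumes every header has exactly five ':'-separated fields and that the contig name itself contains no ':'. `ContigHeader`, `parse_header` and `read_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class ContigHeader:
    name: str            # e.g. "Contig1"
    coverage: float      # coverage level (cl)
    length_bp: int       # contig length in bp
    start_id: int        # startID field
    progress_pct: float  # assembly progress in percent


def parse_header(line: str) -> ContigHeader:
    """Parse one header such as '>Contig1 : 20903.60: 1156bp: 3442: 0.00%'."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    fields = [f.strip() for f in line[1:].split(":")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    name, coverage, length, start_id, progress = fields
    if not length.endswith("bp"):
        raise ValueError(f"length field should end in 'bp': {length!r}")
    return ContigHeader(
        name=name,
        coverage=float(coverage),
        length_bp=int(length[:-2]),
        start_id=int(start_id),
        progress_pct=float(progress.rstrip("%")),
    )


def read_headers(handle: TextIO) -> Iterator[ContigHeader]:
    """Yield the parsed header of every contig in an open FASTA file."""
    for line in handle:
        if line.startswith(">"):
            yield parse_header(line.rstrip("\n"))
```

For example, `[h for h in read_headers(open("dro.fa")) if h.length_bp >= 500]` keeps only contigs of at least 500 bp.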
(10) Additional Notes
(a) Check the log file for a record of the run.
(b) If the raw RNA-Seq data is 20G per run, the assembly job consumes around 8G to 14G of memory.
(c) The test datasets are Illumina data. SOLiD and other sequencing data have not been tested.
(d) You can assemble up to 40G of sequencing data on a computer with 24G of memory.
(e) This assembler is designed for RNA-Seq data. For DNA-Seq, another assembler is under development.
Last edit: htchu 2012-01-18
On a UNIX server, I constantly get a "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS" during the index building stage. This is a problem with mono, and I have tried compiling with "--with-large-heap=yes" to no avail.
Last edit: Martin Smith 2013-04-18