Home

htchu

Welcome to your wiki!

This is the default page, edit it as you see fit. To add a page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses Markdown syntax.


Discussion

  • htchu

    htchu - 2011-11-23

    ======================================================================================

    EBAR Denovo Assembler - EBARDenovo(Beta) version 1.0.1

    Copyright (c) Hsueh-Ting Chu (htchu.taiwan@gmail.com)

    4F., No.286, Defu Rd., South Dist., Taichung City 402, Taiwan.
    All rights reserved.

    This file is a part of the EBAR Denovo Assembler.
    The use and distribution terms for this software are covered by the
    Common Public License 1.0 (http://opensource.org/licenses/cpl1.0.php).
    By using this software in any fashion, you are agreeing to be bound by
    the terms of this license. You must not remove this notice, or
    any other, from this software.
    2012/01/18


    (0) Quick usage

    Using test data in the sample subdirectory:

    Demo command: EBARDenovo sample\sample_1.fastq.gz sample\sample_2.fastq.gz -o sample.fa

    (0.a) Two stages of assembly

    First stage: build the index data with the -a 1 parameter.
    EbarDenovo -a 1 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa
    Second stage: begin assembly. Try different parameters to optimize the results without rebuilding the indices.
    EbarDenovo -a 3 -v -l -d sss SRRxxxxxx_1.fastq SRRxxxxxx_2.fastq -o xxxxxx.fa

    (0.b) Platform command styles

    (On Windows Platform)
    EbarDenovo inputfile1 inputfile2 -o outputfile
    (On Linux/MacOS Platform)
    mono EbarDenovo.exe inputfile1 inputfile2 -o outputfile
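The two command styles above differ only in the `mono` prefix and the `.exe` suffix. A minimal Python sketch of that choice follows; the function name `build_command` and its `platform` argument are hypothetical helpers for illustration, not part of EBARDenovo.

```python
import sys


def build_command(input1, input2, output, platform=None):
    """Return the argument list for one run, following the two styles above.

    `platform` defaults to sys.platform; it is exposed only so the choice
    can be exercised without changing machines (an assumption of this sketch).
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Windows style: call the executable directly.
        return ["EbarDenovo", input1, input2, "-o", output]
    # Linux/MacOS style: run the .exe through mono.
    return ["mono", "EbarDenovo.exe", input1, input2, "-o", output]
```

The returned list can be passed to `subprocess.run(...)` as-is, which avoids shell quoting issues with file names.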


    (2) Display parameters

    EbarDenovo [-l] [-v] inputfile1 inputfile2 -o outputfile

    Note:
    (2.a) -l : do not write a log file. Otherwise the log file is named after its start time, e.g. run-201101232129.log for a run that began at 2011/01/23 21:29.
    (2.b) -v : no verbose mode. The program will not show each contig while running.
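The log-file name in note (2.a) encodes the start time as twelve digits (year, month, day, hour, minute). A small sketch of recovering that time from a file name, assuming every log follows the `run-YYYYMMDDhhmm.log` pattern of the single example given; `log_start_time` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from the one example name, run-201101232129.log.
LOG_NAME = re.compile(r"^run-(\d{12})\.log$")


def log_start_time(filename):
    """Return the start time encoded in a log file name, or None if it does not match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M")
    except ValueError:
        # Twelve digits that do not form a valid date and time.
        return None
```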


    (3) Quality parameters

    EbarDenovo [-k 14] [-n 6] [-M 8] [-e 8] [-c 0] [-m 0.0] inputfile1 inputfile2 [-o outputfile]

    Note:
    (3.a) -k : key size
    (3.b) -n : nail size
    (3.c) -M : minimal overlap between reads
    (3.d) -e : errors per N bp
    (3.e) -c : minimal size of contig
    (3.f) -m : coverage limitation


    (4) Execution parameters

    EbarDenovo [-a 3] [-d ddd] [-T 1] inputfile1 inputfile2 [-o outputfile]

    Note:
    (4.a) -a : action. 1: only build the index files; 2: save the indices before assembly; 3: assemble directly without saving the indices.
    (4.b) -d : the directory of index files. This directory will hold five intermediate files:
    read.txt (numbering reads), pair.txt (pairing info), indx-kk.txt (key file; kk is key length),
    class.txt (read classes), spots.txt (read spots)
    (4.c) -T : number of threads used to accelerate assembly.
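Before starting an assembly pass it can help to check that the five intermediate files from note (4.b) are present in the index directory. The sketch below assumes that `kk` in `indx-kk.txt` is replaced by the plain decimal key length (e.g. `indx-14.txt` for `-k 14`); the source does not show an actual file name, so that substitution is a guess, and both function names are hypothetical.

```python
from pathlib import Path


def expected_index_files(index_dir, key_size):
    """List the five intermediate files named in note (4.b).

    ASSUMPTION: 'kk' becomes the decimal key length with no padding.
    """
    directory = Path(index_dir)
    names = [
        "read.txt",                # numbering reads
        "pair.txt",                # pairing info
        f"indx-{key_size}.txt",    # key file
        "class.txt",               # read classes
        "spots.txt",               # read spots
    ]
    return [directory / name for name in names]


def missing_index_files(index_dir, key_size):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_index_files(index_dir, key_size) if not p.is_file()]
```

An empty list from `missing_index_files("K14", 14)` would suggest the index-building pass has completed for that directory, under the naming assumption above.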


    (5) Help

    EbarDenovo -h

    Note:
    It prints the usage of the EBARDenovo program on screen.


    (6) Demo of commands using SRX015869 ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA009/SRA009364

    (1) First pass: build the index data with the -a 1 parameter

    EbarDenovo -a 1 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa

    (2) Second pass: begin assembly, then try running with different parameters.

    EbarDenovo -a 3 -v -l -k 14 -T 4 -d K14 SRR034309_1.fastq SRR034309_2.fastq -o dro.fa
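The two passes above use the same arguments and differ only in the `-a` value. A short sketch that generates both command lines from one set of inputs, so the passes cannot drift apart; `two_pass_commands` is a hypothetical helper, and the defaults simply mirror the demo values.

```python
def two_pass_commands(reads1, reads2, output, index_dir, key_size=14, threads=4):
    """Return [index-building command, assembly command] as argument lists."""
    shared = [
        "-v", "-l",
        "-k", str(key_size),
        "-T", str(threads),
        "-d", index_dir,
        reads1, reads2,
        "-o", output,
    ]
    # -a 1 builds the indices; -a 3 runs the assembly, as in the demo above.
    return [["EbarDenovo", "-a", action] + shared for action in ("1", "3")]


if __name__ == "__main__":
    for command in two_pass_commands("SRR034309_1.fastq", "SRR034309_2.fastq", "dro.fa", "K14"):
        print(" ".join(command))
```

On Linux/MacOS each list would need the `mono EbarDenovo.exe` form described in section (0.b) instead of the bare `EbarDenovo` name.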


    (9) Output format

    The output file is in FASTA format. Each record is headed by
    the numbered contig with its coverage level (cl),
    e.g. >Contig1 : 20903.60: 1156bp: 3442: 0.00%

    The header fields are: Contig Number : Coverage: length: startID: assembly progress
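The header layout above can be split into its five fields with a regular expression. This is a minimal sketch based only on the single example header; the field names in `ContigHeader` and the function `parse_header` are hypothetical, and other spacing or number formats in real output are not covered.

```python
import re
from typing import NamedTuple, Optional


class ContigHeader(NamedTuple):
    name: str
    coverage: float
    length_bp: int
    start_id: int
    progress_percent: float


# Matches e.g. ">Contig1 : 20903.60: 1156bp: 3442: 0.00%"
HEADER = re.compile(
    r"^>([^\s:]+)\s*:\s*([\d.]+)\s*:\s*(\d+)bp\s*:\s*(\d+)\s*:\s*([\d.]+)%$"
)


def parse_header(line: str) -> Optional[ContigHeader]:
    """Parse one FASTA header line; return None if it does not fit the layout."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    name, coverage, length, start_id, progress = match.groups()
    return ContigHeader(name, float(coverage), int(length), int(start_id), float(progress))
```

For example, `parse_header(">Contig1 : 20903.60: 1156bp: 3442: 0.00%")` yields a record whose `length_bp` is 1156, which makes it easy to filter contigs by length or coverage after a run.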


    (10) Additional Notes

    (a) You can check the log file, which records the running procedure.
    (b) If the raw RNA-Seq data is 20G per run, the assembly job consumes around 8G~14G of memory.
    (c) The testing datasets are Illumina data. SOLiD and other sequencing data have not been tested.
    (d) You can assemble up to 40G of sequencing data if the computer has 24G of memory.
    (e) This assembler is designed for RNA-Seq data. For DNA-Seq, another assembler is under development.

     

    Last edit: htchu 2012-01-18
  • Martin Smith

    Martin Smith - 2013-04-18

    On a UNIX server, I constantly get a "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS" during the index building stage. This is a problem with mono, and I have tried compiling with "--with-large-heap=yes" to no avail.

     

    Last edit: Martin Smith 2013-04-18
