dnaasm

Introduction

dnaasm is an application for the analysis of NGS data. First, dnaasm contains a de novo assembler, which can be used to assemble short DNA reads from highly repetitive genomes. Second, the tool can reconstruct long tandem repeats that other DNA assemblers cannot restore. This use case can improve the draft genome of the investigated organism.

In addition, there is the dnaasm-link module, which joins contigs and fills the gaps between them using long DNA reads.

If you have any questions, problems, or suggestions, please contact us at W.Kusmirek@ii.pw.edu.pl. We can also help you choose the best parameters for the application, depending on the specifics of the input dataset.

System Requirements

dnaasm is aimed at large genomes, although it also works well on small genomes (bacteria and fungi). It runs on a 64-bit Linux or Windows system with a minimum of 4 GB of RAM. For big genomes, such as human, about 250 GB of RAM would be required.
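
On Linux, the installed memory can be compared with these figures before starting a run. The sketch below is only an illustration: has_enough_ram and total_ram_kb are hypothetical helper names, not part of dnaasm, and the check assumes that /proc/meminfo is available.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with dnaasm.

# Return success if total_kb (the MemTotal value from /proc/meminfo, in kB)
# is at least required_gb gigabytes.
has_enough_ram() {
    total_kb=$1
    required_gb=$2
    [ "$total_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Print MemTotal in kB on Linux; prints nothing if /proc/meminfo is missing.
total_ram_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null
}
```

For example, `has_enough_ram "$(total_ram_kb)" 4` tests the 4 GB minimum. MemTotal is usually a little below the nominal installed memory, so a machine with exactly 4 GB may fail this strict comparison.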

Installation

  1. You can download the Docker image of the dnaasm application:
docker pull wkusmirek/dnaasm
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -assembly
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -scaffold
  2. Or download the source code and compile it yourself. Installation instructions can be found in the INSTALL file.

How to use it

example command

dnaasm can be run from the command line, for example for E. coli DNA reads:

docker image version

docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -assembly -k 55 -genome_length 4600000 -correct 1 -paired_reads_algorithm 1 -quality_threshold 0 -bfcounter_threshold 2 -single_edge_counter_threshold 5 -paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5 -i1_1 /tmp/reads_inward.R1.fq -i1_2 /tmp/reads_inward.R2.fq -output_file_name /tmp/out.fa
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -scaffold -contigs_file_path /tmp/contigs.fa -long_reads_file_path /tmp/simulated_reads.fasta -kmer_size 15 -distance 4000 -step 2 -min_links 5 -max_ratio 0.3 -min_contig_length 500 -output_file_name /tmp/scaffolds.fa

compiled version

./dnaasm -assembly -k 55 -genome_length 4600000 -correct 1 -paired_reads_algorithm 1 -quality_threshold 0 -bfcounter_threshold 2 -single_edge_counter_threshold 5 -paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5 -i1_1 /tmp/reads_inward.R1.fq -i1_2 /tmp/reads_inward.R2.fq -output_file_name /tmp/out.fa
./dnaasm -scaffold -contigs_file_path /tmp/contigs.fa -long_reads_file_path /tmp/simulated_reads.fasta -kmer_size 15 -distance 4000 -step 2 -min_links 5 -max_ratio 0.3 -min_contig_length 500 -output_file_name /tmp/scaffolds.fa
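
When the same assembly settings are reused for several datasets, it may help to generate the command line rather than retype it. The following is a minimal sketch for the compiled version. It assumes the parameter values from the E. coli example. The function name print_assembly_cmd and the environment variables K and GENOME_LENGTH are hypothetical and are not part of dnaasm. The function only prints the command and runs nothing.

```shell
#!/bin/sh
# Hypothetical helper: prints the assembly command from the E. coli example
# for a given pair of read files and an output path. Nothing is executed.
# K and GENOME_LENGTH are illustrative shell variables, not dnaasm settings;
# when unset, the values from the example (55 and 4600000) are used.
print_assembly_cmd() {
    reads_1=$1
    reads_2=$2
    output=$3
    echo "./dnaasm -assembly -k ${K:-55} -genome_length ${GENOME_LENGTH:-4600000}" \
         "-correct 1 -paired_reads_algorithm 1 -quality_threshold 0" \
         "-bfcounter_threshold 2 -single_edge_counter_threshold 5" \
         "-paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5" \
         "-i1_1 $reads_1 -i1_2 $reads_2 -output_file_name $output"
}
```

Calling `print_assembly_cmd /tmp/reads_inward.R1.fq /tmp/reads_inward.R2.fq /tmp/out.fa` prints the compiled-version assembly command from the example. Paths containing spaces are not handled by this sketch.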

Options for dnaasm de novo assembler

  -k <int>                                 the de Bruijn graph dimension; it should be an odd number smaller than or equal to 64
  -genome_length <int>                     the length of original genome
  -correct <int>                           set to '1' if errors should be corrected in graph, otherwise '0', [1]
  -paired_reads_algorithm <int>            set to '0' if reads are unpaired, '1' if reads are paired (forward-reverse) (-insert_size_mean_inward, -insert_size_std_dev_inward, -paired_reads_pet_threshold_from and -paired_reads_pet_threshold_to required only in paired mode), [0]
  -insert_size_mean_inward <float>         the value associated with paired-end tags, required only when '-paired_reads_algorithm' is set to '1', [0.0]
  -insert_size_std_dev_inward <float>      the value associated with paired-end tags, required only when '-paired_reads_algorithm' is set to '1', [0.0]
  -insert_size_mean_outward <float>        the value associated with mate pairs, required only when mate-pair data is available, [0.0]
  -insert_size_std_dev_outward <float>     the value associated with mate pairs, required only when mate-pair data is available, [0.0]
  -quality_threshold <int>                 the quality threshold value (0-93) for reads from FASTQ files, [0]
  -bfcounter_threshold <int>               the threshold of k-mer counter in k-mer occurrence table below which k-mer will not be considered, [0]
  -single_edge_counter_threshold <int>     the threshold of edge counter in single graph below which edge will be deleted from single graph, [0]
  -paired_reads_pet_threshold_from <int>   the threshold (begin value) of edge counter (each paired-end tag adds a new edge or increments the specified counter) for unitigs graph, required only when '-paired_reads_algorithm' is set to '1', [0]
  -paired_reads_pet_threshold_to <int>     the threshold (end value) of edge counter (each paired-end tag adds a new edge or increments the specified counter) for unitigs graph, required only when '-paired_reads_algorithm' is set to '1', [0]
  -paired_reads_mp_threshold_from <int>    the threshold (begin value) of edge counter (each mate pair adds a new edge or increments the specified counter) for contigs graph, required only when mate-pair data is available, [0]
  -paired_reads_mp_threshold_to <int>      the threshold (end value) of edge counter (each mate pair adds a new edge or increments the specified counter) for contigs graph, required only when mate-pair data is available, [0]
  -i1_1 <string>                           reads file in FASTA or FASTQ format in inward orientation
  -i1_2 <string>                           reads file in FASTA or FASTQ format in inward orientation
  -o1_1 <string>                           reads file in FASTA or FASTQ format in outward orientation
  -o1_2 <string>                           reads file in FASTA or FASTQ format in outward orientation
  -bfc_file <string> (optional)            output from BFCounter application; this file is optional, but recommended when assembling large genomes (to save memory)
  -output_file_name <string>               output file name, [out]
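
The constraint on -k (an odd number smaller than or equal to 64) can be checked in a wrapper script before a long run is started. This is a minimal sketch. is_valid_k is a hypothetical helper, not a dnaasm feature, and the lower bound of 1 is an assumption, since no minimum is stated for -k.

```shell
#!/bin/sh
# Hypothetical helper: return success if k is an odd integer no larger than 64,
# following the description of the -k option.
is_valid_k() {
    k=$1
    case $k in
        # Reject empty input, zero or leading zeros, and anything non-numeric.
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ "$k" -le 64 ] && [ $((k % 2)) -eq 1 ]
}
```

For example, `is_valid_k 55` succeeds, while `is_valid_k 54` (even) and `is_valid_k 65` (too large) fail.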

Options for dnaasm-link module

  -contigs_file_path <file_name>           contigs file in FASTA format
  -long_reads_file_path <file_name>        reads file in FASTA format
  -kmer_size <int>                         length of k-mers extracted from reads and mapped into contigs, [15]
  -distance <int>                          distance between the 5’-end of each extracted k-mer pair, [4000]
  -step <int>                              default step of sliding window during k-mer extraction, [2]
  -min_links <int>                         minimum number of links (k-mer pairs) between two contigs to compute scaffold, [5]
  -min_lpr <int>                           minimum number of links (between two contigs) coming from a single read to compute scaffold, [1]
  -min_reads <int>                         minimum number of reads for which min_lpr threshold must be satisfied to compute scaffold, [1]
  -max_ratio <float>                       maximum link ratio between two best contigs to be paired, [0.3]
  -min_contig_length <int>                 minimum contig length to consider for scaffolding, [500]
  -gapfilling <int>                        enable gap filling on scaffolds, [1]
  -output_file_name <file_name>            output file name, [out]

Output files

dnaasm writes the resulting DNA sequences to the file specified on the command line (-output_file_name parameter). In addition, dnaasm writes the assembly log to /tmp/dnaasm/dnaasm_calc_0.log.
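
A quick sanity check of the result is to count the sequences in the output file and sum their lengths, which can then be compared with the value given to -genome_length. The sketch below assumes that the output is a standard FASTA file, as the .fa names in the examples suggest. The function names are hypothetical helpers and not part of dnaasm.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a FASTA file.

# Count sequences: every FASTA record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Sum the lengths of all sequence lines (header lines are skipped).
total_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}
```

For the example run, `count_fasta_records /tmp/out.fa` and `total_bases /tmp/out.fa` summarize the assembly, and `tail -n 20 /tmp/dnaasm/dnaasm_calc_0.log` shows the end of the log.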

Citation

Wiktor Kuśmirek and Robert Nowak (2018). De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application. BMC Bioinformatics, 2018, 19:273. doi:10.1186/s12859-018-2281-4