| File | Date | Author | Commit | 
|---|---|---|---|
| dockers | unknown | ||
| docs | unknown | ||
| impl | 2018-10-24 |  Wiktor Kuśmirek | [36ea4a] bugfix in package building | 
| katome | unknown | ||
| sandbox | 2021-04-27 |  Marcin Konefal | [1e2981] Final fixes in thesis and for thesis. | 
| tags | unknown | ||
| .hgignore | 2021-04-27 |  Marcin Konefal | [1e2981] Final fixes in thesis and for thesis. | 
| CHANGES | 2018-10-23 |  Wiktor Kuśmirek | [ad8753] finished verion of dnaasm tool with dnaasm-link... | 
| INSTALL | unknown | ||
| README.md | 2018-10-23 |  Wiktor Kuśmirek | [ad8753] finished verion of dnaasm tool with dnaasm-link... | 
dnaasm is an application for analysing NGS data. First, dnaasm contains a de novo assembler that can be used to assemble short DNA reads of highly repetitive genomes. Second, the tool can be used to reconstruct long tandem repeats that other DNA assemblers cannot restore. This use case can improve the draft genome of the investigated organism.
In addition, there is the dnaasm-link module, which joins contigs and fills the gaps between them using long DNA reads.
If you have any questions, problems or suggestions, please contact us at W.Kusmirek@ii.pw.edu.pl (we can also help you choose the best parameters of the application for the specifics of your input dataset).
dnaasm is aimed at large genomes, although it also works well on small genomes (bacteria and fungi). It runs on 64-bit Linux or Windows systems with at least 4 GB of RAM. Large genomes, such as the human genome, require about 250 GB of RAM.
dnaasm is distributed as a Docker image:

```bash
docker pull wkusmirek/dnaasm
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -assembly
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -scaffold
```
dnaasm can be run from the command line. For example, to assemble E. coli DNA reads and to scaffold contigs with long reads:

```bash
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -assembly -k 55 -genome_length 4600000 \
  -correct 1 -paired_reads_algorithm 1 -quality_threshold 0 -bfcounter_threshold 2 \
  -single_edge_counter_threshold 5 -paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5 \
  -i1_1 /tmp/reads_inward.R1.fq -i1_2 /tmp/reads_inward.R2.fq -output_file_name /tmp/out.fa
docker run --rm -it -v /tmp:/tmp -w /tmp wkusmirek/dnaasm ./dnaasm -scaffold -contigs_file_path /tmp/contigs.fa \
  -long_reads_file_path /tmp/simulated_reads.fasta -kmer_size 15 -distance 4000 -step 2 \
  -min_links 5 -max_ratio 0.3 -min_contig_length 500 -output_file_name /tmp/scaffolds.fa
```

When the dnaasm binary is run directly (without Docker), the same commands are:

```bash
./dnaasm -assembly -k 55 -genome_length 4600000 -correct 1 -paired_reads_algorithm 1 \
  -quality_threshold 0 -bfcounter_threshold 2 -single_edge_counter_threshold 5 \
  -paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5 \
  -i1_1 /tmp/reads_inward.R1.fq -i1_2 /tmp/reads_inward.R2.fq -output_file_name /tmp/out.fa
./dnaasm -scaffold -contigs_file_path /tmp/contigs.fa -long_reads_file_path /tmp/simulated_reads.fasta \
  -kmer_size 15 -distance 4000 -step 2 -min_links 5 -max_ratio 0.3 -min_contig_length 500 \
  -output_file_name /tmp/scaffolds.fa
```
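The two example commands can be chained in a small script. The sketch below is a hypothetical wrapper (the names `dnaasm_docker` and `run_pipeline` are ours, not part of dnaasm). It reuses the flags from the example and, relying on the fact that dnaasm writes its result to the file given by `-output_file_name`, passes the assembly output straight to `-contigs_file_path`. It drops `-it`, because `-t` requires standard input to be a terminal, which is usually not the case in cron jobs or CI.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with dnaasm: runs both example stages via Docker.
# "-it" is dropped so the script also works without an attached terminal.
# Only /tmp is mounted, so all input and output paths must live under /tmp.

DNAASM_IMAGE="wkusmirek/dnaasm"

# Runs ./dnaasm inside the container with the given arguments.
dnaasm_docker() {
  docker run --rm -v /tmp:/tmp -w /tmp "$DNAASM_IMAGE" ./dnaasm "$@"
}

# Stage 1 writes contigs to $1; stage 2 scaffolds them with long reads into $2.
run_pipeline() {
  local contigs=$1 scaffolds=$2
  dnaasm_docker -assembly -k 55 -genome_length 4600000 -correct 1 \
    -paired_reads_algorithm 1 -quality_threshold 0 -bfcounter_threshold 2 \
    -single_edge_counter_threshold 5 \
    -paired_reads_pet_threshold_from 3 -paired_reads_pet_threshold_to 5 \
    -i1_1 /tmp/reads_inward.R1.fq -i1_2 /tmp/reads_inward.R2.fq \
    -output_file_name "$contigs" || return   # stop if assembly fails
  dnaasm_docker -scaffold -contigs_file_path "$contigs" \
    -long_reads_file_path /tmp/simulated_reads.fasta \
    -kmer_size 15 -distance 4000 -step 2 -min_links 5 -max_ratio 0.3 \
    -min_contig_length 500 -output_file_name "$scaffolds"
}
```

For example, `run_pipeline /tmp/contigs.fa /tmp/scaffolds.fa` assembles the paired reads into `/tmp/contigs.fa` and then scaffolds that file.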
Parameters of the assembly mode (`-assembly`):

- `-k <int>`: dimension of the de Bruijn graph; it should be an odd number not greater than 64.
- `-genome_length <int>`: length of the original genome.
- `-correct <int>`: `1` if errors should be corrected in the graph, otherwise `0`. Default: `1`.
- `-paired_reads_algorithm <int>`: `0` if the reads are unpaired, `1` if they are paired (forward-reverse). `-insert_size_mean_inward`, `-insert_size_std_dev_inward`, `-paired_reads_pet_threshold_from` and `-paired_reads_pet_threshold_to` are required only in paired mode. Default: `0`.
- `-insert_size_mean_inward <float>`: mean insert size of the paired-end tags; required only when `-paired_reads_algorithm` is `1`. Default: `0.0`.
- `-insert_size_std_dev_inward <float>`: standard deviation of the paired-end insert size; required only when `-paired_reads_algorithm` is `1`. Default: `0.0`.
- `-insert_size_mean_outward <float>`: mean insert size of the mate pairs; required only when mate-pair data is available. Default: `0.0`.
- `-insert_size_std_dev_outward <float>`: standard deviation of the mate-pair insert size; required only when mate-pair data is available. Default: `0.0`.
- `-quality_threshold <int>`: quality threshold (0-93) for reads from FASTQ files. Default: `0`.
- `-bfcounter_threshold <int>`: k-mers whose counter in the k-mer occurrence table is below this threshold are not considered. Default: `0`.
- `-single_edge_counter_threshold <int>`: edges of the single graph whose counter is below this threshold are deleted from it. Default: `0`.
- `-paired_reads_pet_threshold_from <int>`: lower threshold (begin value) of the edge counter in the unitig graph (each paired-end tag adds a new edge or increments the counter of an existing one); required only when `-paired_reads_algorithm` is `1`. Default: `0`.
- `-paired_reads_pet_threshold_to <int>`: upper threshold (end value) of the edge counter in the unitig graph (each paired-end tag adds a new edge or increments the counter of an existing one); required only when `-paired_reads_algorithm` is `1`. Default: `0`.
- `-paired_reads_mp_threshold_from <int>`: lower threshold (begin value) of the edge counter in the contig graph (each mate pair adds a new edge or increments the counter of an existing one); required only when mate-pair data is available. Default: `0`.
- `-paired_reads_mp_threshold_to <int>`: upper threshold (end value) of the edge counter in the contig graph (each mate pair adds a new edge or increments the counter of an existing one); required only when mate-pair data is available. Default: `0`.
- `-i1_1 <string>`: first reads file (FASTA or FASTQ) in inward orientation.
- `-i1_2 <string>`: second reads file (FASTA or FASTQ) in inward orientation.
- `-o1_1 <string>`: first reads file (FASTA or FASTQ) in outward orientation.
- `-o1_2 <string>`: second reads file (FASTA or FASTQ) in outward orientation.
- `-bfc_file <string>`: (optional) output of the BFCounter application; recommended when assembling large genomes, to save memory.
- `-output_file_name <string>`: output file name. Default: `out`.
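Some of the assembly parameters can be illustrated with small shell helpers. These are illustrative sketches, not dnaasm code, and all helper names are hypothetical. `valid_k` checks the documented constraint on `-k`; an odd k also guarantees that no k-mer equals its own reverse complement, which is a common reason assemblers require odd k, although this README does not state dnaasm's reason. `min_phred` assumes the usual Phred+33 FASTQ encoding, where scores 0-93 map to the printable characters `!` to `~`, matching the 0-93 range of `-quality_threshold`; whether dnaasm applies the threshold per base or per read is not documented here. `count_kmers` mimics the `-bfcounter_threshold` rule by discarding k-mers seen fewer times than the threshold; unlike real k-mer counters such as BFCounter, it does not merge a k-mer with its reverse complement.

```bash
#!/usr/bin/env bash
# Illustrative helpers for some -assembly parameters; not part of dnaasm.

# Reverse complement of a DNA string (A<->T, C<->G).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Succeeds if $1 is a valid -k value: a positive odd integer not greater than 64.
valid_k() {
  [[ $1 =~ ^[1-9][0-9]*$ ]] && (( $1 <= 64 && $1 % 2 == 1 ))
}

# Lowest Phred score in a FASTQ quality string, assuming Phred+33 encoding
# ('!' = 0 ... '~' = 93).
min_phred() {
  local qual=$1 min=93 i code
  for (( i = 0; i < ${#qual}; i++ )); do
    printf -v code '%d' "'${qual:i:1}"   # ASCII code of the character
    code=$(( code - 33 ))
    if (( code < min )); then min=$code; fi
  done
  printf '%s\n' "$min"
}

# Counts k-mers ($1 = k) in sequences read from stdin (one per line) and prints
# "kmer count" for those seen at least $2 times, mimicking -bfcounter_threshold.
count_kmers() {
  local k=$1 threshold=$2
  awk -v k="$k" -v t="$threshold" '
    { for (i = 1; i + k - 1 <= length($0); i++) cnt[substr($0, i, k)]++ }
    END { for (m in cnt) if (cnt[m] >= t) print m, cnt[m] }
  ' | LC_ALL=C sort
}
```

For instance, `revcomp ACGT` prints `ACGT` (an even-length k-mer that is its own reverse complement), `valid_k 55` succeeds while `valid_k 64` fails, and `min_phred 'II#I'` prints `2`.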
Parameters of the scaffolding mode (`-scaffold`):

- `-contigs_file_path <file_name>`: contigs file in FASTA format.
- `-long_reads_file_path <file_name>`: reads file in FASTA format.
- `-kmer_size <int>`: length of the k-mers extracted from the reads and mapped to the contigs. Default: `15`.
- `-distance <int>`: distance between the 5' ends of the two k-mers in each extracted pair. Default: `4000`.
- `-step <int>`: step of the sliding window during k-mer extraction. Default: `2`.
- `-min_links <int>`: minimum number of links (k-mer pairs) between two contigs required to compute a scaffold. Default: `5`.
- `-min_lpr <int>`: minimum number of links between two contigs coming from a single read required to compute a scaffold. Default: `1`.
- `-min_reads <int>`: minimum number of reads for which the `-min_lpr` threshold must be satisfied to compute a scaffold. Default: `1`.
- `-max_ratio <float>`: maximum link ratio between the two best contigs to be paired. Default: `0.3`.
- `-min_contig_length <int>`: minimum contig length considered for scaffolding. Default: `500`.
- `-gapfilling <int>`: enable gap filling on scaffolds. Default: `1`.
- `-output_file_name <file_name>`: output file name. Default: `out`.
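The scaffolding parameters describe how long reads are turned into links between contigs. The sketch below assumes, as the parameter descriptions suggest but do not fully specify, that the first k-mer of each pair starts at every `-step`-th position of a read, that the second starts exactly `-distance` bases later, and that `-max_ratio` compares the link count of the second-best contig pairing with that of the best one. `kmer_pairs` and `link_ok` are hypothetical names, not dnaasm functions.

```bash
#!/usr/bin/env bash
# Illustrative sketch of the scaffolding parameters; not dnaasm code.
# Assumption: pairs start at every step-th position of the read and the second
# k-mer starts exactly `distance` bases after the first (5' end to 5' end).

# Prints "kmer1 kmer2" for each k-mer pair of read $1 (k=$2, distance=$3, step=$4).
kmer_pairs() {
  local read=$1 k=$2 distance=$3 step=$4 i
  for (( i = 0; i + distance + k <= ${#read}; i += step )); do
    printf '%s %s\n' "${read:i:k}" "${read:i+distance:k}"
  done
}

# Succeeds if a contig pairing is accepted: the best candidate has at least
# min_links links and second_best / best does not exceed max_ratio.
# Usage: link_ok BEST SECOND_BEST MIN_LINKS MAX_RATIO
link_ok() {
  awk -v b="$1" -v s="$2" -v m="$3" -v r="$4" \
    'BEGIN { exit !(b >= m && b > 0 && s / b <= r) }'
}
```

Under these assumptions and with the default values, a 10,000-base read yields pairs for start positions 0, 2, ..., 5984 (the last start satisfies 5984 + 4000 + 15 <= 10000), i.e. 2993 k-mer pairs. With `-min_links 5 -max_ratio 0.3`, a best pairing supported by 10 links is accepted if the runner-up has at most 3 links.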
dnaasm writes the resulting DNA sequences to the file specified on the command line (the `-output_file_name` parameter). It also writes assembly logs to /tmp/dnaasm/dnaasm_calc_0.log.
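Since the output is a FASTA file, standard tools are enough for a first look at it. The helper below (a hypothetical `fasta_stats`, not part of dnaasm) prints the number of sequences, the total length, the longest sequence and N50, i.e. the length L such that sequences of length at least L together cover half of the total length.

```bash
#!/usr/bin/env bash
# Illustrative only: summary statistics of a FASTA file such as the one
# written to -output_file_name.
fasta_stats() {
  # First awk: one line per sequence with its length (multi-line records supported).
  awk '
    /^>/ { if (len) print len; len = 0; next }
    { len += length($0) }
    END { if (len) print len }
  ' "$1" | sort -rn | awk '
    { l[NR] = $1; total += $1 }
    END {
      # N50: walk lengths from longest to shortest until half the total is covered.
      half = total / 2; acc = 0; n50 = 0
      for (i = 1; i <= NR; i++) { acc += l[i]; if (acc >= half) { n50 = l[i]; break } }
      printf "sequences=%d total=%d longest=%d N50=%d\n", NR, total, l[1], n50
    }'
}
```

For example, `fasta_stats /tmp/scaffolds.fa` summarizes the scaffolds from the example above, and `tail /tmp/dnaasm/dnaasm_calc_0.log` shows the end of the log. When dnaasm runs in Docker with `-v /tmp:/tmp`, the log should be visible at the same path on the host, because it is written inside the mounted `/tmp`.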
Wiktor Kuśmirek and Robert Nowak (2018). De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application. BMC Bioinformatics, 19:273. doi:10.1186/s12859-018-2281-4