File Date Author Commit
 Makefile 2017-02-17 Elliott Drabek [f02648] Initial commit
 README.txt 2017-02-17 Elliott Drabek [f02648] Initial commit
 cluster.R 2017-02-17 Elliott Drabek [f02648] Initial commit
 combined.exon1.all.tail.fa 2017-02-17 Elliott Drabek [f02648] Initial commit
 count_stop_codons.fancy 2017-02-17 Elliott Drabek [f02648] Initial commit
 cut_sections 2017-02-17 Elliott Drabek [f02648] Initial commit
 delcher.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 delcher.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 delcher.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 drabek.py 2017-02-17 Elliott Drabek [f02648] Initial commit
 etha.exon1 2017-02-17 Elliott Drabek [f02648] Initial commit
 etha.exon2 2017-02-17 Elliott Drabek [f02648] Initial commit
 exceptions.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 exon-ends.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 exon-starts.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 exon1-end-mer.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 exon1-start-mer.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 extract-fasta-bytag-rev.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 extract-fasta-bytag.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 extract-long-seqs 2017-02-17 Elliott Drabek [f02648] Initial commit
 fasta.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 fasta.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 fasta.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 fastalen.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 get-contained-matches.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 get-exon1.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 get-exon2.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 get-uniq-path-seqs.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 group_matches 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-hash.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-hash.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-hash.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-repair 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-repair.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-repair.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer-repair.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 kmer_correct 2017-02-17 Elliott Drabek [f02648] Initial commit
 make-unitig-seq.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-trace 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-trace.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-trace.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-trace.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-walk 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-walk.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-walk.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 multi-walk.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 muscle_fasta_to_consensus 2017-02-17 Elliott Drabek [f02648] Initial commit
 n50.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 primer-pair-matches 2017-02-17 Elliott Drabek [f02648] Initial commit
 primer-pair-matches.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 primer-pair-matches.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 primer-pair-matches.o 2017-02-17 Elliott Drabek [f02648] Initial commit
 promer_coords_to_nucmer_coords 2017-02-17 Elliott Drabek [f02648] Initial commit
 ref.ex1-splice.71mer 2017-02-17 Elliott Drabek [f02648] Initial commit
 remove_inclusions.composite 2017-02-17 Elliott Drabek [f02648] Initial commit
 rev-comp.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 run_exon_1 2017-02-17 Elliott Drabek [f02648] Initial commit
 show-coords_to_distance_matrix 2017-02-17 Elliott Drabek [f02648] Initial commit
 take_full_exons_only 2017-02-17 Elliott Drabek [f02648] Initial commit
 uni-classify.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 unify_blobs_and_results.composite 2017-02-17 Elliott Drabek [f02648] Initial commit
 union-regions-tail.awk 2017-02-17 Elliott Drabek [f02648] Initial commit
 unitig 2017-02-17 Elliott Drabek [f02648] Initial commit
 unitig.cc 2017-02-17 Elliott Drabek [f02648] Initial commit
 unitig.hh 2017-02-17 Elliott Drabek [f02648] Initial commit
 unitig.o 2017-02-17 Elliott Drabek [f02648] Initial commit

Read Me

For information about purposes and methods of ETHA, see Dadra et al 2016 "Reconstruction of full-length Plasmodium falciparum var exon 1 sequences reveals severe malaria and pregnancy-associated malaria vars in uncomplicated malaria infections in Malian children".

To run ETHA to reconstruct var exon 1 sequences, you will need a set of Illumina reads and a pre-existing whole genome assembly of the same isolate.  You will also need access to three software dependencies:

* Jellyfish (tested with jellyfish-2.0.0beta6.1) http://www.cbcb.umd.edu/software/jellyfish/
* Glimmer (tested with glimmer-3.02) https://ccb.jhu.edu/software/glimmer/
* MUMmer (tested with version 3.06) http://mummer.sourceforge.net/

Make sure that the executables for each of these packages are available in your PATH. To do so, edit the primary driver script "run_exon_1" so that its PATH assignment includes the correct directories on your system.
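
A quick way to confirm that the dependencies are reachable before editing run_exon_1 is to look each one up on PATH. The following is a minimal sketch; the function name check_tools is hypothetical, and the executable names in the usage comment (jellyfish, glimmer3, nucmer, promer, show-coords) are assumptions about what the three packages install, not a list taken from the ETHA scripts.

```shell
#!/usr/bin/env bash
# Report which of the given executables cannot be found on PATH.
# Prints one missing name per line; returns non-zero if any are missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (executable names are assumptions; adjust to your installation):
# check_tools jellyfish glimmer3 nucmer promer show-coords
```

Any name printed by check_tools would then need its directory added to the PATH assignment in run_exon_1.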

Running the pipeline consists of three steps:

1) Running Jellyfish on the Illumina reads to get counts of all observed 71mers. See the Jellyfish documentation for instructions for this step.

2) Setting up the working directory with three input files. These should be copied or symlinked to these exact names:
** asm.seq.fa, the whole genome assembly
** 71.mer_counts, the output of step 1
** exon1.all.tail.fa, which lists the tail ends of known exon 1 sequences. A version is included with this code. Augmenting the included version with sequences likely to be similar to those of the target strain may improve sensitivity.

3) Running the main driver script:

run_exon_1 $etha $working_directory $lower_kmer_bound $upper_kmer_bound

Here, $etha is the full path of the directory containing the code and this README file, $working_directory is the path of the directory created in step 2, and the kmer bounds are numbers indicating the minimum and maximum numbers of times a 71mer must be seen in the Illumina data to be used. These should be set to reflect the reasonable variation in read depth that characterizes the particular dataset. Note that if you know what value you will be using for the lower bound, you can save storage space by asking Jellyfish to keep only those kmers above that value.
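
The three steps above can be sketched as shell helpers. This is an illustration under stated assumptions, not the authors' procedure: the function names, the JELLYFISH override variable, the thread count and hash size, the use of canonical counting (-C), and the column dump format (-c) are all assumptions, since this README does not say which Jellyfish options or output format ETHA expects. The sketch also assumes run_exon_1 is invoked from the code directory.

```shell
#!/usr/bin/env bash
# Sketch of the three steps. Helper names and Jellyfish option choices are
# assumptions, not taken from the ETHA code.

# Step 1: count 71-mers, then dump them as text, dropping k-mers seen
# fewer than LOWER times to save storage space.
# Usage: count_71mers READS OUT_PREFIX [LOWER]
count_71mers() {
    local reads="$1" prefix="$2" lower="${3:-1}"
    local jf="${JELLYFISH:-jellyfish}"   # override to use a specific binary
    "$jf" count -m 71 -s 100M -t 4 -C -o "$prefix.jf" "$reads" &&
    "$jf" dump -c -L "$lower" -o "$prefix.counts" "$prefix.jf"
}

# Print an absolute path for an existing file.
abs_path() {
    (cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")")
}

# Step 2: symlink the three inputs into WORKDIR under the exact names
# listed in the README.
# Usage: setup_workdir WORKDIR ASSEMBLY KMER_COUNTS EXON1_TAILS
setup_workdir() {
    local workdir="$1" asm="$2" counts="$3" tails="$4" f
    for f in "$asm" "$counts" "$tails"; do
        [ -r "$f" ] || { echo "cannot read input: $f" >&2; return 1; }
    done
    mkdir -p "$workdir" || return 1
    ln -sf "$(abs_path "$asm")"    "$workdir/asm.seq.fa" &&
    ln -sf "$(abs_path "$counts")" "$workdir/71.mer_counts" &&
    ln -sf "$(abs_path "$tails")"  "$workdir/exon1.all.tail.fa"
}

# Step 3: check that the bounds are integers with LOWER <= UPPER, then
# call the driver with the four documented arguments.
# Usage: run_etha ETHA_DIR WORKDIR LOWER UPPER
run_etha() {
    local etha="$1" workdir="$2" lower="$3" upper="$4" b
    for b in "$lower" "$upper"; do
        case "$b" in
            ''|*[!0-9]*) echo "kmer bound must be a non-negative integer: $b" >&2; return 2 ;;
        esac
    done
    [ "$lower" -le "$upper" ] || { echo "lower bound exceeds upper bound" >&2; return 2; }
    "$etha/run_exon_1" "$etha" "$workdir" "$lower" "$upper"
}
```

A run might then look like `count_71mers reads.fastq counts 5`, followed by `setup_workdir work asm.fa counts.counts tails.fa` and `run_etha /path/to/etha "$PWD/work" 5 200`, where every file name and both bounds are placeholders to replace with values suited to your data.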

ETHA will run for some hours, putting all of its intermediate and output files in the same working directory. For most purposes, the files you will be most interested in will be these:

* finish/results.deduplicated.fa, the output of ETHA proper: high-confidence var sequences
* finish/union.fa, the output of ETHA proper, plus var-like sequences identified in the whole genome assembly which are not accounted for in the ETHA output.
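
Once a run has finished, a quick sanity check is to count the sequences in the two output files. The helper below is a minimal sketch with a hypothetical name (summarize_outputs); it assumes only that both files are ordinary FASTA, with one header line starting with '>' per sequence.

```shell
#!/usr/bin/env bash
# Print the number of FASTA records in each main ETHA output file,
# or "missing" if a file is absent from the working directory.
# Usage: summarize_outputs WORKDIR
summarize_outputs() {
    local workdir="$1" f
    for f in finish/results.deduplicated.fa finish/union.fa; do
        if [ -r "$workdir/$f" ]; then
            # Each FASTA record has exactly one header line starting with '>'.
            printf '%s\t%s\n' "$f" "$(grep -c '^>' "$workdir/$f")"
        else
            printf '%s\tmissing\n' "$f"
        fi
    done
}
```

Because union.fa is described as the ETHA output plus additional var-like sequences from the assembly, it would normally be expected to contain at least as many records as results.deduplicated.fa.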

If you run into any difficulty or question that is not addressed here, please email elliott.drabek@gmail.com or jcsilva@som.umaryland.edu