			Warning!!  You are strongly encouraged to install one of the
		packages of executables and not to try to install the source code. 
		The only reason to download the source code is if you need to modify
		that code for your own purposes. If you work from the source code no
		support will be provided.  You are entirely on your own. Please
		direct bug reports to barryghall@gmail.com.


			Compiled executables for Mac and Linux can be downloaded and
		should run as is with no additional installations, after making sure
		everything has executable permissions. A directory with the source
		code is also available, although to run from the source code you'll
		need to install the other software kSNP requires and put it in your
		path. A detailed User Guide is included.  Two example inputs for
		testing are also included, with explanation and directions for how to
		run them in the User Guide.
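
			As a minimal sketch of the "making sure everything has executable
		permissions" step, the Python helper below adds the execute bit to
		every regular file under a package directory. The function name and
		the directory passed to it are hypothetical and are not part of kSNP;
		the User Guide remains the reference for installation.

```python
import os
import stat
from pathlib import Path


def make_executable(package_dir):
    """Add execute permission (user, group, other) to every regular file
    under package_dir. Returns the list of files whose mode was changed."""
    changed = []
    for path in Path(package_dir).rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed.append(path)
    return changed
```

			Calling make_executable() on the unzipped package folder and
		printing the returned list shows which files were missing the
		execute bit.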

			Please cite:

			 Hall, B. G. and J. Nisbet. 2023. Building Phylogenetic Trees from 
		Genome Sequences with kSNP4.  Mol. Biol. Evol. 40 https://doi.org/10.1093/molbev/msad235
		
			Gardner, S.N., T. Slezak, and B.G. Hall.  2015.  kSNP3.0: SNP
		detection and phylogenetic analysis of genomes without genome
		alignment or reference genomes. Bioinformatics 31: 2877-2878 doi:
		10.1093/bioinformatics/btv271.

			Gardner, S.N. and Hall, B.G. 2013. When whole-genome alignments
		just won't work: kSNP v2 software for alignment-free SNP discovery
		and phylogenetics of hundreds of microbial genomes. PLoS ONE,
		8(12): e81760. doi:10.1371/journal.pone.0081760

			Gardner, S.N. and Slezak, T.R.  2010. Scalable SNP analyses of
		100+ bacterial or viral genomes. Journal of Forensic Research, 1:107.
		*********************************************************************
		*********************************************************************
		July 28, 2023. Version 4.1 released. Major upgrade.
		Version 4.1 has been tested on Mac OS 10.15.7 (Catalina) and macOS 13.4 
		(Ventura) and on Ubuntu Linux 22.04. On average kSNP4.1 is 30% faster 
		than kSNP4, and it uses memory about twice as efficiently as does kSNP4.
		
		
		October 26, 2022. Version 4.0 released. Major upgrade.
		Depending on the data set, version 4 is up to 3 times 
		faster than version 3.1.2. Most changes will be transparent to the user.
		Version 4 has been tested on Mac OS 10.15.7 (Catalina) and on the following
		Linux OS: Ubuntu 16.04, Ubuntu 22.04, Fedora 36 and CentOS Stream 9.
		kSNP4.0 requires no programming skills or knowledge. The packages include a
		revised kSNP4 User Guide, a guide to downloading genome sequences from NCBI, a
		guide to troubleshooting kSNP4, and a BSD Opensource License.
	
		November 9, 2019.  Version 3.1.2 released.  Minor upgrade
		Two utilities added: check_genbank_from_NCBI  and  fix_old_fasta_headers.
		Both utilities are for troubleshooting kSNP3.1 and later.  Both are discussed
		in the User Guide for version 3.1.2.
		
		Sept. 20, 2017 Version 3.1 released. Major upgrade.   Version 3.1
		fixes the problems with SNP annotation that arose when NCBI discontinued
		use of GI numbers. Please read carefully the Preface (page 3) and the
		File of annotated genomes section (pages 9-10) in the version 3.1 User
		Guide. Thanks to Tom Slezak for revising the get_genbank_file3 script and
		to Tod Stuber (USDA) for testing version 3.1 even though he doesn't need
		the annotation feature. All users are encouraged to upgrade to version
		3.1.
		
		Known issues: Redhat Linux: annotation function requires Redhat
		version 7 or above.

		July 15, 2016 parse_assembly_summary updated to accommodate NCBI's
		modified format of the assembly_summary.txt file for downloading
		genomes by FTP.
		
		July 12, 2019 Bugs were found  in NodeChiSquare2Tree3, which is
		now replaced by NodeChiSquare2Tree31.  Aside from working properly,
		kSNP 3.1 NodeChiSquare2Tree31 differs only in that the default
		tree_type is parsimony.  The documentation accompanying kSNP3.1 now
		reflects that change. Users need not reinstall kSNP3.1
		to use NodeChiSquare2Tree31.  Simply download the separate
		NodeChiSquare2Tree31 file, put it into the kSNP3 folder, and discard
		the old NodeChiSquare2Tree3 file.  Linux users should change the file
		name NodeChiSquare2Tree31-linux to NodeChiSquare2Tree31.


			June 17, 2016 kSNP3.021 released. All-numeric file names are now
		allowed. Thanks to Egon Ozer of Northwestern University Feinberg
		School of Medicine for fixing the bugs that led to prohibiting
		all-numeric file names.

			May 1, 2016 kSNP3.02 released.  It was discovered that on some
		systems incorrectly naming the input genome sequence files can have
		disastrous results that can lead to incorrect SNP counts and
		incorrect SNP annotations without the run failing.  Thanks to Egon
		Ozer of Northwestern University Feinberg School of Medicine  for
		discovering this important bug.  kSNP3.02 now checks the input file
		for incorrect names and terminates the kSNP3 run when it finds them.
		Please see page 7 of the updated kSNP3.02 documentation for a
		description of the naming rules and what to do when illegal names are
		detected.

			February 10, 2016 kSNP3.01 released.  As the result of changes at
		NCBI's FTP site for genome sequences the utilities
		FetchFinishedGenomes and FetchGenomeAssemblies that were included in
		kSNP3.0 no longer work. Those utilities have been replaced by
		parse_assembly_summary and FTPgenomes.

			February 5, 2015 kSNP3 released. v3 has different command line
		options, and several major changes that are summarized below.

			****This README file is not a substitute for the kSNP3 User
		Guide.  It is important to read that guide before using kSNP3. The
		User Guide describes several new kSNP utilities that facilitate
		downloading genome sequences, creation of the Kchooser input file,
		etc.  It also includes a set of hints intended to simplify life for
		kSNP3 users. ****

			kSNP3 MAJOR CHANGES from kSNP version 2:

			1. Each genome must be
		provided in a separate fasta file which can contain multiple reads or
		contigs. This differs from kSNP2 where all genomes were in a single
		fasta file, which required merging reads and contigs. It also avoids
		creating massively large and unwieldy fasta files. So you don't need
		to run merge_fasta_(reads|contigs) anymore before running kSNP3.

			2. The input file in the -in option must contain the full path
		location of each genome and the genome name, one line per genome, tab
		delimited between full path to genome fasta file in column 1 and
		genome name in column 2. This format allows
		multi-read, multi-chromosome and plasmid, and multi-contig genomes,
		each genome in separate fasta. This allows annotation of sequences
		composed of multiple chromosomes, contigs, and plasmids, each of
		which has a gi number. The user can edit the genome names by editing
		this file instead of editing the fasta files. The SNPs_all file
		contains an extra column with the fasta defline of the contig, and
		positional information is given relative to that contig.
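
			To illustrate the layout described in item 2 (full path in column
		1, a tab, genome name in column 2, one line per genome), here is a
		minimal Python sketch. The function name, the list of fasta
		extensions, and the choice of the file name without its extension as
		the genome name are all assumptions made for the example. It does not
		check the kSNP naming rules, and it is not a replacement for the
		utilities described in the User Guide.

```python
from pathlib import Path


def write_input_list(fasta_dir, out_file, extensions=(".fasta", ".fa", ".fna")):
    """Write one line per genome: <absolute path to fasta><TAB><genome name>."""
    lines = []
    for path in sorted(Path(fasta_dir).iterdir()):
        if path.is_file() and path.suffix in extensions:
            # Genome name = file name without extension (an assumption).
            lines.append(f"{path.resolve()}\t{path.stem}")
    Path(out_file).write_text("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

			Because the genome name lives in column 2, it can be edited in the
		resulting file without touching the fasta files themselves.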

			3. Core and majority trees are calculated using parsimony instead
		of maximum likelihood, since simulations indicated that parsimony SNP
		trees are more accurate (Hall, 2014, submitted). If you still want to
		use ML for core and majority, go into the kSNP3 script and uncomment
		the lines where indicated.

			4. Calculation of ML, core, and majority trees is now optional.
		The default is to only calculate a parsimony tree from the full SNP
		matrix.

			5. Instead of using only 1 best parsimony tree, it now computes a
		consensus parsimony tree from all the trees that tie for the most
		parsimonious of the trees created by parsimonator, using "consense"
		from PHYLIP modified to allow sequence names up to 100 characters.

			6. There is now an option to add genomes to an existing SNP run
		instead of doing SNP discovery. It will search for the SNPs already
		found in a previous kSNP3 run (specified with the -SNPs_all option)
		in the new genomes listed in the -in file.

			7. The -u and -c options from kSNP2 are obsolete. Instead, the
		code automatically determines which genomes are high coverage raw
		reads versus those that are either assembled or low coverage, and
		automatically picks the minimum kmer frequency for consideration as a
		SNP as a proxy for coverage. It calculates this minimum kmer
		frequency from the kmer counts for each genome as the average of the
		median and mean kmer count for that genome. This is a heuristic that
		allows a flexible kmer count threshold for each genome that depends
		on the coverage of any given unassembled genome, and always results
		in a threshold of 1 for assembled genomes. This is helpful for
		comparing a mix of high and low coverage genomes, such as when some
		genomes in the kSNP3 run are low coverage reads extracted from a
		metagenome for the species of interest.
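
			The heuristic in item 7 can be sketched in a few lines of Python.
		This is an illustration only: the function name is hypothetical, and
		rounding down with a floor of 1 is an assumption chosen so that an
		assembled genome (almost all kmer counts equal to 1) gets a threshold
		of 1, as described above. The kSNP3 code may round differently.

```python
import math
from statistics import mean, median


def min_kmer_frequency(kmer_counts):
    """Average of the median and mean kmer count for one genome,
    rounded down and never below 1 (rounding is an assumption)."""
    avg = (median(kmer_counts) + mean(kmer_counts)) / 2
    return max(1, math.floor(avg))


# Assembled genome: nearly every kmer occurs once, so the threshold is 1.
assembled = [1, 1, 1, 1, 2]
# High coverage raw reads: most kmers occur about 30 times, a few
# error kmers occur once, so the threshold lands well above 1.
raw_reads = [30, 28, 32, 1, 1, 29, 31]
```

			With these made-up counts, min_kmer_frequency(assembled) is 1 and
		min_kmer_frequency(raw_reads) is 25, so rare error kmers in the raw
		reads fall below the threshold.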

#####################
####################################################################################
			5/20/2014 Minor errors were corrected in the User Guide. Since
		there were no changes to the code, only the User Guide that can be
		downloaded separately from the Mac and Linux executables was updated
		on sourceforge.

			3/31/2014 Minor change in annotation code so that it will
		recognize gi numbers when they are in a format gi_448814763_....  
		Previously, a gi number preceded by a "_" was recognized only if it
		was followed by a space, rather than by any non-digit character.
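
			A minimal Python sketch of this kind of matching follows. It is
		not the annotation code itself: the pattern, the function name, and
		the example deflines are assumptions. It accepts "gi" followed by
		either "_" or the "|" separator used in older NCBI deflines, and it
		ends the number at any non-digit or at the end of the line.

```python
import re

# "gi", a separator ("_" or "|"), the digits, then any non-digit or end of line.
GI_PATTERN = re.compile(r"gi[_|](\d+)(?=\D|$)")


def extract_gi(defline):
    """Return the gi number found in a fasta defline, or None if absent."""
    match = GI_PATTERN.search(defline)
    return match.group(1) if match else None
```

			For example, extract_gi("gi_448814763_ref") and
		extract_gi("gi_448814763 some description") both return "448814763".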

			3/30/2014 (still V2.1.2) Recompiled the script NodeChiSquare2tree
		in Linux and Mac executables, since the previous compiled versions
		were not finding the required perl modules. This is an extra script
		the user can call that is not called in the main kSNP code. Fixed
		minor errors in User Guide.

			3/23/2014   V2.1.2 Modified kSNP wrapper script so that now you
		MUST indicate the path to  the kSNP executables in that script. This
		means that now you do not need to add kSNP directory to your path
		environment variable, but you do need to edit this line in the kSNP
		file to point to the directory with all the kSNP executables: set
		kSNP=/usr/local/kSNP

			2/20/2014 V2.1.1 Fixed executable version of label_tree_nodes,
		since it was failing to find a perl module, and as a result the
		files containing a tree with labeled nodes were empty.

			Table of Contents on User Guide was incomplete, and this has been
		corrected.

			Permissions are automatically set to 755 for the kSNP file in the
		Mac and Linux versions. Previously, the user needed to make the file
		executable after downloading.

			The above fixes only affect the Linux and Mac executables so I
		didn't do a new upload of the source code version.

			1/31/2014   V2.1

			Fixed annotation bug, since it was failing to annotate many SNPs.
		The bug was in the genbank file downloader, so with v2.1 it always
		downloads the annotations if they are there. Previously it skipped
		the annotations for some gi#'s and so SNPs were incompletely
		annotated.

			Added a new script NodeChiSquare2tree to assign SNPs to nodes
		based on ChiSquare, allowing for imperfect but significant association
		of SNPs with tree nodes. In some cases, this allows more SNPs to be
		mapped to nodes even if there is not a perfect correspondence, e.g.
		if the allele is missing in one of the leaf genomes down that branch
		or present outside the branch. This should help assign more SNPs to
		nodes when draft genomes are included. Look at the User Guide for
		more information.
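
			To show what an "imperfect but significant" association can look
		like, here is a minimal Python sketch of a standard 2x2 chi-square
		statistic (allele present or absent, genome inside or outside the
		node's branch). The function name and the counts are made up, and
		this README does not document the exact test NodeChiSquare2tree
		performs, so treat this as an illustration of the idea only.

```python
def chi_square_2x2(in_with, in_without, out_with, out_without):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for a 2x2 table of allele presence versus clade membership."""
    a, b, c, d = in_with, in_without, out_with, out_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


# 9 of 10 genomes below the node carry the allele; none of the 10 outside do.
statistic = chi_square_2x2(9, 1, 0, 10)
significant = statistic > 3.841  # critical value for p = 0.05 with 1 df
```

			Here one leaf genome below the node lacks the allele, yet the
		statistic is still far above the 0.05 critical value, so the SNP
		would still be assigned to that node under this kind of test.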

			FastTreeMP now prints support values at the nodes, shown in the
		tree.ML.tre, tree.parsimony.tre, tree.core.tre, and
		tree.majority0.5.tre files. The root may be different than shown in
		the other trees with SNP allele counts since kSNP reroots the trees
		after the support values have been replaced by node numbers.

			Rewrote label_tree_nodes to use bioperl functions instead of text
		parsing, for easier labeling when support values are present.

			Files tree_nodeLabel.*.tre are kept instead of being moved to
		TempFilesToDelete, so the user can run NodeChiSquare2tree.

			Added the -c [minimum kmer count] argument to kSNP. This
		specifies the minimum number of times a kmer must occur in an
		unassembled raw read genome for it to be considered as a SNP locus in
		that genome. It defaults to 10. This argument enables the user to
		control for sequencing coverage. Note that this count is not exactly
		the same as coverage depth, since it will be lower due to bases that
		fall near the ends of reads, so do not contain the entire kmer.
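
			The gap between kmer count and coverage depth can be estimated
		with a common approximation: a read of length L contains L - k + 1
		complete kmers, so the expected kmer count is about
		depth x (L - k + 1) / L. The Python sketch below assumes reads of
		equal length, uniform coverage, and no sequencing errors; the function
		name and the numbers are illustrative and are not taken from kSNP.

```python
def expected_kmer_count(coverage_depth, read_length, k):
    """Approximate number of times a kmer is seen at a given coverage depth,
    assuming equal-length reads, uniform coverage, and no errors."""
    if k > read_length:
        return 0.0  # no read can contain a complete kmer
    return coverage_depth * (read_length - k + 1) / read_length
```

			For instance, at 30x coverage with 100 base reads and k = 21 the
		expected kmer count is about 24, which is why a threshold set equal
		to the coverage depth would be too strict.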



			10/9/13 Fixed bug that caused kSNP to eliminate very long genomes
		(>~2GB, e.g. unassembled genomes) and any subsequently listed genomes
		from the analyses.

			9/7/13 Added kChooser to identify the optimal value of k prior to
		running kSNP. Made it optional to create a .vcf file, since this
		script could require a lot of RAM.

			8/27/13 Added extract_nth_locus script to pull out the nth locus
		from the core_SNPs or SNPs_in_majority# file, handy if you're looking
		at position n in the core SNPs matrix or SNPs_in_majority matrix and
		you want to know what locus it is.

			Made kSNP default to not calculate a NJ tree, and added the
		command line option -j if the NJ tree is desired. Need to write
		faster code to calculate distance matrix from SNPs matrix. Current
		code does slow loops that take as long as #SNP loci x # pairwise
		combinations of genomes.
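
			The cost described above is easy to see in a naive sketch. The
		Python below loops over every pair of genomes and every SNP locus, so
		the work grows as the number of loci times the number of pairs. The
		function name, the dictionary input, the use of "-" for a missing
		allele, and the raw count of differing loci as the distance are all
		assumptions for illustration; this is not the kSNP code.

```python
from itertools import combinations


def pairwise_snp_distances(snp_matrix, missing="-"):
    """snp_matrix maps genome name -> string of alleles (equal lengths).
    Returns {(genome1, genome2): number of loci where both genomes have
    an allele and the alleles differ}."""
    distances = {}
    for g1, g2 in combinations(sorted(snp_matrix), 2):
        diffs = sum(
            1
            for x, y in zip(snp_matrix[g1], snp_matrix[g2])
            if x != y and missing not in (x, y)
        )
        distances[(g1, g2)] = diffs
    return distances
```

			With 100 genomes there are 4,950 pairs, so a matrix of one million
		SNP loci already requires about five billion allele comparisons.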


			6/27/13 Modified select_node_annotations so it will work on a
		Mac.


			6/6/13 Added trees with no node labels to the results directory.

			Modified the SNP_annotations file so that there are fixed columns
		for gene, product, notes, etc. and improved memory efficiency of
		annotating the SNPs that should help for data sets with over a
		million SNPs. Added select_node_annotations script so a user can pull
		out the annotated SNP loci which map to a particular user-specified
		node of a tree. Fixed miscount in the Annotation_summary file.


Source: README.txt, updated 2023-11-13