Name Modified Size
readme.RHMM_snp_allocate 2018-04-06 4.1 kB
RecHMM_v1.01.R 2014-11-11 18.6 kB
RecHMM.R 2014-10-30 18.6 kB
RHMM_snp_allocate_old.R 2014-10-24 7.1 kB
RHMM_snp_allocate.R 2014-10-24 7.5 kB
RecHMM.documents.pdf 2014-04-04 101.3 kB
*** IMPORTANT ***

A newer and more powerful version of RecHMM is now available as a module in the EToKi package.
It can be found at: https://github.com/zheminzhou/EToKi

-----------

RHMM_snp_allocate.R is a program for assigning SNPs onto branches in a known phylogeny, using the Viterbi algorithm.

USAGE:
	Rscript RHMM_snp_allocate.R <file.newick> <file.SNP> <proportion of sites for the phylogeny> <file.REC> <rec.diversity>
	
	<file.newick>								The known phylogeny in NEWICK format
	<file.SNP>									A list of all SNPs in the analysed genomes
	<proportion of sites for the phylogeny>		Most bioinformaticians use only polymorphic sites to build the phylogeny. Ignoring all non-polymorphic sites magnifies the branch lengths.
												For example, if you used 10,000 polymorphic sites from a core genome of 1,000,000 bp to build the tree, this parameter is 10000/1000000 = 0.01

The following parameters are optional:
	<file.REC>									Recombinations can alter the local branch lengths. You can take this into account by adding a file containing recombinant regions.
	<rec.diversity>								The average nucleotide diversity in recombinant regions. This value is reported as 'nu' in the RecHMM output.
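
As a small illustration of the <proportion of sites for the phylogeny> argument, the sketch below reproduces the 10,000 / 1,000,000 arithmetic from the example given for that parameter. The function name and its arguments are hypothetical and are not part of RHMM_snp_allocate.R.

```python
def proportion_of_sites(n_polymorphic_sites, core_genome_length):
    """Fraction of core-genome sites that were used to build the tree.

    Hypothetical helper: it only computes the value to pass on the
    command line, it does not call RHMM_snp_allocate.R.
    """
    if core_genome_length <= 0:
        raise ValueError("core_genome_length must be positive")
    if not 0 < n_polymorphic_sites <= core_genome_length:
        raise ValueError("n_polymorphic_sites must be in (0, core_genome_length]")
    return n_polymorphic_sites / core_genome_length


# 10,000 polymorphic sites in a core genome of 1,000,000 bp.
print(proportion_of_sites(10_000, 1_000_000))
```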

------------------------------
INPUTS:
	<file.newick>
A minimal example:
((1:0.01,2:0.01):0.01,3:0.02);


	<file.SNP>
Format:
#Site	<genome 1>	<genome 2>	...
<site coordinate>	<base>	<base>	...
...

Example:
#Site	1	2	3
3	A	T	T
10	A	A	G
103	C	G	G
...

Note: The genome names in the first line must match the tip names in <file.newick>. All columns are separated by a tab ('\t').
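
The name-matching requirement in the note above can be checked before running the script. The following is a minimal sketch in Python, not part of RHMM_snp_allocate.R; all function names are hypothetical, and the regular expression assumes simple unquoted tip labels such as those in the example tree.

```python
import re


def newick_tip_names(newick):
    """Return the tip labels of a simple (unquoted) Newick string, in order."""
    # A tip label directly follows '(' or ','; internal node labels follow ')'.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def snp_header_names(snp_text):
    """Return the genome names from the '#Site' header line of a SNP table."""
    header = snp_text.splitlines()[0].split("\t")
    if header[0] != "#Site":
        raise ValueError("first column of the header must be '#Site'")
    return header[1:]


def check_names(newick, snp_text):
    """Return True if tree tips and SNP columns match, else raise ValueError."""
    tips = set(newick_tip_names(newick))
    genomes = set(snp_header_names(snp_text))
    if tips != genomes:
        raise ValueError(
            f"only in tree: {sorted(tips - genomes)}; "
            f"only in SNP file: {sorted(genomes - tips)}"
        )
    return True


# The example tree and SNP table from this readme.
tree = "((1:0.01,2:0.01):0.01,3:0.02);"
snps = "#Site\t1\t2\t3\n3\tA\tT\tT\n10\tA\tA\tG\n103\tC\tG\tG\n"
print(check_names(tree, snps))
```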


	<file.REC>
Format:
<start_site>	<end_site>	<not used>	<not used>	<branch name>

Example:
2577 2609 0.688236769030142 33 Br_1
7971 7990 0.916113340873472 20 Br_1
16493 16516 0.872056771815245 24 Br_2
19101 19152 0.997786721510221 52 Br_2
21976 22059 0.975300372805246 84 Br_2
23790 23987 0.964951921855543 87 Br_2
29670 29726 0.997294595078072 54 Br_4
30157 30250 0.999985008675643 94 Br_4

This format is compatible with the outputs of RecHMM. 
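
For inspecting a <file.REC> before use, the sketch below reads the five-column format described above and groups the regions by branch name. It is an illustration in Python rather than code from RecHMM; the function name is hypothetical, and splitting on any whitespace is an assumption made so that both tab- and space-separated files are accepted.

```python
from collections import defaultdict


def parse_rec(text):
    """Group recombinant regions by branch name as (start, end) tuples."""
    regions = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()  # assumption: any whitespace separates columns
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            raise ValueError(f"expected 5 columns, got: {line!r}")
        # Columns 3 and 4 are marked <not used> in the format, so they are ignored.
        start, end, branch = int(fields[0]), int(fields[1]), fields[4]
        if start > end:
            raise ValueError(f"start is after end in: {line!r}")
        regions[branch].append((start, end))
    return dict(regions)


# First three rows of the example above.
rec = """2577 2609 0.688236769030142 33 Br_1
7971 7990 0.916113340873472 20 Br_1
16493 16516 0.872056771815245 24 Br_2
"""
for branch, spans in parse_rec(rec).items():
    print(branch, spans)
```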

--------------------------
OUTPUTS:
	Two files are generated:
	<file.newick>.annote.nex			The same tree as the input, with each branch labelled with a serial ID, brID="Br_xxxx". This file can be opened in FigTree.
	<file.newick>.events				The assignments of SNPs onto branches. 

Format for <file.newick>.events:
<Site of SNP> <brID> <bipartition information> <nucleotide change>

Note:
	To convert this file into input for RecHMM, simply extract the first two columns. A homoplasy appears as multiple rows with the same site.

Example:
48 Br_108 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBB T->A
48 Br_110 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB T->G
62 Br_15 BBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
62 Br_91 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAABAAAAAAA G->A
62 Br_92 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAABBBAAAAAA A->G
62 Br_108 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBB G->C
62 Br_109 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAABBBBBBBBBBBBABABBBBBB A->G
62 Br_122 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
62 Br_127 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABABABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB A->G
62 Br_128 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
62 Br_52 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
66 Br_15 BBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
66 Br_91 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAABAAAAAAA G->A
66 Br_92 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAABBBAAAAAA A->G
66 Br_94 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBBBB G->T
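
The note above says that the first two columns form the RecHMM input and that homoplasies appear as repeated sites. The following Python sketch shows one way to do both steps. The function names are hypothetical, the sample rows are made up with shortened bipartition strings, and writing the two columns tab-separated is an assumption, since the delimiter expected by RecHMM is not stated here.

```python
from collections import Counter


def first_two_columns(events_text):
    """Return (site, brID) pairs from the text of a .events file."""
    rows = []
    for line in events_text.splitlines():
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) >= 2:
            rows.append((fields[0], fields[1]))
    return rows


def homoplasic_sites(rows):
    """Return sites that occur on more than one row, in order of first appearance."""
    counts = Counter(site for site, _ in rows)
    return [site for site, n in counts.items() if n > 1]


# Made-up rows for illustration; real bipartition strings are much longer.
events = """48 Br_108 BBAAB T->A
48 Br_110 BAABB T->G
62 Br_15 BBBAB G->A
"""
rows = first_two_columns(events)
for site, branch in rows:
    print(f"{site}\t{branch}")  # tab delimiter is an assumption
print("homoplasic sites:", homoplasic_sites(rows))
```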

Source: readme.RHMM_snp_allocate, updated 2018-04-06