Name | Modified | Size | Downloads / Week |
---|---|---|---|
readme.RHMM_snp_allocate | 2018-04-06 | 4.1 kB | |
RecHMM_v1.01.R | 2014-11-11 | 18.6 kB | |
RecHMM.R | 2014-10-30 | 18.6 kB | |
RHMM_snp_allocate_old.R | 2014-10-24 | 7.1 kB | |
RHMM_snp_allocate.R | 2014-10-24 | 7.5 kB | |
RecHMM.documents.pdf | 2014-04-04 | 101.3 kB | |
Totals: 6 Items | 157.1 kB | 0 |
*** IMPORTANT ***
A newer and more powerful version of RecHMM is now available as a module in the EToKi package.
Please find it at: https://github.com/zheminzhou/EToKi

-----------

RHMM_snp_allocate.R is a program for assigning SNPs onto branches in a known phylogeny, using the Viterbi algorithm.

USAGE:
    Rscript RHMM_snp_allocate.R <file.newick> <file.SNP> <proportion of sites for the phylogeny> <file.REC> <rec.diversity>

    <file.newick>
        The known phylogeny in NEWICK format.
    <file.SNP>
        A list of all SNPs in the analysed genomes.
    <proportion of sites for the phylogeny>
        Most bioinformaticians use only polymorphic sites to build the phylogeny.
        This magnifies the branch lengths, because all non-polymorphic sites are ignored.
        For example, if you used 10,000 polymorphic sites in a core genome of 1,000,000 bp
        to build the tree, this parameter will be 10000/1000000 = 0.01.

    The following parameters are optional:
    <file.REC>
        Recombinations can alter the local branch lengths. You can take this into account
        by adding a file containing recombinant regions.
    <rec.diversity>
        The average nucleotide diversity in recombinant regions. You can get this value
        as 'nu' in the RecHMM output.

------------------------------

INPUTS:

<file.newick>
    A minimal example:
        ((1:0.01,2:0.01):0.01,3:0.02);

<file.SNP>
    Format:
        #Site               <genome 1>   <genome 2>   ...
        <site coordinate>   <base>       <base>       ...
        ...
    Example:
        #Site   1   2   3
        3       A   T   T
        10      A   A   G
        103     C   G   G
        ...
    Note: The genome names in the first line have to be the same as the tip names in <file.newick>.
    All columns are separated by a tab ('\t').
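Before running the script, it can help to check the two requirements stated above: that the genome names in the SNP header match the tips of the tree, and that the proportion parameter is computed as polymorphic sites over core genome size. The following is a minimal Python sketch of such a check, not part of RHMM_snp_allocate.R. The helper names `newick_tips` and `read_snp_table` are hypothetical, and the tip extraction assumes simple unquoted tip labels like those in the example tree.

```python
import re

def newick_tips(newick):
    """Return tip labels, i.e. names that directly follow '(' or ','.
    Assumption: labels are unquoted and contain no '(', ')', ':', ',' or ';'."""
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

def read_snp_table(text):
    """Parse tab-separated SNP text into (genome names, {site: [bases]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    genomes = header[1:]          # first column is the '#Site' label
    sites = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        if len(fields) != len(header):
            raise ValueError("wrong column count at site " + fields[0])
        sites[int(fields[0])] = fields[1:]
    return genomes, sites

# The example tree and SNP table from the README, with real tab separators.
newick = "((1:0.01,2:0.01):0.01,3:0.02);"
snp_text = "#Site\t1\t2\t3\n3\tA\tT\tT\n10\tA\tA\tG\n103\tC\tG\tG\n"

genomes, sites = read_snp_table(snp_text)

# Names present in only one of the two files; should be empty.
mismatched = set(newick_tips(newick)) ^ set(genomes)

# Proportion parameter from the README example:
# 10,000 polymorphic sites used for the tree, 1,000,000 bp core genome.
proportion = 10_000 / 1_000_000
```

If `mismatched` is not empty, the header of the SNP file or the tip names in the tree need to be corrected before running the script.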
<file.REC>
    Format:
        <start_site> <end_site> <not used> <not used> <branch name>
    Example:
        2577 2609 0.688236769030142 33 Br_1
        7971 7990 0.916113340873472 20 Br_1
        16493 16516 0.872056771815245 24 Br_2
        19101 19152 0.997786721510221 52 Br_2
        21976 22059 0.975300372805246 84 Br_2
        23790 23987 0.964951921855543 87 Br_2
        29670 29726 0.997294595078072 54 Br_4
        30157 30250 0.999985008675643 94 Br_4
    This format is compatible with the outputs of RecHMM.

--------------------------

OUTPUTS:

Two files are generated:

<file.newick>.annote.nex
    The same tree as the input, with branches designated with a serial number brID="Br_xxxx".
    This file can be opened in FigTree.

<file.newick>.events
    The assignments of SNPs onto branches.
    Format for <file.newick>.events:
        <Site of SNP> <brID> <bipartition information> <nucleotide change>
    Note: To change this into the inputs for RecHMM, simply extract the first two columns.
    Homoplasies are shown in multiple rows with the same site.
    Example:
        48 Br_108 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBB T->A
        48 Br_110 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB T->G
        62 Br_15 BBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
        62 Br_91 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAABAAAAAAA G->A
        62 Br_92 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAABBBAAAAAA A->G
        62 Br_108 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBB G->C
        62 Br_109 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAABBBBBBBBBBBBABABBBBBB A->G
        62 Br_122 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
        62 Br_127 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABABABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB A->G
        62 Br_128 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
        62 Br_52 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
        66 Br_15 BBBBBBBBBBBBBBBBBBBBBBBABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB G->A
        66 Br_91 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAABAAAAAAA G->A
        66 Br_92 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAABBBAAAAAA A->G
        66 Br_94 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAABBBBBBBBBBB G->T