Files:
  README.txt                    2020-05-20    3.6 kB
  tree2gd_release_20191030.tgz  2020-05-20  666.2 kB
============================================================
Program : ./tree2gd_linux64 or ./tree2gd_macos
Version : 2.4
Contact : Ji QI [qij@fudan.edu.cn]

============================================================
Gene duplications (GDs) on different lineages were detected by comparing gene-family trees
with a reference species tree. Briefly, for each gene-family tree, each gene clade was
assigned its lowest common ancestor (LCA) on the species tree, determined by the taxon
groups of the species carrying the genes in that clade. Nodes of the gene-family tree with
bootstrap support below 50% (cf. --bootstrap) were not considered in subsequent analyses.
For each putative duplication, we then examined the species represented by all genes in the
clade before the duplication and in the two sister clades after it. When a duplication node
involves two genes from a single species, a GD was counted for the lineage represented by
that species. When at least one of the two paralogous clades of a duplication node includes
two sister lineages of the species tree (each represented by a single species), a GD was
counted on the LCA of those two lineages. When the node with a candidate whole-genome
duplication (WGD) includes three or more species, a GD was counted only when two
requirements were met:
(1) the two paralogous clades shared two or more species (cf. --species);
(2) the depths of the two paralogous clades differed by zero or one (cf. --deepvar),
    where the depth of a paralogous clade is the number of branches in the species tree
    from the LCA of the gene clade to the root of the species tree.

The number of duplications on each node of the species tree was then obtained by iterating
over all gene-family trees and summing the counts.
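The counting rules above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not tree2gd's source: the species tree (((A,B),(C,D)),E) is hard-coded as a child-to-parent map with made-up internal node names (AB, CD, ABCD, root), each paralogous clade is given simply as a list of species names, and `gd_node`, `lca` and `depth` are names invented here. Two readings are assumptions: the two-species rule is applied only when both species are sister leaves, and a duplication involving three or more species is credited to the LCA of all of its species, since the text does not name that node.

```python
from collections import Counter

# Hypothetical species tree (((A,B),(C,D)),E), stored as child -> parent.
# Internal node names are invented for this sketch.
PARENT = {
    "A": "AB", "B": "AB", "C": "CD", "D": "CD",
    "AB": "ABCD", "CD": "ABCD", "ABCD": "root", "E": "root",
}


def path_to_root(node):
    """Nodes from `node` up to the root, inclusive (bottom-up order)."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Number of species-tree branches between `node` and the root."""
    return len(path_to_root(node)) - 1


def lca(species):
    """Lowest common ancestor of a non-empty collection of species."""
    species = list(species)
    common = path_to_root(species[0])
    for s in species[1:]:
        ancestors = set(path_to_root(s))
        common = [n for n in common if n in ancestors]
    return common[0]  # paths run bottom-up, so the first shared node is the lowest


def gd_node(clade1, clade2, support=100, min_bootstrap=50,
            min_shared=2, max_depth_diff=1):
    """Species-tree node credited with a GD for one duplication node, or None.

    clade1/clade2 list the species of the genes in the two paralogous clades.
    Defaults mirror --bootstrap=50, --species=2 and --deepvar=1.
    """
    if support < min_bootstrap:
        return None  # weakly supported nodes are not considered
    s1, s2 = set(clade1), set(clade2)
    every = s1 | s2
    if len(every) == 1:
        # Two genes from a single species: GD on that species' lineage.
        return next(iter(every))
    if len(every) == 2:
        # Assumed reading: both species must be sister leaves, and at least
        # one paralogous clade must contain both of them.
        a, b = every
        sisters = a in PARENT and PARENT[a] == PARENT.get(b)
        if sisters and (s1 == every or s2 == every):
            return lca(every)
        return None
    # Three or more species: enough shared species and similar clade depths.
    if len(s1 & s2) < min_shared:
        return None
    if abs(depth(lca(s1)) - depth(lca(s2))) > max_depth_diff:
        return None
    return lca(every)  # assumption: credit the LCA of the whole duplication node


# Toy duplication nodes: (species in clade 1, species in clade 2, bootstrap).
duplications = [
    (["A"], ["A"], 100),                     # two genes from species A
    (["A", "B"], ["A", "B"], 90),            # sister species A and B
    (["A", "B", "C"], ["A", "B", "D"], 80),  # 2 shared species, equal depths
    (["A", "B", "E"], ["A", "B"], 85),       # clade depths 0 vs 2: rejected
    (["A", "B", "C"], ["A", "B", "D"], 30),  # support below 50: skipped
]
counts = Counter()
for c1, c2, bs in duplications:
    node = gd_node(c1, c2, support=bs)
    if node is not None:
        counts[node] += 1
print(dict(counts))
```

The final loop mirrors the last step of the method: each credited duplication increments the counter of its species-tree node, and the totals over all gene-family trees are what gets reported per node.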

============================================================
Usage : 
./tree2gd_linux64 species_tree gene_idmap gene_tree_list out_folder

input file 1: a species tree in Newick format (ROOTED)
input file 2: a map file with two columns indicating which species each gene belongs to
input file 3: a list file with two columns, one line per gene tree (UNROOTED),
              where the 1st column is a numeric ID and the 2nd column is the filename.

Please note that the species tree must be rooted, while the gene trees are unrooted.
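To make the three input formats concrete, the following Python sketch writes a toy input set using the file names from the example command in this README. Several details are assumptions not stated above: the map file lists the gene in the 1st column and the species in the 2nd, both two-column files are tab-separated, each gene tree is stored as a single Newick string in its own file, and all gene names, tree files and the helper `write_example_inputs` are hypothetical.

```python
import os
import tempfile

SPECIES_TREE = "(((A,B),(C,D)),E);"  # rooted species tree in Newick format

# Hypothetical gene -> species map (assumed column order: gene, species).
GENE_TO_SPECIES = {
    "A_g1": "A", "A_g2": "A", "B_g1": "B", "C_g1": "C", "D_g1": "D",
    "C_g2": "C", "D_g2": "D", "E_g1": "E",
}

# Hypothetical unrooted gene trees keyed by numeric ID; the numbers after ")"
# are bootstrap values.
GENE_TREES = {
    1: "((A_g1,A_g2)95,B_g1,(C_g1,D_g1)80);",
    2: "(C_g2,D_g2,E_g1);",
}


def write_example_inputs(folder):
    """Write species tree, id map, gene tree files and list; return the 3 inputs."""
    os.makedirs(folder, exist_ok=True)
    species_path = os.path.join(folder, "species_tree.txt")
    with open(species_path, "w") as fh:
        fh.write(SPECIES_TREE + "\n")

    idmap_path = os.path.join(folder, "gene.idmap")
    with open(idmap_path, "w") as fh:
        for gene, species in GENE_TO_SPECIES.items():
            fh.write(f"{gene}\t{species}\n")

    list_path = os.path.join(folder, "gene_tree.list")
    with open(list_path, "w") as fh:
        for tree_id, newick in GENE_TREES.items():
            tree_file = os.path.join(folder, f"gene_tree_{tree_id}.nwk")
            with open(tree_file, "w") as tf:
                tf.write(newick + "\n")
            fh.write(f"{tree_id}\t{tree_file}\n")  # numeric ID, then filename
    return species_path, idmap_path, list_path


print(write_example_inputs(tempfile.mkdtemp()))
```

The three returned paths are, in order, the first three positional arguments of the command shown under Usage.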
	
There are multiple output files.
The output file "summary.txt" contains multiple trees. The first is the species tree
supplied by the user, annotated with the number of GD events on each node.

The trees are written in the text format described below. Given the tree (((A,B),(C,D)),E),
its text representation in summary.txt looks like:

node_0
   node_1
   |  node_2
   |  |  gene_A
   |  |  gene_B
   |  node_3
   |     gene_C
   |     gene_D
   gene_E
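The layout above can be reproduced with a short Python sketch that parses a Newick string and prints it in the same indented style. It is an illustration only, not tree2gd's code: internal nodes are numbered in preorder and leaves get a "gene_" prefix purely to match this example, whitespace is stripped, branch lengths and support values are discarded, quoted labels are not handled, and the GD counts that summary.txt attaches to species-tree nodes are not modelled. `parse_newick` and `render` are hypothetical names.

```python
def parse_newick(text):
    """Parse Newick into (name, children) tuples; internal nodes get name None."""
    text = "".join(text.split())  # drop all whitespace (minimal sketch)
    pos = 0

    def label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return text[start:pos].split(":")[0]  # drop any branch length

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
            label()  # internal label or support value: ignored here
            return (None, children)
        return (label(), [])

    return node()


def render(tree, leaf_prefix="gene_"):
    """Indented text layout: 3 columns per level, "|" links later siblings."""
    lines = []
    counter = [0]  # preorder counter for internal nodes

    def walk(node, prefix, is_last):
        name, children = node
        if children:
            text = f"node_{counter[0]}"
            counter[0] += 1
        else:
            text = leaf_prefix + name
        lines.append(prefix + text)
        # Keep a "|" column open while this node still has a sibling below it.
        child_prefix = prefix + ("   " if is_last else "|  ")
        for i, child in enumerate(children):
            walk(child, child_prefix, i == len(children) - 1)

    walk(tree, "", True)
    return "\n".join(lines)


print(render(parse_newick("(((A,B),(C,D)),E);")))
```

The "|" column is what lets a reader see that node_3 is a sibling of node_2 even after the leaves of node_2 have been listed.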

The unrooted gene trees supplied by the user are rooted by the program and listed
consecutively after the species tree in summary.txt.

============================================================
Example:
./tree2gd_linux64 species_tree.txt gene.idmap gene_tree.list output

Other parameters:
[--species=] minimum number of overlapping species between duplicated clades: 2
[--bootstrap=] minimum bootstrap value: 50
[--subclade_bp=] minimum bootstrap (bp) value for subclades: 0
[--split_tree=] split tree into subtrees [true/false]
[--quick_file=] quickparanoid file, for clusters without a corresponding tree
[--parser_file=] blast_parser file, used to find a new outgroup OTU for clusters that lack one
[--paml=] prepare PAML files for each cluster [true/false]
[--omega=] Ka/Ks/Omega file, tabular format
[--genome=] list of genomes that are ignored when identifying isoforms
[--isoform=] table of isoforms; alternative transcripts listed in it are ignored
[--rooted=] output rooted trees [true/false]
[--deepvar=] maximum variance of depth: 1
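When tree2gd is driven from a pipeline, a small helper can assemble the command line. The Python sketch below assumes that optional parameters are passed after the four positional arguments in the --name=value form listed above, and that boolean options take the literal strings true/false; `build_command` and `run` are hypothetical helper names, not part of tree2gd.

```python
import shlex
import subprocess


def build_command(species_tree, gene_idmap, gene_tree_list, out_folder,
                  binary="./tree2gd_linux64", **options):
    """Assemble a tree2gd argv list; each keyword becomes a --key=value flag."""
    cmd = [binary, species_tree, gene_idmap, gene_tree_list, out_folder]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # options documented as [true/false]
        cmd.append(f"--{key}={value}")
    return cmd


def run(cmd):
    """Run tree2gd; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


cmd = build_command("species_tree.txt", "gene.idmap", "gene_tree.list", "output",
                    bootstrap=70, species=2, rooted=True)
print(shlex.join(cmd))  # inspect the command before calling run(cmd)
```

Passing an argv list rather than a single shell string avoids quoting problems with file names; on macOS, pass binary="./tree2gd_macos".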

Source: README.txt, updated 2020-05-20