Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.txt | 2020-05-20 | 3.6 kB | |
tree2gd_release_20191030.tgz | 2020-05-20 | 666.2 kB | |
Totals: 2 Items | | 669.9 kB | 4 |
============================================================
Program : ./tree2gd_linux64 or ./tree2gd_macos
Version : 2.4
Contact : Ji QI [qij@fudan.edu.cn]
============================================================
Gene duplication (GD) on different lineages was detected by comparing gene-family trees with a reference species tree. Briefly, for each gene-family tree, the LCA was assigned for each gene clade, determined by the taxon groups of the species carrying the genes in the clade. Nodes on the gene-family tree with bootstrap support below 50% were not considered in subsequent analyses.

We then examined the species corresponding to all genes in the clade before a putative duplication and in the two sister clades after it.

When a gene duplication node involves two genes from a single species, a GD was counted for the lineage represented by that species.

When at least one of the two clades (paralogs) of a duplication node includes two sister lineages (each represented by a single species) in the species tree, a GD was counted on the LCA of the two lineages.

When the node with a candidate WGD includes three or more species, a GD was counted when two requirements were met:
(1) the two paralogous clades shared two or more species;
(2) the difference in the depths of the two paralogous clades was 1 or zero, where the depth of a paralogous clade was defined as the number of branches in the species tree from the LCA of the gene clade to the root of the species tree.

Numbers of duplications were summarized on the species tree by iterating over all individual gene-family trees.
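The two requirements for nodes with three or more species can be sketched in Python as follows. This is only an illustration of the rule as worded above, not the Tree2GD source code: the nested-tuple species tree and the names root_paths, lca_depth and passes_multi_species_rule are hypothetical, and the default thresholds are assumed to mirror the --species and --deepvar defaults.

```python
# Illustrative sketch only; not taken from the Tree2GD implementation.
# The species tree is a hypothetical nested tuple, e.g. (((A,B),(C,D)),E).

def root_paths(tree, path=()):
    """Map each species name to its path of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, path + (index,)))
    return paths


def lca_depth(species, paths):
    """Number of species-tree branches from the root to the LCA of `species`."""
    selected = [paths[name] for name in species]
    depth = 0
    # The LCA sits at the end of the longest prefix shared by all paths.
    for steps in zip(*selected):
        if len(set(steps)) != 1:
            break
        depth += 1
    return depth


def passes_multi_species_rule(clade_a, clade_b, paths,
                              min_shared=2, max_depth_diff=1):
    """Check requirements (1) and (2) for two paralogous clades.

    clade_a and clade_b are the sets of species found in each clade.
    The default thresholds are assumptions based on the documented defaults.
    """
    shared = set(clade_a) & set(clade_b)
    diff = abs(lca_depth(clade_a, paths) - lca_depth(clade_b, paths))
    return len(shared) >= min_shared and diff <= max_depth_diff


if __name__ == "__main__":
    paths = root_paths(((("A", "B"), ("C", "D")), "E"))
    print(passes_multi_species_rule({"A", "B", "C"}, {"A", "B", "C", "D"}, paths))
```

Representing each species by its path from the root keeps the depth calculation to a shared-prefix count, which is enough for a small example; a real analysis would read the rooted Newick species tree instead.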
============================================================
Usage : ./tree2gd_linux64 species_tree gene_idmap gene_tree_list out_folder

input file 1: a species tree in Newick format (ROOTED)
input file 2: a map file with two columns that tells which species each gene belongs to
input file 3: a list file with two columns, one line per gene tree (UNROOTED); the 1st column is a numeric ID, the 2nd column is the filename.

Please note that the species tree must be rooted, while the gene trees are unrooted.

There are multiple output files. The output file named "summary.txt" includes multiple trees. The first tree is the species tree input by users, with the number of GD events displayed on each node. The format of these trees is described below. Given a tree such as (((A,B),(C,D)),E), its text format in summary.txt looks like:

node_0
   node_1
   |  node_2
   |  |  gene_A
   |  |  gene_B
   |  node_3
   |     gene_C
   |     gene_D
   gene_E

The unrooted gene trees provided by users are rooted by this program and are listed consecutively after the species tree in summary.txt.
============================================================
Example: ./tree2gd_linux64 species_tree.txt gene.idmap gene_tree.list output

Other parameters:
[--species=]     minimum number of overlapping species for duplicated clades: 2
[--bootstrap=]   minimum bootstrap value: 50
[--subclade_bp=] minimum bp values for subclades: 0
[--split_tree=]  split tree into subtrees [true/false]
[--quick_file=]  quickparanoid file, for clusters without a corresponding tree
[--parser_file=] blast_parser file; if a cluster doesn't have an outgroup OTU, find a new one
[--paml=]        prepare PAML files for each cluster [true/false]
[--omega=]       Ka/Ks/Omega file, tabular format
[--genome=]      list of genomes, which is ignored for identifying isoforms
[--isoform=]     table of isoforms, in which alternative transcripts are ignored
[--rooted=]      output rooted trees [true/false]
[--deepvar=]     maximum variance of depth: 1
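Preparing the gene tree list and the command line can be scripted. The Python helper below is a sketch, not part of Tree2GD. It rests on several assumptions: the two columns are tab-separated, the gene trees use a ".nwk" extension, and the optional parameters go after the four positional arguments. The function names write_gene_tree_list and build_command are hypothetical.

```python
# Illustrative helper, not part of Tree2GD.
# Assumptions: tab-separated columns, "*.nwk" tree files, and options placed
# after the four positional arguments.
from pathlib import Path


def write_gene_tree_list(tree_dir, list_path, pattern="*.nwk"):
    """Write one line per gene tree: a numeric ID, then the tree filename."""
    tree_files = sorted(Path(tree_dir).glob(pattern))
    with open(list_path, "w") as handle:
        for number, tree_file in enumerate(tree_files, start=1):
            handle.write(f"{number}\t{tree_file}\n")
    return len(tree_files)


def build_command(species_tree, idmap, tree_list, out_dir,
                  program="./tree2gd_linux64", **options):
    """Return the argument list for one run; options become --name=value."""
    command = [program, str(species_tree), str(idmap),
               str(tree_list), str(out_dir)]
    command += [f"--{name}={value}" for name, value in options.items()]
    return command


if __name__ == "__main__":
    # Prints the argument list; pass it to subprocess.run() to start a run.
    print(build_command("species_tree.txt", "gene.idmap",
                        "gene_tree.list", "output", bootstrap=50))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.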