Name                      Modified      Size
time_memory.tar.gz        2020-01-06    7.2 MB
orthobench_data.tar.gz    2020-01-06    787.0 MB
QfO_data.tar.gz           2020-01-06    262.8 MB
25_plants.tar.gz          2020-01-06    216.8 MB
11_angiosperms.tar.gz     2020-01-06    1.9 GB
README                    2020-01-05    5.7 kB
Totals: 6 items, 3.2 GB

The files listed below are for testing the methods on five datasets. Their contents and formats are described in the sections that follow.
--------------------------------------------------------------------------------
1. OrthoBench data set
Folder: orthobench_data
[fasta] folder: raw protein sequences (204,110 in total) of 12 bilaterian genomes;
[input/12bilaterians.blastp] file: all-against-all blastp alignment of 12 bilaterians, as an input file for PhyloMCL and OrthoFinder;
[input/12bilaterians.nwk] file: a species tree of the 12 bilaterians (taxonomic ID), as an input file for PhyloMCL;
[input/gene.idmap] file: a map of gene ID to species name (taxonomic ID), as an input file for PhyloMCL;
[input/gene.length] file: a map of gene ID to protein length, as an input file for PhyloMCL;
[output/ogs.phylomcl] file: inferred ortholog groups by PhyloMCL;
[output/ogs.orthofinder] file: inferred ortholog groups by OrthoFinder2;
[output/ogs.orthomcl] file: inferred ortholog groups by OrthoMCL;
[output/ogs.treefam] file: inferred ortholog groups by TreeFam;
[output/ogs.eggnog] file: inferred ortholog groups by eggNOG;
[output/ogs.oma] file: inferred ortholog groups by OMA;
[evaluation/orthobench.txt] file: a list of 70 ortholog groups in OrthoBench for evaluating performance of methods;
[evaluation/metazoa] folder: raw proteins and trees of 70 ortholog groups downloaded from OrthoBench;
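
Before running PhyloMCL on these inputs, it can be useful to confirm that every species label used in gene.idmap also appears as a leaf of the species tree. The sketch below is only an illustration and is not part of PhyloMCL. It assumes that gene.idmap holds two whitespace-separated columns (gene ID, then species label) and that the Newick tree uses unquoted leaf labels; this README states neither format detail, and the function names are hypothetical.

```python
import re

# Matches an unquoted label that directly follows "(" or ",", i.e. a leaf.
# Labels that follow ")" are internal node names and are skipped.
_LEAF = re.compile(r"[(,]\s*([^\s(),:;][^(),:;]*)")


def newick_leaves(newick_text):
    """Return the set of leaf labels in a Newick string (unquoted labels only)."""
    return {label.strip() for label in _LEAF.findall(newick_text)}


def idmap_species(lines):
    """Return the set of species labels from gene.idmap-style lines.

    ASSUMPTION: each non-empty line is "<gene_id> <species_label>".
    """
    species = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            species.add(fields[1])
    return species


def species_missing_from_tree(newick_text, idmap_lines):
    """Species labels used in the ID map but absent from the tree, sorted."""
    return sorted(idmap_species(idmap_lines) - newick_leaves(newick_text))


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the real files.
    tree = "((9606:0.1,10090:0.2)anc:0.3,7227:0.5);"
    idmap = ["geneA 9606", "geneB 10090", "geneC 6239"]
    print(species_missing_from_tree(tree, idmap))
```

With real data, the tree text would come from input/12bilaterians.nwk and the lines from input/gene.idmap; an empty result means the two files agree on species labels under the assumptions above.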

--------------------------------------------------------------------------------
2. 11 angiosperms data set (for evaluation using paralogs from whole genome duplication)
Folder: 11_angiosperms
[fasta] folder: raw protein sequences (XXX in total) of 11 angiosperm genomes;
[input/4_fabaceae.blastp] file: all-against-all blastp alignment of 4 Fabaceae plants, as an input file for PhyloMCL and OrthoFinder;
[input/7_edducots.blastp] file: all-against-all blastp alignment of 7 eudicot plants;
[input/11_angiosperms.blastp] file: all-against-all blastp alignment of 11 angiosperm plants;
[input/4_fabaceae.nwk, 7_edducots.nwk and 11_angiosperms.nwk], [input/gene.idmap] and [input/gene.length] files: species trees, a map of gene ID to species ID and a map of gene ID to protein length, as input files necessary for running PhyloMCL.
[output/4_fabaceae.phylomcl, 7_edducots.phylomcl and 11_angiosperms.phylomcl] files: inferred ortholog groups by PhyloMCL on three datasets;
[output/4_fabaceae.orthofinder, 7_edducots.orthofinder and 11_angiosperms.orthofinder] files: inferred ortholog groups by OrthoFinder2 on three datasets;
[evaluation/Gmax_paralogs.txt] file: a list of paralog pairs derived from polyploidization events in Glycine max (12,519 pairs), in the ancestor of Fabaceae (9,194 pairs) or of eudicots (3,385 pairs), or from duplications before the divergence of angiosperms (2,206 pairs).
[evaluation/gene_trees.tgz] file: a compressed file including 14,547 gene trees of 13 plants.
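
One way a paralog list such as Gmax_paralogs.txt could be used is to measure how many known paralog pairs an inferred grouping places in the same ortholog group. The following is a sketch under stated assumptions, not the authors' evaluation procedure. It assumes one ortholog group per line with whitespace-separated gene IDs, and one gene pair per line in the paralog list; this README specifies neither format, and the function names are hypothetical.

```python
def load_group_index(og_lines):
    """Map each gene ID to the index of the ortholog group containing it.

    ASSUMPTION: one group per line, gene IDs separated by whitespace.
    If a gene appears in several groups, the first one wins.
    """
    gene_to_group = {}
    for index, line in enumerate(og_lines):
        for gene in line.split():
            gene_to_group.setdefault(gene, index)
    return gene_to_group


def co_clustered_fraction(pair_lines, gene_to_group):
    """Fraction of gene pairs whose two members share an ortholog group.

    ASSUMPTION: the first two whitespace-separated fields of a line are a pair.
    Pairs with a gene missing from every group count as not co-clustered.
    Returns 0.0 when there are no pairs.
    """
    total = 0
    together = 0
    for line in pair_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        total += 1
        group_a = gene_to_group.get(fields[0])
        group_b = gene_to_group.get(fields[1])
        if group_a is not None and group_a == group_b:
            together += 1
    return together / total if total else 0.0


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the real files.
    groups = ["g1 g2 g3", "g4 g5"]
    pairs = ["g1 g2", "g1 g4", "g4 g9", "g4 g5"]
    print(co_clustered_fraction(pairs, load_group_index(groups)))
```

The same two functions could be applied to each of the output files in turn (for example output/11_angiosperms.phylomcl and output/11_angiosperms.orthofinder) to compare methods on the same pair list, provided the files match the assumed layout.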

--------------------------------------------------------------------------------
3. 25 plant data set (for inferring OGs of leucine-rich repeat receptor-like kinases)
Folder: 25_plants
[fasta] folder: raw protein sequences (XXX in total) of 25 plant genomes;
[input/25_plants.blastp] file: all-against-all blastp alignment of 25 plants, as an input file for PhyloMCL, too large to be included;
[input/25_plants.nwk], [input/gene.idmap] and [input/gene.length] files: a species tree, a map of gene ID to species ID and a map of gene ID to protein length, as input files necessary for running PhyloMCL.
[output/25_plants.phylomcl] file: inferred ortholog groups by PhyloMCL on the dataset;
[evaluation/LRR-RLK.nwk] file: a phylogenetic tree of 1,082 LRR-RLK genes obtained from previous studies. 
[evaluation/5_LRR-RLK_OGs.txt] file: a list of XXX LRR-RLK genes of five gene families: BAM/CLV1, BRI1/BRL, EMS, ER/ERL, and SERK.

--------------------------------------------------------------------------------
4. QfO data set
Folder: QfO_data
[fasta] folder: raw protein sequences (754,149 in total) of 66 eukaryote genomes;
[input/66_species.blastp] file: all-against-all protein alignment of 66 eukaryotes using DIAMOND, too large to be included;
[input/66_species.nwk], [input/gene.idmap] and [input/gene.length] files: a species tree, a map of gene ID to species ID and a map of gene ID to protein length, as input files necessary for running PhyloMCL.
[output/66_species.phylomcl] file: inferred ortholog groups by PhyloMCL on the dataset;
[output/66_species.ortho_pairs] file: inferred ortholog pairs by PhyloMCL on the dataset;

--------------------------------------------------------------------------------
5. Running time and memory usage test.
Folder: time_memory
Raw protein sequences of 4, 7, 11, 18 and 25 plant genomes are provided above (see sections 2 and 3);
[input/4_fabaceae.blastp] file: all-against-all blastp alignment of 4 Fabaceae plants (included in section 2);
[input/7_edducots.blastp] file: all-against-all blastp alignment of 7 eudicot plants (included in section 2);
[input/11_angiosperms.blastp] file: all-against-all blastp alignment of 11 angiosperm plants (included in section 2);
[input/18_plants.blastp] file: all-against-all blastp alignment of 18 plants, too large to be included;
[input/25_angiosperms.blastp] file: all-against-all blastp alignment of 25 plants, too large to be included;
[input/4_fabaceae.nwk, 7_edducots.nwk, 11_angiosperms.nwk, 18_angiosperms.nwk and 25_plants.nwk], [input/gene.idmap] and [input/gene.length] files: species trees, a map of gene ID to species ID and a map of gene ID to protein length, as input files necessary for running PhyloMCL (please note that the gene.idmap and gene.length files here are for the two datasets of 18 and 25 plants).

Source: README, updated 2020-01-05